Tuesday, 20 September 2016

Mighty morphin power elephant

Back in the 2013 I started playing with sqlalchemy to create a simple extractor from heterogeneous systems to be pushed in postgresql.
I decided to give the project a name which recalled the transformation and I called pg_chameleon.

To be honest I didn't like sqlalchemy.  Like any other ORM adds an interface to the data layer with a mental approach to the data itself. I lost the interest to developing a migrator very soon, and after all there are thousands of similar tools thousands of times better than mine (e.g. the awesome pgloader)

However recently I revamped the project after discovering a python library capable to read the mysql replication protocol. In few weeks I cleaned all the sqlalchemy stuff, rebuilt the metadata extraction using the information_schema and finally I had an usable tool to replicate the data across the two systems.

I've also changed the license from GPL to the 2 clause BSD.

The tool requires testing. I'm absolutely sure is full of bugs and issues, but it seems to work quite nice.

Some key aspects:

  • Is developed in python 2.7. Bear with me, I'll build a port to python 3.x when/if the project will get to an initial  release.
  • I use tabs (4 space tabs). Bear with me again. I tried to use spaces and I almost thrown my laptop out of the window
  • setup.py is not working. I'll fix this as soon as I'll do a release.
  • Yes, the sql part use the "hungarian notation" and the keywords are uppercase with strange indentation on the statements .  
  • The DDL are not yet replicated. I'm thinking to a clever approach to the problem.

That's it. If you want to test it please do and try to break the tool :)

The tool is on github here: https://github.com/the4thdoctor/pg_chameleon/

Friday, 2 September 2016

News from the outer ring

After the summer break the Brighton PostgreSQL meetup restarts with the monthly technical talks.

This time is my round again. I'll speak on how to scale the backup and recovery on large postgres installations.

Actually this is the talk I've submitted to the european pgconf.
I made the talk in a storytelling form in order to avoid to bore the audience to the death. The talk should be  quite entertaining with explanation of  the issues solved  by the DBA over the years.

As google is removing the hangouts on air I'm using youtube live and OBS to stream this event. It's the first time I try and I cannot guarantee it will work.

I'll record the presentation just in case the stream is broken.

Event details

PostgreSQL - backup and recovery with large databases


Friday 9th September 19.00 London Time Zone

Location: Brandwatch - 1st Floor Sovereign House, Church St, Brighton, East Sussex BN1 1UJ, Brighton

Dealing with large databases is always a challenge.
The backups and the HA procedures evolve meanwhile the database installation grow up over the time.
The talk will cover the problems solved by the DBA in four years of working with large databases, which size increased from 1.7 TB single cluster, up to 40 TB in a multi shard environment.
The talk will cover either the disaster recovery with pg_dump and the high availability with the log shipping/streaming replication.
The presentation is based on a real story. The names are changed in order to protect the innocents.

RSVP here

Live stream (hopefully) here