MySQL to PostgreSQL Replica

pg_chameleon is a tool for replicating MySQL to PostgreSQL. It is written in Python and works with Python 2.7 and Python 3.3+.





The system relies on the mysql-replication library to pull the changes from MySQL and convert them into a jsonb object.
A PL/pgSQL function decodes the jsonb and replays the changes into the PostgreSQL database.
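
As a rough illustration of the "convert them into a jsonb object" step, the sketch below serialises one row change to JSON text that could be stored in a jsonb column. It is not pg_chameleon's actual code: the payload layout, the RowEncoder class and the row_change_to_json function are assumptions made for the example. It also shows why types such as datetime.timedelta need special handling, since the standard json module rejects them.

```python
import datetime
import decimal
import json


class RowEncoder(json.JSONEncoder):
    """Serialise value types that the json module rejects by default."""

    def default(self, obj):
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        if isinstance(obj, datetime.timedelta):
            # A MySQL TIME value may arrive as a timedelta; str() gives "H:MM:SS".
            return str(obj)
        if isinstance(obj, decimal.Decimal):
            # Keep the exact decimal digits instead of converting to float.
            return str(obj)
        return json.JSONEncoder.default(self, obj)


def row_change_to_json(schema, table, action, values):
    """Build the JSON text for one row change (hypothetical layout)."""
    payload = {
        "schema": schema,
        "table": table,
        "action": action,
        "values": values,
    }
    return json.dumps(payload, cls=RowEncoder, sort_keys=True)


if __name__ == "__main__":
    print(row_change_to_json(
        "sakila", "film", "insert",
        {"film_id": 1, "length": datetime.timedelta(hours=1, minutes=2, seconds=3)},
    ))
```

On the PostgreSQL side the resulting text can be cast to jsonb and unpacked by a function; that part is left out here.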

The GitHub repository is available at https://github.com/the4thdoctor/pg_chameleon/

The system is currently in alpha4 and is the first release available on PyPI for testing.

The documentation is available at http://pg-chameleon.readthedocs.io/en/v1.0-beta.1/

The replica initialisation pulls the data from MySQL while the database is locked in read-only mode with FLUSH TABLES WITH READ LOCK.
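
The locking pattern can be sketched as follows. This is a minimal sketch under assumptions, not the tool's own initialisation code: copy_under_read_lock and copy_tables are hypothetical names, the cursor is any DB-API cursor connected to MySQL, and reading the binary log position with SHOW MASTER STATUS while the lock is held is an assumption about how a consistent starting point could be recorded.

```python
def copy_under_read_lock(cursor, copy_tables):
    """Run copy_tables() while MySQL is locked read-only.

    Returns the row fetched from SHOW MASTER STATUS, taken while the
    lock was held, so the caller knows where the copy is consistent.
    """
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        # Assumption: record the binary log coordinates under the lock.
        cursor.execute("SHOW MASTER STATUS")
        position = cursor.fetchone()
        copy_tables()
        return position
    finally:
        # Always release the lock, even if the copy fails.
        cursor.execute("UNLOCK TABLES")
```

The lock is released when UNLOCK TABLES runs or when the session that took it ends, so the connection behind the cursor has to stay open for the whole copy.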

It's possible to have a MySQL -> MySQL -> PostgreSQL cascading replica if the MySQL slave is configured with log-slave-updates.

 Changelog from v1.0-alpha.4

  • changed the non-Python files in the package to work properly with system-wide installations
  • fixed an issue with ALTER TABLE ADD CONSTRAINT
  • added datetime.timedelta to the json encoding exceptions
  • added support for enum in ALTER TABLE MODIFY
  • now requires psycopg2 2.7, which installs without the PostgreSQL headers
  • the write_batch function now uses copy_expert to speed up the batch load. The fallback to inserts is still present.
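
The COPY-with-fallback idea from the last item can be sketched like this. It is an illustration written for Python 3, not the real write_batch: the load_batch name, its arguments and the CSV format are assumptions. Only cursor.copy_expert(sql, file), cursor.execute and the connection's commit and rollback are actual psycopg2 calls. Table and column names are interpolated as-is, so they must come from a trusted source.

```python
import csv
import io


def load_batch(connection, table, columns, rows):
    """Load rows with COPY; fall back to one INSERT per row if COPY fails.

    Returns "copy" or "insert" to report which path was used.
    """
    # Render the batch as CSV in memory so COPY can read it as a file.
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    buffer.seek(0)

    column_list = ", ".join(columns)
    cursor = connection.cursor()
    try:
        cursor.copy_expert(
            "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(table, column_list),
            buffer,
        )
        connection.commit()
        return "copy"
    except Exception:
        # Discard the failed COPY, then retry the slow way.
        connection.rollback()
        placeholders = ", ".join(["%s"] * len(columns))
        insert = "INSERT INTO {} ({}) VALUES ({})".format(
            table, column_list, placeholders
        )
        for row in rows:
            cursor.execute(insert, row)
        connection.commit()
        return "insert"
```

COPY sends the whole batch in one statement, which is why it is faster than row-by-row inserts. The insert path is slower but survives a batch that COPY rejects as a whole.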

 Changelog from v1.0-alpha.3

  • Add batch retention to avoid bloating of t_replica_batch
  • Packaged for pip: the replica tool can now be installed in a virtualenv by typing pip install pg_chameleon

 Changelog from v1.0-alpha.2

  • Basic DDL Support (CREATE/DROP/ALTER TABLE, DROP PRIMARY KEY)
  • Replica from multiple MySQL schemas or servers
  • Python 3 support

Installation in virtualenv

The system is designed to work within a virtualenv. It is still possible to install the library and script system-wide.

However, when installed system-wide, the user directory .pg_chameleon is not created automatically. You can either create it by hand or use the global /usr/local/etc/pg_chameleon directory (not recommended for security reasons).

No daemon yet

The script should be executed in a screen session to keep it running. Currently the process is not respawned on failure and there is no failure detection.
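
Until a daemon exists, a small wrapper can restart the process when it exits with an error. The following is only a sketch of that workaround and not part of pg_chameleon: keep_running and its parameters are hypothetical, and the command to run is whatever you would otherwise start by hand in the screen session.

```python
import subprocess
import time


def keep_running(command, max_restarts=5, delay_seconds=10):
    """Run command and restart it after a non-zero exit.

    Stops on a clean exit or once max_restarts restarts have been
    used, and returns the number of restarts performed.
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(command)
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        # Wait a little so a persistent failure does not spin the CPU.
        time.sleep(delay_seconds)
```

The restart limit is deliberate: a replica that keeps failing usually needs a person to look at it, not an endless loop.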

DDL replica limitations

DDL and DML mixed in the same transaction are not decoded in the right order. This can break the replica because of a wrong jsonb descriptor if the DML changes data in the same table modified by the DDL. I know the issue and I'm working on a solution.

Test please!

Please submit the issues you find.
Bear in mind this is an alpha release. If you use the software in production, keep an eye on the process to ensure the data is correctly replicated.
