MySQL to PostgreSQL Replica

Pg_chameleon is a replication tool from MySQL to PostgreSQL developed in Python 2.7 and Python 3.3+





The system uses the mysql-replication library to pull the changes from MySQL and covert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The github repository is available here https://github.com/the4thdoctor/pg_chameleon/ 

The system is now in RC1 and is available on Pypi for tests.

The documentation is available here 

The replica initialisation  pulls the data from MySQL locking the database in read only mode using FLUSH TABLE WITH READ LOCK; .

It's possible to pull the data from a MySQL  cascading replica if the MySQL slave is configured with log-slave-updates.

Installation in virtualenv

The system is designed to work within a virtualenv. Is still possible to install the library and script system wide though.

No daemon

The replica should be started using a cron job. As the replica detects if another process is running is safe to have a periodic start (e.g. every 30 minutes) to keep the script running in the case of crash.


DDL replica limitations

DDL and DML mixed in the same transaction are not decoded in the right order. This can result in a replica issues caused by a wrong jsonb structure, should the DML change the data on the same table modified by the DDL.

Changelog from 1.0 Beta 2

  • add support for primay key defined as column constraint
  • fix regression if null constraint is omitted in a primary key column
  • add foreign key generation to detach replica. keys are added invalid and a validation attempt is performed.
  • add support for truncate table
  • add parameter out_dir to set the output destination for the csv files during init_replica
  • add set tables_only to table_limit when streaming the mysql replica
  • force a close batch on rotate event if binlog changes without any row events collected
  • fixed replica regression with python 3.x and empty binary data
  • added event_update in hexlify strategy
  • add tabulate for nice display for sources/status
  • logs are rotated on a daily basis
  • removed parameter log_append
  • add parameter log_days_keep to specify how many days keep the logs
  • feature freeze

Changelog from 1.0 Beta 1

  • add detach replica with automatic sequence reset (no FK generation yet)
  • replica speed improvement with the exclusion of BEGIN,COMMIT when capturing the queries from MySQL
  • fix the capturing regexp for primary keys and foreign keys
  • fix version in documentation

 Changelog from v1.0-alpha.4

  • changed not python files in package  to work properly with system wide installations
  • fixed issue with ALTER TABLE ADD CONSTRAINT
  • add datetime.timedelta to json encoding exceptions
  • added support for enum in ALTER TABLE MODIFY
  • requires psycopg2 2.7 which installs without postgresql headers
  • the write_batch function is now using the copy_expert in order to speedup the batch load. The fallback to inserts is still present.

 Changelog from v1.0-alpha.3

  • Add batch retention to avoid bloating of t_replica_batch
  • Packaged for pip, now you can install the replica tool in a virtual env just typing pip install pg_chameleon

 Changelog from v1.0-alpha.2

  • Basic DDL Support (CREATE/DROP/ALTER TABLE, DROP PRIMARY KEY)
  • Replica from multiple MySQL schema or servers
  • Python 3 support

No comments:

Post a Comment