Sunday, 27 July 2014

Chapter 8 - part 3, under the bonnet. With a bonus

Snapshot exports

Since PostgreSQL 9.2 are supported the transaction's snapshot exports. A session with an open transaction, can export its consistent snapshot to any other session. The snapshot remains valid meanwhile the transaction is open. Using this functionality offers a way to run multiple backends on a consistent data set frozen in time. This feature resulted in the brilliant parallel export in the 9.3's pg_dump as described in 8.3.
In the following example, let's consider the table created in 5.5. We'll first start an explicit transaction and then we'll export the current snapshot.

 We are first starting a transaction with the REPEATABLE READ isolation level. The second statement exports the current snapshot using the function pg_export_snapshot(). Finally we are checking with a simple row count the table t_data have data inside.

We can now login with in a different session and delete all the rows from the t_data table.

With the table now empty let's import the snapshot exported by the first backend.

The function pg_export_snapshot saves the current snapshot returning the text string which identifies the snapshot. Passing the string to clients that want to import the snapshot gives to independent sessions a single consistent vision. The import is possible only until the end of the transaction that exported it. The export is useful only in the READ COMMITTED transactions, because the REPEATABLE READ and higher isolation levels use the same snapshot within their lifetime.

pg_dump under the bonnet

The pg_dump source code gives a very good picture of what exactly the backup software does. The process runs into with fixed transaction's isolation level accordingly with the server's version. The distinction is required because, becoming PostgreSQL more sophisticated at each major release, the isolation levels became more and more strict with their meanings.
More informations about the transactions are in 5.7.
From PostgreSQL 9.1 the transaction serializable became a real serialisation. The transaction's behaviour offered by the serializable in the version up to 9.0 were assigned to the REPETABLE READ transaction's isolation, its real kind. The SERIALIZABLE transaction's isolation level is still used with DEFERRABLE option when pg_dump is executed with the option -serializable-deferrable as seen in 8.1.3. The switch have effect only on the remote server with version 9.1 and later. The transaction is also set to READ ONLY, when supported by the server, in order to limit the XID generation.

Table 8.1: pg_dump's transaction isolation levels
Server version Command
>= 9.1 with -serializable-deferrable SERIALIZABLE, READ ONLY, DEFERRABLE

From the version 9.3 pg_dump supports the parallel dump using the feature seen in 5.7.1. As the snapshot export is supported from the version 9.2 this permit a parallel dump from the older major version if using the newer backup program. However, pg_dump accepts the option -no-synchronized-snapshots in order to dump in parallel jobs from the older versions. The data is not supposed to be consistent if there are read write transactions during this kind of export. To have a consistent export in this case all the transactions which modify the data must be stopped meanwhile the export is in progress.

When exporting in parallel the only permitted format is the directory. The pg_restore program since the version 9.3 supports also the directory format for the paralelle data restoration. The combination of the parallel jobs backup and the parallel jobs restore, can improve massively either the backup and the recovery speed in case of disaster.

No comments:

Post a Comment