Sunday, 22 June 2014

Chapter 7 - Part 3, reindex


A B-tree index entry carries the indexed value with the pointer to the corresponding heap page where the full tuple is stored. The pages are organised in the form of a balanced tree referring each other using the special space seen in 6.1. Until the heap tuple does not change page, the index entry doesn't need update. The HOT strategy tries to achieve this goal keeping the heap tuples in the same page. When the heap tuple's page change then the index entry needs to be updates as well. Having the index pages fillfactor of 70 for the not leaf pages and a fillfactor of 90 for the leaf pages the index tuple's page change is somewhat limited.

The VACUUM is not so effective with the indices, even clearing down the dead tuple's space this is reusable only if compatible with the B-tree position. Only the completely empty pages can be recycled and this requires at least two VACUUM runs, because when an index page is empty then is marked as deleted, it's stamped with the next XID and becomes invisible. The empty page then is not immediately recycled because there might be running scans which need to access the page which is still visible in their snapshot. The next VACUUM run will ensure the page is no longer required by the running transactions and the it marks as reusable.

For those reasons the indices are affected by the data bloat more than the tables. Alongside with a bigger disk space allocation, the bloat results in a generally bad index's performances. The reindex is then the only solution to restore an index good shape.

Unlike the VACUUM, REINDEX have a substantial impact on the database activity. To ensure the data is consistently read the REINDEX sets a table lock which prevents the table's writes. The reads are also blocked for the SELECT queries using the rebuilding index.

A B-tree index build requires a data sort to build the balanced tree. PostgreSQL comes with a handy GUC parameter to track the sort, the trace_sort. The message verbosity level by the trace is DEBUG, then also the client_min_messages needs adjustment for getting the informations from the trace. Let's then reindex the table's primary key seen in 7.1.

The reindex performs a data sort which does not fit in the maintenance_work_mem. PostgreSQL then starts a slower disk sort to build up the index. The first LOG entry with begin index sort: shows the available maintenance_work_mem for the index sort. If after the table scan the available memory is exhausted then an external sort on disk will happen. Otherwise a faster sort in memory will build the index. Increasing then the maintenance_work_mem can improve the reindex. Unfortunately the determining the value when the sort in memory happens is not simple and can just be guessed from the index size. The previous reindex with 1 GB maintenance_work_mem runs 40% faster.

The reindex create a completely new filenode for the index and when the build is complete, the pg_class entry is then updated with the new relfilenode's value and old filenode is deleted. The entire sequence can be emulated creating a new index with a different name with the CREATE INDEX statement. After the index is ready, dropping the old one, renaming the new index to the old name, will result in a brand new index without blocking the reads.

Since the version 8.2 PostgreSQL supports the CREATE INDEX CONCURRENTLY which doesn't block reads nor writes. Using this method, the index creation starts adding a new invalid index in the system catalogue starting a first table scan to builds the entries without caring for the changing data. When the first build is complete a second table scan fixes the not consistent tuple's references and finally the index's status is set to valid becoming available. This approach, combined with the swap strategy can limit greatly the impact of the index maintenance.

The concurrent index build have indeed some caveats and limitations.
  • Any problem with the table scan will fail the command and leave behind an invalid index which is ignored for the reads but adds overhead for the inserts and updates.
  • When building an unique index concurrently this start enforcing the uniqueness when the second table scan starts. Some transactions could then start reporting the uniqueness violation before the index becomes available and if the build fails during the second scan then the invalid index continues to enforce the uniqueness.
  • Regular index builds can run in parallel on the same table. Concurrent index builds cannot.
  • Concurrent index builds cannot run within a transaction block.
With the primary keys and unique constraints is also possible to use the index swap strategy with an extra little trick. PostgreSQL since the version 9.1 supports the ALTER TABLE table_name ADD table_constraint using_index statement. Combining this statement in a single command with a DROP CONSTRAINT it's possible to swap the constraint's index without losing the uniqueness enforcement.

The example uses a regular index build and then blocks the writes. It's also possible to build the new index concurrently.
As this method works well for the unreferenced primary or unique keys, any foreign key referencing the unique field will cause the drop constraint's failure.

In this case the safest way to proceed is the conventional REINDEX.

No comments:

Post a Comment