Sunday, 15 June 2014

Chapter 7 - Part 1 - VACUUM


The database maintenance is something crucial for the efficiency of the data access, the integrity and the reliability. Any database sooner or later will need a proper maintenance plan.
When a new tuple's version is generated by an update it can be put everywhere there's free space. Frequent updates can result in tuples moving across the data pages many and many times leaving a trail of dead tuples behind them. Because the dead tuples are physically stored but no longer visible this creates an extra overhead causing the table to bloat. Indices makes things more complicated because when a tuple changes page the index entry is updated to point the new page and because of the index's ordered structure, the bloating is more probable than the table.


VACUUM is a PostgreSQL specific command which reclaims back the dead tuple's space. When called without specifying a target table, the command processes all the tables in the database. Running regulary VACUUM have some beneficial effects.
  • It reclaims back the dead tuple's disk space.
  • It updates the visibility map making the index scans run faster.
  • It freezes the tuples with old XID protecting from the XID wraparound data loss
The optional ANALYZE clause also gather the statistics on processed table, more details here 7.2.

A standard VACUUM's run, frees the space used by the dead rows inside the data files but doesn't returns the space to the operating system. VACUUM doesn't affects the common database activity but prevents any schema change on the processed table. Because the pages are rewritten, a VACUUM run increases substantially the I/O activity.

The presence of one or more empty pages in the table's end can be removed by VACUUM if an exclusive lock on the relation can be obtained immediately. When this happens the table is scanned backward to find all the empty pages and then it's truncated to the first not empty page. The index pages are scanned as well and the dead tuples are also cleared. The VACUUM's truncate scan works only on the heap data files. VACUUM'S performances are influenced by the maintenance_work_mem only if the table have indices, otherwise the VACUUM will run the cleanup sequentially without storing the tuple's references for the index cleanup.

To show the effect of the maintenance_work_mem let's build build a simple table with 10 million rows.

To have a statical environment we'll disable the table's autovacuum. More infos on autovacuum here 7.5. We'll also increase the session's verbosity to look out what's happening during the VACUUM's run.

We are now executing a complete table rewrite running an UPDATE without the WHERE condition. This will create 10 millions of dead rows.

Before running the VACUUM we'll change the maintenance_work_mem to a small value enabling the the timing to check the query duration.

During the VACUUM the the maintenance_work_mem is used to store an array of TCID referencing the dead tuples for the index cleanup. If the maintenance_work_mem is small and the dead rows are many, the memory fills up often. When this happens the table scan pauses and the index is scanned searching for the tuples stored into the array. When the index scan is complete the array of TCID is emptied and the table's scan resumes. Increasing the maintenance_work_mem to 2 GB7.1 the index scan is executed in one single run resulting in a VACUUM 32 seconds faster.

A table without indices does use the maintenance_work_mem. For example if we run the VACUUM after dropping the table's primary key the execution is faster even with the low maintenance_work_mem setting.

The table seen in the example begins with a size of 806 MB . After the update the table double its size which remains the same during the VACUUM runs the updates. This happens because after the first insert the table had all the rows packed together; the update added in the table's bottom the new row versions leaving the previous 10 millions row on the table's top as dead tuples. The VACUUM's run cleared the space on the table's top but weren't able to truncate because all the rows packed in the table's bottom. Running a new UPDATE followed by VACUUM would free the space in the table's bottom and a truncate scan would succeed but only if there's no tuple in the table's end free space. To check if the vacuum is running effectively the tables should show an initial growt followed by a substantial size stability in time. This happens only if the new rows are versioned at the same rate of the old rows clear down.

The XID wraparound failure protection is performed automatically by VACUUM which when it finds a live tuple with a t_xmin's age bigger than the GUC parameter vacuum_freeze_min_age, then it replaces the tuple's creation XID with the FrozenXID preserving the tuple's visibility forever. Because VACUUM by default skips the pages without dead tuples it will miss some aging tuples. That's the reason why it's present a second GUC parameter, vacuum_freeze_table_age, which triggers a VACUUM's full table scan when the table's relfrozenxid age exceeds the value.

VACUUM accepts the FREEZE clause which forces a complete tuple freeze regardless to the age. That's equivalent to run the VACUUM setting the vacuum_freeze_min_age to zero.

There are few GUC parameters controlling VACUUM


As seen before, VACUUM this parameter triggers a whole table scan if the table's pg_class.relfrozenxid field is older than the parameter's value. The default is 150 million transactions. The accepted values are from zero to one billion. Anyway VACUUM will silently set the effective value to the 95% of autovacuum_freeze_max_age, in order to give more chance to the manual VACUUM to run before an anti-wraparound autovacuum.


Sets the age in transactions when VACUUM replaces the tuple's transaction IDs with the FrozenXID. The default is 50 million transactions. Same as for vacuum_freeze_table_age tThe accepted values are from zero to one billion. With this parameter VACUUM will silently set the effective value to half the value of autovacuum_freeze_max_age in order to increase the time between the forced autovacuums.


As from 9.3 VACUUM take care also of the multiple transaction ID, used to store the row locks in the tuple's header. Because the multixact ID is implemented same as the XID the value suffers of the same wraparound failure. This parameter sets the age's limit for VACUUM to performs a whole table scan. The table's pg_class.relminmxid is used to check the transaction's age. . The default is 150 million multixacts. Also VACUUM limits the effective value to 95% of autovacuum_multixact_freeze_max_age, so that a periodical manual VACUUM has a chance to run before an anti-wraparound is launched for the table.


Sets the age in multixacts when VACUUM replaces the multixact IDs with a newer transaction ID or multixact ID while scanning a table. The default is 5 million multixacts. The accepted values are from zero to one billion but VACUUM will silently limit the effective value to half the value of autovacuum_multixact_freeze_max_age, in order to increase the time between forced autovacuums.


This parameter have effect only on the master in hot standby configurations. When set to a positive value on the master, can reduce the risk of query conflicts on the standby. Does not have effect on the standby though.


This parameter, if set to a not zero value enables the cost based vacuum delay and sets the time, in milliseconds, that the VACUUM process sleeps when the cost limit has been exceeded. The default value is zero, which disables the cost-based vacuum delay feature.


This parameter sets the value when the internal counter puts the VACUUM process to sleep. The default value is 200.


The counter's cost is determined using arbitrary values for each vacuum operation. This parameter sets the estimated cost for vacuuming one buffer found in the shared buffer cache. It represents the cost to lock the buffer pool, lookup the shared hash table and scan the content of the page. The default value is one.


This parameter sets the estimated cost for vacuuming a buffer not present in the shared buffer. This represents the effort to lock the buffer pool, lookup the shared hash table, read the desired block in from the disk and scan its content. The default value is 10.


This parameter sets the estimated cost charged when vacuum changes a previously clean block. It represents the extra I/O required to flush the dirty block out to disk again. The default value is 20.

1 comment: