MaintenanceThe database maintenance is something crucial for the efficiency of the data access, the integrity and the reliability. Any database sooner or later will need a proper maintenance plan.
When a new tuple's version is generated by an update it can be put everywhere there's free space. Frequent updates can result in tuples moving across the data pages many and many times leaving a trail of dead tuples behind them. Because the dead tuples are physically stored but no longer visible this creates an extra overhead causing the table to bloat. Indices makes things more complicated because when a tuple changes page the index entry is updated to point the new page and because of the index's ordered structure, the bloating is more probable than the table.
vacuumVACUUM is a PostgreSQL specific command which reclaims back the dead tuple's space. When called without specifying a target table, the command processes all the tables in the database. Running regulary VACUUM have some beneficial effects.
- It reclaims back the dead tuple's disk space.
- It updates the visibility map making the index scans run faster.
- It freezes the tuples with old XID protecting from the XID wraparound data loss
A standard VACUUM's run, frees the space used by the dead rows inside the data files but doesn't returns the space to the operating system. VACUUM doesn't affects the common database activity but prevents any schema change on the processed table. Because the pages are rewritten, a VACUUM run increases substantially the I/O activity.
The presence of one or more empty pages in the table's end can be removed by VACUUM if an exclusive lock on the relation can be obtained immediately. When this happens the table is scanned backward to find all the empty pages and then it's truncated to the first not empty page. The index pages are scanned as well and the dead tuples are also cleared. The VACUUM's truncate scan works only on the heap data files. VACUUM'S performances are influenced by the maintenance_work_mem only if the table have indices, otherwise the VACUUM will run the cleanup sequentially without storing the tuple's references for the index cleanup.
To show the effect of the maintenance_work_mem let's build build a simple table with 10 million rows.
To have a statical environment we'll disable the table's autovacuum. More infos on autovacuum here 7.5. We'll also increase the session's verbosity to look out what's happening during the VACUUM's run.
We are now executing a complete table rewrite running an UPDATE without the WHERE condition. This will create 10 millions of dead rows.
Before running the VACUUM we'll change the maintenance_work_mem to a small value enabling the the timing to check the query duration.
During the VACUUM the the maintenance_work_mem is used to store an array of TCID referencing the dead tuples for the index cleanup. If the maintenance_work_mem is small and the dead rows are many, the memory fills up often. When this happens the table scan pauses and the index is scanned searching for the tuples stored into the array. When the index scan is complete the array of TCID is emptied and the table's scan resumes. Increasing the maintenance_work_mem to 2 GB7.1 the index scan is executed in one single run resulting in a VACUUM 32 seconds faster.
A table without indices does use the maintenance_work_mem. For example if we run the VACUUM after dropping the table's primary key the execution is faster even with the low maintenance_work_mem setting.
The table seen in the example begins with a size of 806 MB . After the update the table double its size which remains the same during the VACUUM runs the updates. This happens because after the first insert the table had all the rows packed together; the update added in the table's bottom the new row versions leaving the previous 10 millions row on the table's top as dead tuples. The VACUUM's run cleared the space on the table's top but weren't able to truncate because all the rows packed in the table's bottom. Running a new UPDATE followed by VACUUM would free the space in the table's bottom and a truncate scan would succeed but only if there's no tuple in the table's end free space. To check if the vacuum is running effectively the tables should show an initial growt followed by a substantial size stability in time. This happens only if the new rows are versioned at the same rate of the old rows clear down.
The XID wraparound failure protection is performed automatically by VACUUM which when it finds a live tuple with a t_xmin's age bigger than the GUC parameter vacuum_freeze_min_age, then it replaces the tuple's creation XID with the FrozenXID preserving the tuple's visibility forever. Because VACUUM by default skips the pages without dead tuples it will miss some aging tuples. That's the reason why it's present a second GUC parameter, vacuum_freeze_table_age, which triggers a VACUUM's full table scan when the table's relfrozenxid age exceeds the value.
VACUUM accepts the FREEZE clause which forces a complete tuple freeze regardless to the age. That's equivalent to run the VACUUM setting the vacuum_freeze_min_age to zero.
There are few GUC parameters controlling VACUUM