Heap Only Tuple and Index-Only Scans
Heap Only Tuple (HOT)
The HOT was implemented in version 8.3 to effectively use the pages of both index and table when the updated row is stored in the same table page that stores the old row. The HOT also reduces the necessity of VACUUM processing.
Update a Row Without HOT
In this case, PostgreSQL inserts not only the new table tuple but also the new index tuple in the index page
Update a row with HOT
For example, in this case, Tuple_1
and Tuple_2
are set to the HEAP_HOT_UPDATED
bit and the HEAP_ONLY_TUPLE
bit, respectively. These bits are used regardless of the pruning
and the defragmentation
processes.
Pruning of the line pointers
Defragmentation of the dead tuples
Note that the cost of defragmentation is less than the cost of normal VACUUM processing because defragmentation does not involve removing the index tuples.
Thus, using HOT reduces the consumption of both indexes and tables of pages; this also reduces the number of tuples that the VACUUM processing has to process. Therefore, HOT has a good influence on performance because it eventually reduces the number of insertions of the index tuples by updating and the necessity of VACUUM processing.
The Cases in which HOT is not available
When the updated tuple is stored in the other page, which does not store the old tuple, the index tuple that points to the tuple is also inserted in the index page.
Index-Only Scans
To reduce the I/O (input/output) cost, index-only scans (often called index-only access) directly use the index key without accessing the corresponding table pages when all of the target entries of the SELECT statement are included in the index key.
In the following, using a specific example, a description of how index-only scans in PostgreSQL perform is given.
We have a table ‘tbl’ of which the definition is shown below:
1 2 3 4 5 6 7 8 9 |
|
Let us explore how PostgreSQL reads tuples when the following SELECT command is executed.
1 |
|
This query gets data from two columns of the table: ‘id’ and ‘name’, and the index ‘tbl_idx’ is composed of these columns. Thus, when using index scan, it seems at first glance that accessing the table pages is not required because the index tuples contain the necessary data.
However, in fact, PostgreSQL has to check the visibility of the tuples in principle, and the index tuples do not have any information about transactions such as the t_xmin
and t_xmax
of the heap tuples.
PostgreSQL uses the visibility map
of the target table. If all tuples stored in a page are visible, PostgreSQL uses the key of the index tuple and does not access the table page that is pointed at from the index tuple to check its visibility; otherwise, PostgreSQL reads the table tuple that is pointed at from the index tuple and checks the visibility of the tuple, which is the ordinary process.
In above example, Tuple_18
need not be accessed because the 0th
page that stores Tuple_18
is visible, that is, all tuples including Tuple_18
in the 0th
page are visible. In contrast, Tuple_19
needs to be accessed to treat the concurrency control because the visibility of the 1st
page is not visible