GIN Index Pending Lists (Fast Updates)
A GIN - generalised inverted - index is a useful alternative index access method in PostgreSQL. It
stores a list of row IDs for each key, rather than (key, rowid) pairs like the btree (default)
access method. This makes it useful for indexing composite values (e.g. arrays, hstore), where the
same value is likely to appear in many rows, and a single row many have many values in the indexed
item. As well as this, the btree_gin
extension implements operator classes with equivalent
behaviour to btree indexes for most default PostgreSQL data types. This can be useful when
indexing a column has that has few distinct values relative to the number of rows in the table, as
the index is much more compact: keys are stored once, rather than once per indexed tuple.
However, a problem with GIN is that updating the index can be slow: a single row may contain
multiple values in the indexed item, requiring multiple insertions into the index when a row is
inserted or updated. For this reason, by default GIN accumulates new tuples to insert into the
index in a pending list. Tuples are added to this list until the index is vacuumed or it reaches
gin_pending_list_limit
, when the tuples are added in bulk to the index itself. The best
situation is to have these updates happen in the background with an autovacuum or autoanalyze. If
they happen in the foreground due to getting too large, it will considerably slow down the command
that triggered the update.
The problem with delaying index updates is that all items in the pending list have to be scanned every time the index is searched.
Example
We'll use an index on a value with a relatively small number of distinct integer values to
illustrate the size advantage of a GIN index, using the btree_gin
extension.
SET ROLE postgres; CREATE EXTENSION IF NOT EXISTS btree_gin; RESET ROLE;
SET CREATE EXTENSION RESET
DROP TABLE IF EXISTS testtable;
DROP TABLE
CREATE TABLE testtable AS SELECT generate_series(1, 10) x, y FROM (SELECT generate_series(1, 10000)) y(y);
SELECT 100000
CREATE INDEX testtable_x_idx_btree ON testtable (x);
CREATE INDEX
CREATE INDEX testtable_x_idx_gin ON testtable USING gin (x);
CREATE INDEX
The GIN index is about ten times smaller than the btree index (the size of the GIN index was reduced with the posting list compression added in 9.4 - see backend/access/gin/ginpostinglist.c):
\di+ testtable_x_idx_*
List of relations | ||||||
Schema | Name | Type | Owner | Table | Size | Description |
public | testtable_x_idx_btree | index | pete | testtable | 2208 kB | |
public | testtable_x_idx_gin | index | pete | testtable | 256 kB |
Determining the current size of the pending list
We can use the pageinspect
extension to find out how many items are in the GIN pending
list. This requires superuser access:
SET ROLE postgres; CREATE EXTENSION IF NOT EXISTS pageinspect; RESET ROLE;
SET CREATE EXTENSION RESET
The pageinspect
extension has a gin_metapage
function in 9.5+:
SET ROLE postgres; SELECT pending_head, pending_tail, n_pending_pages, pg_size_pretty(n_pending_pages * 8192) AS size, n_pending_tuples FROM gin_metapage_info(get_raw_page('testtable_x_idx_gin', 0)); RESET ROLE;
SET | ||||
---|---|---|---|---|
pending_head | pending_tail | n_pending_pages | size | n_pending_tuples |
4294967295 | 4294967295 | 0 | 0 bytes | 0 |
RESET |
In earlier versions, we have to decode the metapage manually (little-endian only - works on x86):
SET ROLE postgres; WITH ints AS ( SELECT substring(v, 25, 4) AS low, substring(v, 29, 4) AS high FROM get_raw_page('testtable_x_idx_gin', 0) v), decoded AS ( SELECT get_byte(low, 0)::bigint + (get_byte(low, 1) << 8) + (get_byte(low, 2) << 16) + (get_byte(low, 3) << 24) low, get_byte(high, 0)::bigint + (get_byte(high, 1) << 8) + (get_byte(high, 2) << 16) + (get_byte(high, 3) << 24) high FROM ints ), decoded_pages AS ( SELECT low, high, CASE WHEN low = -1 THEN 0 ELSE high - low + 1 END AS pages FROM decoded ) SELECT low, high, pages AS pages, pg_size_pretty((high - low + 1) * 8192) AS size FROM decoded_pages; RESET ROLE;
SET | |||
---|---|---|---|
low | high | pages | size |
-1 | -1 | 0 | 8192 bytes |
RESET |
Turning Off Fast Updates
The FASTUPDATE
storage option can be used to turn off the fast update feature. Inserts are
slower, but lookups will generally be faster, as the pending list does not have to be
scanned. This can get very slow for large pending lists.
ALTER INDEX testtable_x_idx_gin SET (fastupdate = false);
ALTER INDEX
You can see the current values of any index options with the \d+
command in psql
:
\d+ testtable_x_idx_gin
Index "public.testtable_x_idx_gin" | |||
---|---|---|---|
Column | Type | Definition | Storage |
x | integer | x | plain |
gin, for table "public.testtable" | |||
Options: fastupdate=false |