GIN (Generalized Inverted Index) supports data types that are divisible into smaller components, e.g., elements of an array, words of a text document, or properties of a JSON object. We call them compound data types. Unlike B-Trees, GIN does not generate a single index entry for the complete value but one index entry for each component.
Every index entry consists of the value of the individual component plus the tuple ID (TID). Please notice that TIDs don't contain the sequence number of the component within the complete value. They contain only the number of the physical page in the data file plus the number of the row within the page.
Useable operators in a
WHERE clause depend on the data type:
<@(is contained in),
&&(overlaps / has some common elements).
- Text queries (lexems):
->(JSON object field with the given key),
->>(JSON object field with the given key, as text).
-- create a table with a column that holds an array of integers DROP TABLE IF EXISTS t2; CREATE TABLE t2 (id INTEGER, arr INTEGER); INSERT INTO t2 VALUES (1, ARRAY [11, 12, 13]); INSERT INTO t2 VALUES (2, ARRAY [21, 22, 23]); -- insert a lot of other rows to enforce index usage INSERT INTO t2 (SELECT generate_series(3, 10000, 1), ARRAY[0, 1, 2, 3]); -- ------------------------------------------ -- create the GIN index -- ------------------------------------------ CREATE INDEX t2_gin_idx ON t2 USING GIN(arr); -- use the index ANALYSE t2; EXPLAIN SELECT * FROM t2 WHERE arr @> ARRAY;
A GIN index consists of different sub-structures:
- The Meta Page
- One Entry B-Tree
- Some Posting B-trees
- One Pending List
Among others, the Meta Page contains pointers to the Entry B-Tree and the Pending List.
The Entry B-Tree implements a tree where the keys consist of the original component's values, e.g., the value of a single array element. At the non-leaf levels, their pointers point to child pages at the next level. At the leaf level, pages consist of two types of entries: First, there are Posting Lists. They consist of the key, followed by a list of TIDs. Second, if such a list of TIDs exceeds the capacity of the physical page, the list gets rearranged to a Posting B-Tree, which is stored on one or more other pages. The original Posting List gets replaced by a pointer to the new Posting B-Tree.
A Posting B-Tree is part of one or more physical pages and contains a B-Tree over tuple IDs which all point to such rows in the data file where its key can be found as the value of one of its components.
The implementation of the two B-Tree types differs from PostgreSQL's standard B-Tree implementation.
The Pending List is a list of pages where keys (component's values) and their dedicated TIDs are stored sequentially. The Pending List exists for optimization purposes; see below.
GIN indices use two special optimizations. First, if the value of different components (possibly in different rows) is used often, the set of TIDs is rearranged to a B-Tree within the GIN. Consider the case of many text documents: Many words will likely be repeatedly used in the same and in different documents. In this case, the list of TIDs may grow to the thousands, and a tree is better suited to manage them than a list.
UPDATEs of compound data creates many index entries: one per component, e.g., one per word of a text. In the first step, such new index entries are collected in a separate Pending List outside of the index tree (in opposite to the original
CREATE INDEX command). During the next
VACUUM-run, the entries are moved from the pending list to the GIN tree structure using the same bulk insert technique used during initial index creation. This bulk technique speeds up the process, and - even more helpful - the work is delegated to a background process. Of course, there is a drawback: Every query must scan the pending list in addition to traversing the index tree.