ltree
mdx: format: md
This module implements a data type ltree for representing labels of data stored in a hierarchical tree-like structure. Extensive facilities for searching through label trees are also provided.
This module is considered "trusted", meaning it can be installed by non-superusers who have CREATE privilege on the current database.
1. Definitions
A label is a sequence of alphanumeric characters and underscores (for example, in the C locale, the characters A-Za-z0-9_ are allowed).
Label length must be less than 256 characters.
Examples: 42, Personal_Services
A label path is a sequence of zero or more labels separated by dots, for example L1.L2.L3, representing a path from the root of a hierarchical tree to a particular node. The length of a label path must not exceed 65535 labels.
Examples: Top.Countries.Europe.Russia
The ltree module provides several data types:
• ltree stores a label path.
• lquery represents a regular-expression-like pattern for matching ltree values. A simple word matches that label in a path. A star (*) matches zero or more labels. These can be joined with dots to form a pattern that must match the entire label path. For example:
foo matches exactly the label path foo
.foo. matches any label path containing the label foo
*.foo matches any label path whose last label is foo
Both stars and simple words can be quantified to restrict how many labels they can match:
*{n} matches exactly n labels
*{n,} matches at least n labels
*{n,m} matches at least n but not more than m labels
*{,m} matches at most m labels — same as *{0,m}
foo{n,m} matches at least n but not more than m consecutive foo
foo{,} matches any number of consecutive foo, including zero
In the absence of any explicit quantifier, the default for a star is to match any number of labels (i.e., {,}), while the default for a non-star item is to match exactly once (i.e., {1}).
There are several modifiers that can be placed at the end of a non-star lquery item to make it match more than just an exact match:
@ performs case-insensitive matching, e.g., a@ matches A
- matches any label with this prefix, e.g., foo* matches foobar
% matches words separated by underscores at the beginning
The behavior of % is somewhat complex. It tries to match words rather than the entire label. For example, foo_bar% matches foo_bar_baz but not foo_barbaz. If combined with , prefix matching can be applied to each word individually, e.g., foo_bar% matches foo1_bar2_baz but not foo1_br2_baz.
In addition, you can write multiple possibly modified non-star items separated by | (OR) to match any of those items, and you can put ! (NOT) at the beginning of a non-star group to match any label not matching those alternatives. Quantifiers, if any, go at the end of the group; they mean some number of matches for the group as a whole (that is, some number of labels matching or not matching any alternative).
Here is an example of an lquery:
Top.{0,2}.sport@.!football|tennis{1,}.Russ*|Spain
a. b. c. d. e.
This query will match any label path that:
a. begins with the label Top
b. then has 0 to 2 labels
c. followed by a label beginning with the case-insensitive prefix sport
d. then has one or more labels, none matching football or tennis
e. and ends with a label beginning with Russ, or a label exactly matching Spain
• ltxtquery represents a full-text-search-like pattern for matching ltree values. An ltxtquery value contains words, possibly with the modifiers @, *, % at the end, which have the same meanings as in lquery. Words can be combined using & (AND), | (OR), ! (NOT), and parentheses. The key difference between lquery and ltxtquery is that the former matches words without regard to their position in the label path.
Here is an example of an ltxtquery:
Europe & Russia*@ & !Transportation
This will match paths containing the label Europe and any label beginning with Russia (case-insensitive), but not paths containing the label Transportation. The position of these words in the path is not important. Also, when using %, the word can match any underscore-separated word within a label, regardless of its position.
Note: ltxtquery allows whitespace between symbols, but ltree and lquery do not.
2. Operators and Functions
Type ltree has the usual comparison operators =, <>, <, >, <=, >=. Comparisons are sorted in tree traversal order, with the children of a node sorted by label text. In addition, there are the special operators shown in Table C.13.
Table C.13. ltree Operators
| Operator/Description |
|---|
| ltree @> ltree → boolean Left argument is an ancestor of the right argument (or equal) |
| ltree <@ ltree → boolean Left argument is a descendant of the right argument (or equal) |
| ltree ~ lquery → boolean lquery ~ ltree → boolean ltree matches lquery |
| ltree ? lquery[] → boolean lquery[] ? ltree → boolean ltree matches any lquery in the array |
| ltree @ ltxtquery → boolean ltxtquery @ ltree → boolean ltree matches ltxtquery |
| ltree [] @> ltree → boolean Array contains an ancestor of ltree |
| ltree [] <@ ltree → boolean Array contains a descendant of ltree |
| ltree[] @> ltree → boolean ltree <@ ltree[] → boolean Array contains an ancestor of ltree |
| ltree[] <@ ltree → boolean ltree @> ltree[] → boolean Array contains a descendant of ltree |
| ltree[] ~ lquery → boolean lquery ~ ltree[] → boolean Array contains any path matching lquery |
| ltree[] ? lquery[] → boolean lquery[] ? ltree[] → boolean Array contains any path matching any lquery |
| ltree[] @ ltxtquery → boolean ltxtquery @ ltree[] → boolean Array contains any path matching ltxtquery |
| ltree[] ?@> ltree → ltree Returns the first array entry that is an ancestor of ltree, or NULL if none |
| ltree[] ?<@ ltree → ltree Returns the first array entry that is a descendant of ltree, or NULL if none |
| ltree[] ?~ lquery → ltree Returns the first array entry matching lquery, or NULL if none |
| ltree[] ?@ ltxtquery → ltree Returns the first array entry matching ltxtquery, or NULL if none |
The operators <@, @>, @, and ~ have analogous operators ^<@, ^@>, ^@, ^~ that do not use indexes. They are useful only for testing purposes.
The available functions are listed in Table C.14.
Table C.14. ltree Functions
| Function/Description/Example |
|---|
| subltree ( ltree, start integer, end integer ) → ltree Returns the subpath of ltree from position start to position end-1 (counting from 0). subltree('Top.Child1.Child2', 1, 2) → Child1 |
| subpath ( ltree, offset integer, len integer ) → ltree Returns the subpath of ltree starting at position offset with length len. If offset is negative, the subpath starts from that many labels from the end of the path. If len is negative, that many labels are left off the end of the path. subpath('Top.Child1.Child2', 0, 2) → Top.Child1 |
| subpath ( ltree, offset integer ) → ltree Returns the subpath of ltree starting at position offset, extending to the end of the path. If offset is negative, the subpath starts from that many labels from the end of the path. subpath('Top.Child1.Child2', 1) → Child1.Child2 |
| nlevel ( ltree ) → integer Returns the number of labels in the path. nlevel('Top.Child1.Child2') → 3 |
| index ( a ltree, b ltree ) → integer Returns the position of the first occurrence of b in a, or -1 if not found. index('0.1.2.3.5.4.5.6.8.5.6.8', '5.6') → 6 |
| index ( a ltree, b ltree, offset integer ) → integer Returns the position of the first occurrence of b in a, or -1 if not found. The search starts at position offset; a negative offset means starting -offset labels from the end of the path. index('0.1.2.3.5.4.5.6.8.5.6.8', '5.6', -4) → 9 |
| text2ltree ( text ) → ltree Converts text to ltree ltree2text ( ltree ) → text Converts ltree to text |
| lca ( ltree [, ltree [, ... ]] ) → ltree Computes the longest common ancestor of the paths (supports up to 8 arguments). lca('1.2.3', '1.2.3.4.5.6') → 1.2 |
| lca ( ltree[] ) → ltree Computes the longest common ancestor of the paths in the array. lca(array['1.2.3'::ltree,'1.2.3.4']) → 1.2 |
3. Indexes
ltree supports several index types that can speed up the above operators:
• B-tree indexes on ltree: <, <=, =, >=, >
• GiST indexes on ltree (gist_ltree_ops opclass): <, <=, =, >=, >, @>, <@, @, ~, ?
The gist_ltree_ops GiST opclass approximates a set of path labels as a bitmap signature. Its optional integer parameter siglen determines the signature length in bytes. The default signature length is 8 bytes. Valid values for signature length are between 1 and 2024 bytes. Longer signatures result in more precise searches (scanning a smaller portion of the index and fewer heap pages), but at the cost of a larger index.
Example of creating an index with the default signature length of 8 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (path);
Example of creating an index with a signature length of 100 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (path
gist_ltree_ops(siglen=100));
• GiST indexes on ltree[] (gist__ltree_ops opclass): ltree[] <@ ltree, ltree @> ltree[], @, ~, ?
The gist__ltree_ops GiST opclass works similarly to gist_ltree_ops and also uses signature length as a parameter. The default value of siglen in gist__ltree_ops is 28 bytes.
Example of creating an index with the default signature length of 28 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (array_path);
Example of creating an index with a signature length of 100 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (array_path
gist__ltree_ops(siglen=100));
Note: This index type is lossy.
4. Examples
This example uses the following data:
CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING GIST (path);
CREATE INDEX path_idx ON test USING BTREE (path);
Now we have a table test populated with data describing the following hierarchy:
Top
/ |
Science Hobbies Collections
/ |
Astronomy Amateurs_Astronomy Pictures
/ |
Astrophysics Cosmology Astronomy
/ |
Galaxies Stars Astronauts
We can query the descendants:
test=## SELECT path FROM test WHERE path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
Here are some path matching examples:
test=## SELECT path FROM test WHERE path ~ '*.Astronomy.*';
path
-----------------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Collections.Pictures.Astronomy
Top.Collections.Pictures.Astronomy.Stars
Top.Collections.Pictures.Astronomy.Galaxies
Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)
test=## SELECT path FROM test WHERE path @'Astro*&!pictures@ ';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
Here are some full-text search examples:
test=## SELECT path FROM test WHERE path @ 'Astro*% & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Hobbies.Amateurs_Astronomy
(4 rows)
test=## SELECT path FROM test WHERE path @ 'Astro* & !pictures@';
path
------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(3 rows)
Path construction using functions:
test=## SELECT subpath(path,0,2)||'Space'||subpath(path,2) FROM test test-## WHERE
test-## path <@ 'Top.Science.Astronomy';
?column?
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
We can simplify this by creating an SQL function that inserts a label at a specified position in the path:
test=## CREATE FUNCTION ins_label(ltree, int, text) RETURNS ltree
test-## AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
test-## LANGUAGE SQL IMMUTABLE;
CREATE FUNCTION
test=## SELECT ins_label(path,2,'Space') FROM test WHERE path <@ 'Top.Science.Astronomy';
ins_label
------------------------------------------
Top.Science.Space.Astronomy
Top.Science.Space.Astronomy.Astrophysics
Top.Science.Space.Astronomy.Cosmology
(3 rows)
5. Transforms
There are additional extensions that implement transforms for the ltree type for PL/Python.
These extensions are ltree_plpythonu, ltree_plpython2u, and ltree_plpython3u. If these transforms are installed and specified when creating functions, ltree values are mapped to Python lists (however, the reverse transform is not currently supported).
| Caution: It is strongly recommended that transform extensions be installed in the same schema as ltree. Otherwise, there are security concerns during installation if the transform extension's schema contains objects defined by malicious users. |
|---|