Skip to main content
Version: 1.0.16

citext

The citext module provides a case-insensitive string type: citext. It internally calls lower when comparing values. Otherwise, its behavior is almost identical to text.

:::tip Tip: Consider using nondeterministic collations instead of this module. They can be used for case-insensitive comparisons, accent-insensitive comparisons, and other combinations, and they correctly handle more Unicode special cases. :::

This module is considered "trusted", meaning it can be installed by non-superusers who have CREATE privilege on the current database.

1. Rationale

The standard approach to case-insensitive matching in Halo has been to use the lower function when comparing values, for example:

SELECT * FROM tab WHERE lower(col) = LOWER(?);

This works reasonably well but has some drawbacks:

  • It makes your SQL statements verbose, and you must always remember to use lower on both the column and the query value.

  • It will not use an index unless you create a function index with lower.

  • If you declare a column as UNIQUE or PRIMARY KEY, the implicitly generated index is case-sensitive. Therefore, it is useless for case-insensitive searches and does not enforce case-insensitive uniqueness.

The citext data type allows you to eliminate lower calls in SQL queries and allows a primary key to be case-insensitive. Like text, citext is locale-dependent, meaning the matching of uppercase and lowercase characters depends on the rules of the database's LC_CTYPE setting. Additionally, this behavior is the same as using lower in queries. But since it is done transparently by the data type, you don't need to remember to do anything special in your queries.

2. How to Use It

Here is a simple usage example:

test=## CREATE TABLE users (

test(## nick CITEXT PRIMARY KEY,

test(## pass TEXT NOT NULL

test(## );

CREATE TABLE

test=## INSERT INTO users VALUES ( 'larry', sha256(random()::text::bytea) );

INSERT 0 1

test=## INSERT INTO users VALUES ('Tom', sha256(random()::text::bytea) );

INSERT 0 1

test=## INSERT INTO users VALUES ( 'Damian', sha256(random()::text::bytea) );

INSERT 0 1

test=## INSERT INTO users VALUES ( 'NEAL', sha256(random()::text::bytea) );

INSERT 0 1

test=## INSERT INTO users VALUES ( 'Bjørn', sha256(random()::text::bytea) );

INSERT 0 1

test=## SELECT * FROM users WHERE nick = 'Larry';

nick | pass

-------+------------------------------------------------------------------

Larry | xe7dee32074f92f34e4e838c1ba66a3e9e8cb8729dcebfcaa92edb08614666fab

(1 row)

Even though the nick column was set to larry and the query is Larry, the SELECT statement returns only one tuple.

3. String Comparison Behavior

citext performs comparisons by first converting each string to lowercase (calling lower) and then comparing the results normally. Therefore, if two strings produce the same result through lower, they are considered equal.

To simulate a case-insensitive collation as closely as possible, some string processing operators and functions have citext-specific versions. For example, when applied to citext, the regular expression operators ~ and * exhibit the same behavior: they both match in a case-insensitive manner. The same is true for ! and !~, as well as the LIKE operators ~~ and *, and ! and !~~. If you want to match in a case-sensitive manner, you can cast the operator's arguments to text.

Similarly, if the arguments of the following functions are citext, they perform matching in a case-insensitive manner:

  • regexp_match()

  • regexp_matches()

  • regexp_replace()

  • regexp_split_to_array()

  • regexp_split_to_table()

  • replace()

  • split_part()

  • strpos()

  • translate()

For the regexp functions, if you want to match in a case-sensitive manner, you can specify the "c" flag to force case-sensitive matching. Otherwise, if you want case-sensitive behavior, you must cast to text before using these functions.

4. Limitations

  • The case folding behavior of citext depends on your database's LC_CTYPE setting. Therefore, how it compares values is determined when the database is created. In terms defined by the Unicode standard, it is not truly case-insensitive. In practice, this means that as long as you are satisfied with your collation, you should be satisfied with citext comparisons. If your database contains data in different languages and the collation is for one language, users of another language may find that their query results are not as expected.

  • You can attach a COLLATE specification to a citext column or data value. Currently, citext operators respect a non-default COLLATE specification when comparing case-folded strings, but the initial folding to lowercase is done according to the database's LC_CTYPE setting (that is, even if COLLATE "default" is specified). This may be changed in a future release so that both steps follow the input COLLATE specification.

  • citext is less efficient than text because operator functions and B-tree comparison functions must create copies of the data and convert it to lowercase for comparison. Additionally, only text can support B-tree deduplication. However, citext is slightly more efficient than using lower for case-insensitive matching.

  • If you need to compare data case-sensitively in some contexts and case-insensitively in others, citext is not helpful. The standard answer is to use the text type and manually use the lower function when you need case-insensitive comparison. This works well if the need for case-insensitive comparison is infrequent. If you mostly need case-insensitive behavior, consider storing data as citext and explicitly casting the column to text when performing case-sensitive comparisons. In either case, you need two indexes to make both types of searches faster.

  • The schema containing the citext operators must be in the current search_path (typically public). If it is not in the search path, the standard case-sensitive text operators will be used instead.

  • The approach of comparing lowercase strings does not correctly handle some Unicode special cases, such as when a single uppercase letter equals two lowercase letters. Therefore, Unicode distinguishes between case mapping and case folding.