REINDEX
REINDEX — Rebuild indexes
Synopsis
REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM }
[ CONCURRENTLY ] name
where option can be one of:
VERBOSE
Description
REINDEX rebuilds an index using the data stored in the index's table, replacing the old copy of the index. There are several scenarios in which REINDEX is needed:
-
An index has become corrupted and no longer contains valid data. While this should not happen in theory, in practice indexes can become corrupted due to software bugs or hardware failures. REINDEX provides a recovery method.
-
An index has become "bloated", containing many empty or nearly-empty pages. B-tree indexes in Halo can experience this under certain unusual access patterns. REINDEX provides a way to reduce the index's space consumption by producing a new version of the index without dead pages.
-
The storage parameters of an index have been changed (such as fillfactor), and you want to ensure that the change takes full effect.
-
If an index creation with the CONCURRENTLY option failed, the index remains as "invalid". Such indexes are useless, but can be conveniently rebuilt with REINDEX. Note that only REINDEX INDEX can perform concurrent creation on an invalid index.
Parameters
INDEX
Rebuild the specified index.
TABLE
Rebuild all indexes of the specified table. If the table has a secondary "TOAST" table, it will also be reindexed.
SCHEMA
Rebuild all indexes in the specified schema. If a table in this schema has a secondary "TOAST" table, it will also be reindexed. Indexes on shared system catalogs are also processed. This form of REINDEX cannot be executed within a transaction block.
DATABASE
Rebuild all indexes in the current database. Indexes on shared system catalogs are also processed. This form of REINDEX cannot be executed within a transaction block.
SYSTEM
Rebuild all indexes on system catalogs in the current database. Indexes on shared system catalogs are included. Indexes on user tables are not processed. This form of REINDEX cannot be executed within a transaction block.
name
The name of the specific index, table, or database to be reindexed. Index and table names can be schema-qualified. Currently, REINDEX DATABASE and REINDEX SYSTEM can only reindex the current database, so their parameter must match the current database name.
CONCURRENTLY
When this option is used, Halo will rebuild the index without taking any locks on the table that prevent concurrent inserts, updates, or deletes; a standard index rebuild will lock writes (but not reads) on the table until it completes. There are several things to be aware of when using this option; see Rebuilding Indexes Concurrently below.
For temporary tables, REINDEX is always non-concurrent, since no other session can access them, and non-concurrent reindexing is less expensive.
VERBOSE
Print a progress report as each index is rebuilt.
Notes
If you suspect an index on a user table is corrupted, you can simply rebuild that index or all indexes on the table using REINDEX INDEX or REINDEX TABLE.
Recovering from a corrupted index on a system table is more difficult. In this case, it is important that no suspicious index itself is used (in fact, in this scenario, you might find that the server process crashes immediately at startup due to its dependence on the corrupted index). To recover safely, the server must be started with the -P option, which prevents it from using indexes for system catalog lookups.
One way to do this is to shut down the server and start a single-user Halo server with the -P option included on its command line. Then, you can issue REINDEX DATABASE, REINDEX SYSTEM, REINDEX TABLE, or REINDEX INDEX, depending on how much you want to rebuild. When in doubt, use REINDEX SYSTEM to select all system indexes in the database for rebuilding. Then exit the single-user server session and restart the regular server.
Alternatively, you can start a regular server session with the -P option in its command line options. The method for doing this depends on the client, but in all libpq-based clients you can set the PGOPTIONS environment variable to -P before starting the client. Note that while this approach does not require locking out other clients, it is more prudent to prevent other users from connecting to the damaged database until the repair is complete.
REINDEX is similar to dropping and recreating the index, in that the index contents are built from scratch. However, the locking considerations are quite different. REINDEX will lock out writes but not reads on the index's parent table. It will also acquire an exclusive lock on the index being processed, which will block attempts to use that index. In contrast, DROP INDEX temporarily acquires an exclusive lock on the parent table, blocking both writes and reads. A subsequent CREATE INDEX will lock out writes but not reads; since the index does not exist, there will be no attempts to read from it, meaning there will be no blocking, but reads may be forced into expensive sequential scans.
Reindexing a single index or table requires the user to be the owner of that index or table. Reindexing a schema or database requires the user to be the owner of that schema or database. Note in particular that non-superusers may therefore be unable to rebuild indexes on tables owned by other users. However, as a special case, when a non-superuser issues REINDEX DATABASE, REINDEX SCHEMA, or REINDEX SYSTEM, indexes on shared catalogs will be skipped unless the user owns the catalog (which is typically not the case). Of course, superusers can always rebuild all indexes.
Rebuilding indexes on partitioned tables or partition indexes is not supported. However, individual partitions can be reindexed separately.
Rebuilding Indexes Concurrently
Rebuilding indexes can affect normal database operations. Normally, Halo locks the table being rebuilt to prevent writes and performs the entire index build in a single table scan. Other transactions can still read the table, but if they attempt to insert, update, or delete rows in the table, they will be blocked until the index rebuild completes. If the system is a live production database, this can have a severe impact. Very large tables may take many hours to index, and even for smaller tables, an index rebuild that locks writers for periods unacceptable in a production system.
Halo supports rebuilding indexes with minimal write locking. This method is invoked by specifying the CONCURRENTLY option of REINDEX. When this option is used, Halo must perform two table scans for each index that needs to be rebuilt, and it must wait for all existing transactions that might use the index to terminate. This method requires more work than a standard index rebuild and takes considerably longer to complete because it needs to wait for outstanding transactions that might modify the index. However, since it allows normal operations to continue while the index is being rebuilt, this method is useful for rebuilding indexes in a production environment. Of course, the additional CPU, memory, and I/O load required for the index rebuild may slow down other operations.
The following steps occur during a concurrent index rebuild. Each step runs in a separate transaction. If multiple indexes are being rebuilt, each step cycles through all indexes before moving on to the next step.
-
The new temporary index definition is added to the catalog pg_index. This definition will be used to replace the old index. A SHARE UPDATE EXCLUSIVE session-level lock is taken on the index being rebuilt and its associated table to prevent any schema modifications during processing.
-
The first pass of index building is completed for each new index. After the index is built, its flag pg_index.indisready is switched to "true" to make it ready for inserts, making it visible to other sessions after the transaction that performed the build completes. This step is done in a separate transaction for each index.
-
A second pass is then executed to add tuples that were added while the first pass was running. This step is also done in a separate transaction for each index.
-
All constraints referencing the index are changed to reference the new index definition, and the index name is also changed. At this point, pg_index.indisvalid is switched to "true" for the new index, and to "false" for the old index, and a cache invalidation causes all sessions referencing the old index to be invalidated.
-
The old index has pg_index.indisready switched to "false" to prevent any new tuple inserts, after waiting for queries that might reference the old index to complete.
-
The old index is dropped. The SHARE UPDATE EXCLUSIVE session locks on the index and table are released.
If a problem occurs during index rebuilding, such as a uniqueness violation in a unique index, the REINDEX command will fail but will leave behind a "invalid" new index in addition to the already existing one. This index will be ignored for query purposes since it may be incomplete; however, it will still incur update overhead. The psql \d command will report such an index as INVALID:
halo0root=# \d tab
Table "public.tab"
Column | Type | Modifiers
--------+---------+-----------
col | integer |
Indexes:
"idx" btree (col)
"idx_ccnew" btree (col) INVALID
If the INVALID index has the suffix ccnew, it corresponds to the temporary index created during the concurrent operation; the recommended recovery method is to drop it using DROP INDEX, then try REINDEX CONCURRENTLY again. If the invalid index instead has the suffix ccold, it corresponds to the original index that could not be dropped; the recommended recovery method is to simply drop the index, as the correct rebuild has already succeeded.
Regular index creation allows other regular index creations on the same table to occur simultaneously, but only one concurrent index creation can occur at a time on a table. In both cases, simultaneous modifications of other schema types on the table are not allowed. Another difference is that regular REINDEX TABLE or REINDEX INDEX commands can be executed within a transaction block, but REINDEX CONCURRENTLY cannot.
REINDEX SYSTEM does not support CONCURRENTLY because system catalogs cannot be reindexed concurrently.
Additionally, indexes on exclusion constraints cannot be reindexed concurrently. If such an index is directly named in this command, an error will be raised. If a table or database with exclusion constraint indexes is reindexed concurrently, those indexes will be skipped. (Such indexes can be reindexed without the CONCURRENTLY option.)
Examples
-- Rebuild a single index:
REINDEX INDEX my_index;
-- Rebuild all indexes on table my_table:
REINDEX TABLE my_table;
-- Rebuild all indexes in a specific database, without assuming system indexes are already usable:
$ export PGOPTIONS="-P"
$ psql broken_db
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-- Rebuild a table's indexes without blocking read or write operations on the related relations during the rebuild:
REINDEX TABLE CONCURRENTLY my_broken_table;