Skip to main content
Version: 1.0.16

CLUSTER

CLUSTER — Cluster a table according to an index

Synopsis

CLUSTER [VERBOSE] table_name [ USING index_name ]

CLUSTER [VERBOSE]

Description

CLUSTER instructs the system to cluster the table specified by table_name based on the index specified by index_name. The index must already be defined on table_name.

When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation:

When the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows in index order (if you want to do this, you can periodically re-cluster by issuing this command. Also, setting the table's fillfactor storage parameter to less than 100% helps maintain cluster order during updates, because updated rows will be kept on the same page if there is enough space).

When a table is clustered, the system remembers which index it was clustered by. The form CLUSTER table_name re-clusters the table using the same index as before. You can also use CLUSTER or ALTER TABLE's SET WITHOUT CLUSTER form to set the index for future cluster operations, or clear any previous setting.

CLUSTER without any parameters re-clusters all previously clustered tables in the current database that are owned by the calling user (or all previously clustered tables if called by a superuser). This form of CLUSTER cannot be executed within a transaction block.

When a table is being clustered, an ACCESS EXCLUSIVE lock is required on it. This prevents any other database operations (including reads and writes) from operating on the table until CLUSTER finishes.

Parameters

table_name

The name of a table (may be schema-qualified).

index_name

The name of an index.

VERBOSE

Prints a progress report as each table is clustered.

Notes

When randomly accessing rows in a table, the actual order of data in the table is irrelevant. However, if you want to access some data more frequently and there is an index that groups them together, using CLUSTER can be beneficial.

CLUSTER is helpful when you request a range of indexed values from a table or a single value that matches multiple rows, because once the index identifies the table page containing the first matching row, all other matching rows are likely on the same table page, thus saving disk access and improving query speed.

CLUSTER can reorder a table using either an index scan on the specified index or a sequential scan followed by a sort (if the index is a B-tree). It will choose the faster method based on planner cost parameters and available statistics.

When using an index scan, a temporary copy of the table is created containing the table data in index order. Temporary copies of each index on the table are also created. Therefore, you need at least as much free space on disk as the total of the table size plus the index size.

When using a sequential scan and sort, a temporary sort file is also created, so the peak temporary space requirement is twice the table size plus the index size. This method is usually faster than the index scan method, but if disk space requirements are unacceptable, you can disable this choice by temporarily setting enable_sort to off.

It is recommended to set maintenance_work_mem to a reasonably large value before clustering (but not exceeding the RAM capacity you can dedicate to the CLUSTER operation).

Because the planner records statistics about table ordering, it is recommended to run ANALYZE on a newly clustered table. Otherwise, the planner may produce poor query plans.

Because CLUSTER remembers which indexes were clustered, you can first manually cluster the tables you want to cluster, and then set up a periodically running maintenance script that executes CLUSTER without any parameters, so those tables will be periodically re-clustered.

Examples

# Cluster the table employees based on index employees_ind:

CLUSTER employees USING employees_ind;

# Cluster the employees table using the same index as before:

CLUSTER employees;

# Cluster all previously clustered tables in the database:

CLUSTER;

See Also

CLUSTERDB