Skip to main content
Version: 1.0.16

ANALYZE

ANALYZE — Collect statistics about a database

Synopsis

ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]

ANALYZE [ VERBOSE ] [ table_and_columns [, ...] ]

where option can be:

VERBOSE [ boolean ]

SKIP_LOCKED [ boolean ]

table_and_columns is:

table_name [ ( column_name [, ...] ) ]


Description

ANALYZE collects statistics about the contents of tables in a database and stores the results in the pg_statistic system catalog. The query planner then uses these statistics to help determine the most efficient execution plan for queries.

Without a table_and_columns list, ANALYZE processes every table and materialized view in the current database that the current user has permission to analyze. With a list, ANALYZE processes only those tables. A list of column names for a table can also be provided, in which case statistics are collected only for those columns.

When the option list is enclosed in parentheses, options can be written in any order. The syntax without parentheses is deprecated.

Parameters

VERBOSE

Enables display of progress messages.

SKIP_LOCKED

Specifies that ANALYZE should not wait for any conflicting locks to be released when beginning to process a relation: if the relation cannot be locked immediately without waiting, it is skipped. Note that even with this option, ANALYZE may still block when opening relation indexes or when fetching sample rows from partitions, table inheritance children, and certain types of foreign tables. Additionally, when ANALYZE normally processes all partitions of a specified partitioned table, if there is a conflicting lock on the partitioned table, this option will cause ANALYZE to skip the entire partitioned table.

boolean

Specifies whether the selected option is enabled or disabled. You can write TRUE, ON, or 1 to enable the option, or FALSE, OFF, or 0 to disable it. The boolean value can be omitted, in which case TRUE is assumed.

table_name

The name of a specific table to be analyzed (can be schema-qualified). If omitted, all regular tables, partitioned tables, and materialized views in the current database are analyzed (but not foreign tables). If the specified table is a partitioned table, both the inherited statistics for the entire partitioned table and the statistics for each individual partition will be updated.

column_name

The name of a specific column to be analyzed. The default is all columns.

Output: When VERBOSE is specified, ANALYZE emits progress messages to indicate which table is currently being processed. It also prints various statistics about those tables.

Notes

To analyze a table, you must typically be the table's owner or a superuser. However, the database owner can analyze all tables in the database, except for shared catalogs. (The restriction on shared catalogs means that a truly database-wide ANALYZE can only be performed by a superuser.) ANALYZE will skip any tables for which the calling user does not have analysis privileges. Foreign tables are only analyzed when explicitly selected. Not all foreign data wrappers support ANALYZE. If the table's wrapper does not support ANALYZE, the command prints a warning and does nothing.

In the default Halo configuration, the autovacuum daemon handles automatic analysis of tables when they are first loaded with data or modified by regular operations. When autovacuum is enabled, it is a good idea to run ANALYZE periodically, or to run ANALYZE after making major changes to table contents. Accurate statistics help the planner choose the most appropriate query plan, thereby improving query processing speed. A general strategy for read-mostly databases is to run VACUUM and ANALYZE once a day during the lowest usage period (this is insufficient if there are a large number of update operations).

ANALYZE requires only a read lock on the target table, so it can run in parallel with other operations on the table.

The statistics collected by ANALYZE typically include a list of the most common values in each column and a histogram showing the approximate data distribution in each column. If ANALYZE considers these items uninteresting (for example, in a unique key column with no common values) or if the column's data type does not support the appropriate operators, these items will be omitted.

For large tables, ANALYZE takes a random sample of the table contents rather than examining every row. This allows analysis of large tables to be completed in a short time. However, note that these statistics are only approximate values, and even if the actual table contents have not changed, the statistics may vary slightly each time ANALYZE is run. This may cause small changes in the planner cost estimates shown by EXPLAIN. In rare cases, this may non-deterministically cause the planner's query plan choices to change after ANALYZE is run. To avoid this, you can increase the amount of statistics collected by ANALYZE as described below.

The amount of analysis can be controlled by adjusting the default_statistics_target configuration variable, and on a per-column basis using ALTER TABLE ... ALTER COLUMN ... SET STATISTICS (see ALTER TABLE). The target value sets the maximum number of entries in the most common value list and the maximum number of bins in the histogram. The default target is 100, and it can be increased or decreased to balance planner estimate accuracy against the time and space consumed by ANALYZE and the space occupied by pg_statistic. In particular, setting the statistics target to zero disables statistics collection for that column. This can be helpful for columns that never appear in the WHERE, GROUP BY, or ORDER BY clauses of queries, since the planner cannot use statistics on those columns.

The largest statistics target among the columns being analyzed determines the number of table rows to be sampled for preparing the statistics. Increasing this target causes a proportional increase in the time and space required for ANALYZE.

One of the values estimated by ANALYZE is the number of distinct values appearing in each column. Since only a subset of rows is examined, this estimate can sometimes be very imprecise even with the largest statistics target. If this imprecision leads to poor query plans, you can manually determine a more accurate value and set it with ALTER TABLE ... ALTER COLUMN ... SET

(n_distinct = ...) (see ALTER TABLE).

If the table being analyzed has one or more children, ANALYZE will collect statistics twice: once for the parent table's rows only, and a second time for the rows of the parent table and all its child tables. The second set of statistics is needed when planning queries that need to traverse the entire inheritance tree. However, when determining whether to trigger an automatic analysis on a table, the autovacuum daemon only considers inserts and updates on the parent table itself. If the table is rarely inserted into or updated, the inherited statistics will only be updated when ANALYZE is run manually.

If any child tables are foreign tables whose foreign data wrapper does not support ANALYZE, those child tables are ignored when collecting inherited statistics.

If the table being analyzed is not completely empty, ANALYZE will record new statistics for the table. Any existing statistics will be retained.

See Also

VACUUM, VACUUMDB