EXPLAIN
EXPLAIN — Show the execution plan of a statement
Synopsis
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
where option can be:
ANALYZE [ boolean ]
VERBOSE [ boolean ]
COSTS [ boolean ]
SETTINGS [ boolean ]
BUFFERS [ boolean ]
WAL [ boolean ]
TIMING [ boolean ]
SUMMARY [ boolean ]
FORMAT { TEXT | XML | JSON | YAML }
Description
This command displays the execution plan generated by the Halo optimizer for the supplied statement. The execution plan shows how the tables referenced by the statement will be scanned — plain sequential scans, index scans, etc. — and what join algorithm will be used to join rows from each input table together when multiple tables are referenced.
The most important part of the display is the estimated execution cost of the statement, which is the planner's guess at how long the statement will take to run (measured in arbitrary cost units, but conventionally represents the number of disk page fetches). In fact, two numbers are displayed: the startup cost before the first row can be returned, and the total cost to return all rows. For most queries, the total cost is the most important, but in some situations (such as a subquery in EXISTS), the planner will choose a smaller startup cost over a smaller total cost (because the executor will stop after getting one row). Additionally, if you use a LIMIT clause to restrict the number of returned rows, the planner will interpolate between the endpoint costs to estimate which plan is truly the least expensive.
The ANALYZE option causes the statement to be actually executed, not just planned. The actual run-time statistics are then displayed, including the total time spent at each plan node (in milliseconds) and the number of rows it actually returned. This is useful for observing whether the planner's estimates are close to reality.
Only the ANALYZE and VERBOSE options can be specified, and they must be in the order shown above, without placing the option list inside parentheses.
Parameters
ANALYZE
Execute the command and display actual run time and other statistics. This parameter defaults to FALSE.
VERBOSE
Display additional information about the plan. In particular: the output column list for each node in the plan tree, schema-qualified table and function names, always annotate variables in expressions with their range table alias, and always print the name of each trigger for which statistics are displayed. This parameter defaults to FALSE.
COSTS
Include the estimated startup and total cost of each plan node, as well as the estimated number of rows and the width of each row. This parameter defaults to TRUE.
SETTINGS
Include information about configuration parameters. Specifically, include options that affect query planning whose values differ from the built-in defaults. This parameter defaults to FALSE.
BUFFERS
Include buffer usage information. Specifically: the number of shared blocks hit, read, dirtied, and written, the number of local blocks hit, read, dirtied, and written, the number of temp blocks read and written, and the time (in milliseconds) spent reading and writing data file blocks when track_io_timing is enabled. A hit means a read was avoided because the needed block was already found in cache. Shared blocks contain data from regular tables and indexes; local blocks contain data from temporary tables and indexes; temp blocks contain short-lived working data used in sorts, hashes, materialized plan nodes, and similar operations. The number of dirty blocks indicates the number of previously unmodified blocks changed by this query, while the number of written blocks indicates the number of dirty blocks evicted from cache by this backend during query processing. Block counts displayed for a higher-level node include those used by all its child nodes. In text format, only non-zero values are printed. It defaults to FALSE.
WAL
Include information on WAL record generation. Specifically, include the number of records, number of full page images (fpi) and amount of WAL bytes generated. In text format, only non-zero values are printed. This parameter may only be used when ANALYZE is also enabled. It defaults to FALSE.
TIMING
Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to FALSE when only actual row counts are needed, not actual times. Even when node-level timing is turned off with this option, the overall run time of the statement is always measured. This parameter can only be used when ANALYZE is also enabled. It defaults to TRUE.
SUMMARY
Include summary information after the query plan (e.g., total timing information). Summary information is included by default when ANALYZE is used, but is not included by default otherwise, but can be enabled using this option. The planning time in EXPLAIN EXECUTE includes the time required to retrieve the plan from cache and the time required to re-plan if necessary.
FORMAT
Specify the output format, which can be TEXT, XML, JSON, or YAML. Non-text output contains the same information as text output format, but is easier for programs to parse. This parameter defaults to TEXT.
boolean
Specifies whether the selected option should be turned on or off. You can write TRUE, ON, or 1 to enable the option, and FALSE, OFF, or 0 to disable it. The boolean value can also be omitted, in which case TRUE is assumed.
statement
Any SELECT, INSERT, UPDATE, DELETE, VALUES, EXECUTE, DECLARE, CREATE TABLE AS, or CREATE MATERIALIZED VIEW AS statement for which you want to view the execution plan.
Output
The result of this command is a textual description of the plan selected for the statement, optionally annotated with execution statistics.
Notes
To allow the Halo query optimizer to make reasonably informed decisions when optimizing queries, the pg_statistic data for all tables used in the query should be kept up-to-date. Normally this is done automatically by the autovacuum daemon. However, if a table has recently undergone major changes in content, you may need to perform a manual ANALYZE rather than waiting for autovacuum to catch up with the changes.
To measure the run-time cost of each node in the execution plan, the current implementation of EXPLAIN ANALYZE adds profiling overhead to query execution. As a result, running EXPLAIN ANALYZE on a query can sometimes be significantly slower than executing the query normally. The amount of overhead depends on the nature of the query and the platform being used. The worst case occurs on nodes with very short execution times and on machines with relatively slow operating system calls for time-related functions.
Examples
For a table with a single integer column and 10000 rows, to display the plan for a simple query on it:
EXPLAIN SELECT * FROM foo;
QUERY PLAN
---------------------------------------------------------
Seq Scan on foo (cost=0.00..155.00 rows=10000 width=4)
(1 row)
Here is the same query in JSON output format:
EXPLAIN (FORMAT JSON) SELECT * FROM foo;
QUERY PLAN
--------------------------------
[ +
{ +
"Plan": { +
"Node Type": "Seq Scan",+
"Relation Name": "foo", +
"Alias": "foo", +
"Startup Cost": 0.00, +
"Total Cost": 155.00, +
"Plan Rows": 10000, +
"Plan Width": 4 +
} +
} +
1596
EXPLAIN
]
(1 row)
If there is an index and we use a query with an indexable WHERE condition, EXPLAIN might show a different plan:
EXPLAIN SELECT * FROM foo WHERE i = 4;
QUERY PLAN
--------------------------------------------------------------
Index Scan using fi on foo (cost=0.00..5.98 rows=1 width=4)
Index Cond: (i = 4)
(2 rows)
Here is the same query in YAML format:
EXPLAIN (FORMAT YAML) SELECT * FROM foo WHERE i='4';
QUERY PLAN
-------------------------------
- Plan: +
Node Type: "Index Scan" +
Scan Direction: "Forward"+
Index Name: "fi" +
Relation Name: "foo" +
Alias: "foo" +
Startup Cost: 0.00 +
Total Cost: 5.98 +
Plan Rows: 1 +
Plan Width: 4 +
Index Cond: "(i = 4)"
(1 row)
The XML format is left as an exercise for the reader.
Here is the same plan with cost estimates turned off:
EXPLAIN (COSTS FALSE) SELECT * FROM foo WHERE i = 4;
QUERY PLAN
----------------------------
Index Scan using fi on foo
s
Index Cond: (i = 4)
(2 rows)
Here is a query plan for a query using an aggregate function:
EXPLAIN SELECT sum(i) FROM foo WHERE i < 10;
QUERY PLAN
---------------------------------------------------------------------
Aggregate (cost=23.93..23.93 rows=1 width=4)
-> Index Scan using fi on foo (cost=0.00..23.92 rows=6 width=4)
Index Cond: (i < 10)
(3 rows)
Here is an example of using EXPLAIN EXECUTE to display the execution plan of a prepared query:
PREPARE query(int, int) AS SELECT sum(bar) FROM test
WHERE id > $1 AND id < $2
GROUP BY foo;
EXPLAIN ANALYZE EXECUTE query(100, 200);
QUERY PLAN
-------------------------------------------------------------------
-----------------------------------------------------
HashAggregate (cost=9.54..9.54 rows=1 width=8) (actual time=0.156..0.161
rows=11 loops=1)
Group Key: foo
-> Index Scan using test_pkey on test (cost=0.29..9.29 rows=50 width=8)
(actual time=0.039..0.091 rows=99 loops=1)
Index Cond: ((id > $1) AND (id < $2))
Planning time: 0.197 ms
Execution time: 0.225 ms
(6 rows)
Of course, the specific numbers shown here depend on the actual contents of the tables involved. Also note that these numbers, and even the selected query strategy, may change between different versions of Halo, as the optimizer may be improved. Furthermore, the ANALYZE command uses random sampling to estimate data statistics. Therefore, after a new ANALYZE run, the cost estimates may change even if the actual distribution of data in the table has not changed.
See Also
ANALYZE