SELECT
SELECT, TABLE, WITH — Retrieve rows from a table or view
Synopsis
[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
[ * | expression [ [ AS ] output_name ] [, ...] ]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST |
LAST } ] [, ...] ]
[ LIMIT { count | ALL } ]
[ OFFSET start [ ROW | ROWS ] ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name
[, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
where from_item can be one of:
[ ONLY ] table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
[ TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE
( seed ) ] ]
[ LATERAL ] ( select ) [ AS ] alias [ ( column_alias [, ...] ) ]
with_query_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
[ LATERAL ] function_name ( [ argument [, ...] ] )
[ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias
[, ...] ) ] ]
[ LATERAL ] function_name ( [ argument [, ...] ] ) [ AS ] alias
( column_definition [, ...] )
[ LATERAL ] function_name ( [ argument [, ...] ] ) AS ( column_definition
[, ...] )
[ LATERAL ] ROWS FROM( function_name ( [ argument [, ...] ] ) [ AS
( column_definition [, ...] ) ] [, ...] )
[ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias
[, ...] ) ] ]
from_item [ NATURAL ] join_type from_item [ ON join_condition | USING
( join_column [, ...] ) ]
and grouping_element can be one of:
( )
expression
( expression [, ...] )
ROLLUP ( { expression | ( expression [, ...] ) } [, ...] )
CUBE ( { expression | ( expression [, ...] ) } [, ...] )
GROUPING SETS ( grouping_element [, ...] )
and with_query is:
with_query_name [ ( column_name [, ...] ) ] AS [ [ NOT ] MATERIALIZED ]
( select | values | insert | update | delete )
TABLE [ ONLY ] table_name [ * ]
Description
SELECT retrieves rows from zero or more tables. The general processing of SELECT is as follows:
-
All queries in the WITH list are evaluated. These queries effectively serve as temporary tables that can be referenced in the FROM list. A WITH query that is referenced multiple times in the FROM will only be evaluated once, unless otherwise specified as NOT MATERIALIZED. (See WITH Clause below).
-
All elements in the FROM list are evaluated (each element in FROM is a real table or virtual table). If more than one element is specified in the FROM list, they are cross-joined together (see FROM Clause below).
-
If the WHERE clause is specified, all rows that do not satisfy the condition are eliminated from the output (see WHERE Clause below).
-
If the GROUP BY clause is specified or if there are aggregate functions, the output is combined into groups of rows that match on one or more values, and the results of aggregate functions are computed on them. If the HAVING clause is present, it eliminates groups that do not satisfy the given condition (see GROUP BY Clause and HAVING Clause below).
-
For each selected row or group of rows, the actual output rows are computed using the SELECT output expressions (see SELECT List below).
-
SELECT DISTINCT removes duplicate rows from the result. SELECT DISTINCT ON removes rows that match on all specified expressions. SELECT ALL (the default) returns all candidate rows, including duplicates (see DISTINCT Clause below).
-
By using the operators UNION, INTERSECT, and EXCEPT, the output of more than one SELECT statement can be combined to form a single result set. The UNION operator returns all rows that are in one or both result sets. The INTERSECT operator returns all rows that are in both result sets. The EXCEPT operator returns rows that are in the first result set but not in the second. In all three cases, duplicate rows are eliminated (unless ALL is specified). The noise word DISTINCT can be added to explicitly eliminate duplicate rows. Note that while ALL is the default behavior for SELECT itself, DISTINCT is the default behavior here (see UNION Clause, INTERSECT Clause, and EXCEPT Clause below).
-
If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If no ORDER BY is given, the system returns rows in whatever order it can produce them fastest (see ORDER BY Clause below).
-
If the LIMIT (or FETCH FIRST) or OFFSET clause is specified, the SELECT statement returns only a subset of the result rows (see LIMIT Clause below).
-
If FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, or FOR KEY SHARE is specified, the SELECT statement locks the selected rows against concurrent updates (see The Locking Clause below).
You must have SELECT privilege on each column used in a SELECT command. FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, or FOR KEY SHARE also requires UPDATE privilege (on at least one column of each table so selected).
Parameters
WITH Clause
The WITH clause allows you to specify one or more subqueries that can be referenced by name in the main query. The subqueries effectively act as temporary tables or views for the duration of the main query. Each subquery can be a SELECT, TABLE, VALUES, INSERT, UPDATE, or DELETE statement. When writing a data-modifying statement (INSERT, UPDATE, or DELETE) in WITH, it is typical to include a RETURNING clause. It is the output of RETURNING, not the underlying table modified by the statement, that forms the temporary table read by the main query. If RETURNING is omitted, the statement is still executed but produces no output, so it cannot be referenced as a table from the main query.
For each WITH query, a name must be specified (without schema qualification). Optionally, a list of column names can be specified. If the list is omitted, the column names are inferred from the subquery.
If RECURSIVE is specified, a SELECT subquery is allowed to reference itself by name. The form of such a subquery must be non_recursive_term UNION [ ALL | DISTINCT ] recursive_term where the recursive self-reference must appear on the right-hand side of UNION. Only one recursive self-reference is allowed per query. Recursive data-modifying statements are not supported, but the result of a recursive SELECT query can be used in a data query statement.
Another effect of RECURSIVE is that WITH queries do not need to be ordered: a query can reference another query that appears later in the list (however, circular references or mutual recursion are not implemented). Without RECURSIVE, WITH queries can only reference sibling WITH queries that appear earlier in the WITH list.
When there are multiple queries in the WITH clause, RECURSIVE should only be written once, immediately after WITH. It applies to all queries in the WITH clause, although it has no effect on queries that do not use recursion or forward references.
The main query and all WITH queries are (theoretically) executed at the same time. This means that the effects of a data-modifying statement in WITH cannot be seen from any part of the query, but its RETURNING output can be read. If two such data-modifying statements attempt to modify the same row, the result is indeterminate.
A key property of WITH queries is that, even when the main query references them multiple times, they are typically evaluated only once per execution of the main query. In particular, data-modifying statements are guaranteed to execute once and only once, regardless of whether the main query reads all or any of their output.
However, WITH queries can be marked as NOT MATERIALIZED to remove this guarantee. In this case, the WITH query can be folded into the main query as if it were a simple sub-SELECT in the FROM clause of the main query. If the main query references the WITH query multiple times, this will cause repeated evaluation; however, if each such use only needs a few rows from the total output of the WITH query, NOT MATERIALIZED can save overhead by allowing the query planner to optimize jointly. NOT MATERIALIZED is ignored if it is attached to a recursive WITH query, or one that is not side-effect-free (that is, not a plain SELECT containing non-volatile functions).
FROM Clause
The FROM clause specifies one or more source tables for SELECT. If multiple source tables are specified, the result is the Cartesian product (cross join) of all source tables. But typically, qualifying conditions (via WHERE) are added to restrict the returned rows to a small subset of that Cartesian product.
The FROM clause can contain the following elements:
table_name
The name of an existing table or view (can be schema-qualified). If ONLY is specified before the table name, only that table is scanned. If ONLY is not specified, the table and all its descendants (if any) are scanned. Optionally, * can be specified after the table name to explicitly indicate that descendants are included.
alias
A substitute name for a FROM item containing an alias. An alias is used for brevity or to eliminate ambiguity in self-joins (where the same table is scanned multiple times). When an alias is provided, the actual name of the table or function is hidden. For example, given FROM foo AS f, the remainder of the SELECT must reference this FROM item as f rather than foo. If an alias is written, a column alias list can also be written to provide substitute names for one or more columns of the table.
TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ] The TABLESAMPLE clause after table_name indicates that the specified sampling_method should be used to retrieve a subset of rows from the table. This sampling takes place before any other filters (such as WHERE clauses). Halo includes two sampling methods: BERNOULLI and SYSTEM. Other sampling methods can be installed in the database through extensions.
Both BERNOULLI and SYSTEM sampling methods accept a single argument, which represents the fraction of the table to sample, expressed as a percentage between 0 and 100. This argument can be any real-valued expression (other sampling methods may accept more or different parameters). Both methods return a randomly selected sample of the table containing the specified percentage of table rows. The BERNOULLI method scans the entire table and selects or ignores rows with the specified probability. The SYSTEM method performs block-level sampling, where each block has a specified chance of being selected, and all rows in selected blocks are returned. The SYSTEM method is much faster than the BERNOULLI method when a small sampling percentage is specified, but the former may return a less random sample due to clustering effects.
The optional REPEATABLE clause specifies a seed number or expression for generating random numbers within the sampling method. The seed value can be any non-null floating-point value. If the table is not changed between queries, two queries specifying the same seed and argument values will select the same sample of the table. However, different seed values will typically produce different samples. If REPEATABLE is not given, a new random sample is selected for each query based on a system-generated seed.
Note that some extension sampling methods do not accept REPEATABLE and will always produce new samples for each use.
select
A sub-SELECT can appear in the FROM clause. This acts as if its output were created as a temporary table existing for the duration of the SELECT command. Note that the sub-SELECT must be surrounded by parentheses and must be provided with an alias. A VALUES command can also be used here.
with_query_name
A WITH query can be referenced by writing its name, as if the query name were a table name (in fact, the WITH query hides any real table of the same name from the main query. If necessary, you can reference the real table of the same name using schema qualification). An alias can be provided in the same way as for a table.
function_name
Function calls can appear in the FROM clause (this is particularly useful for set-returning functions, but any function can be used). This acts as if the function's output were created as a temporary table existing for the duration of the SELECT command.
When the optional WITH ORDINALITY clause is added to the function call, a new column is appended after the function's output columns, numbering each row.
An alias can be provided in the same way as for a table. If an alias is written, a column alias list can also be written to provide substitute names for one or more attributes of the function's composite return type, including the new column added by ORDINALITY (if any).
Multiple function calls can be combined in a single FROM-clause item by enclosing them in ROWS FROM( ... ).
The output of such an item is the concatenation of the first row from each function, then the second row from each function, and so on. If some functions produce fewer rows than others, null values are placed where data is missing, so the total number of rows returned is always the same as the function that produces the most rows.
If the function is defined to return the record data type, an alias or the keyword AS must appear, followed by a column definition list of the form ( column_name data_type [, ... ]). The column definition list must match the actual number and types of columns returned by the function.
When using the ROWS FROM( ... ) syntax, if one of the functions requires a column definition list, it is best to place the column definition list after the function call within ROWS FROM( ... ). A column definition list can be placed after the ROWS FROM( ... ) construct only when there is exactly one function and no WITH ORDINALITY clause.
To use ORDINALITY together with a column definition list, you must use the ROWS FROM( ... ) syntax and place the column definition list inside ROWS FROM( ... ).
join_type
One of
-
[ INNER ] JOIN
-
LEFT [ OUTER ] JOIN
-
RIGHT [ OUTER ] JOIN
-
FULL [ OUTER ] JOIN
-
CROSS JOIN
For INNER and OUTER join types, a join condition must be specified, namely one of NATURAL, ON join_condition, or USING (join_column [, ...]) (only one can be used). The meaning is described below. For CROSS JOIN, none of these clauses can appear.
A JOIN clause combines two FROM items (for convenience we refer to them as "tables", although they can be any type of FROM item). Parentheses can be used to determine nesting order if necessary. Without parentheses, JOINs nest from left to right. In any case, JOIN binds more tightly than commas separating FROM-list items.
CROSS JOIN and INNER JOIN produce a simple Cartesian product, the same result as listing two tables at the top level of FROM, but constrained by the join condition (if any). CROSS JOIN is equivalent to INNER JOIN ON(TRUE), meaning the condition does not remove any rows. These join types are merely a notational convenience, since there is nothing you can do with them that cannot be done with plain FROM and WHERE.
LEFT OUTER JOIN returns all rows from the restricted Cartesian product (i.e., all combined rows that pass the join condition), plus a copy of each row from the left-hand table for which there is no matching right-hand row that passes the join condition. This left-hand row is extended to the full joined row by inserting null values in the right-hand columns. Note that only the condition of the JOIN clause itself is considered when determining which rows match. Outer conditions are applied afterward.
Conversely, RIGHT OUTER JOIN returns all joined rows, plus one row for each unmatched right-hand row (extended with nulls on the left). This is merely a notational convenience, since you can convert it to a LEFT OUTER JOIN by swapping the left and right tables.
FULL OUTER JOIN returns all joined rows, plus one row for each unmatched left-hand row (extended with nulls on the right), plus one row for each unmatched right-hand row (extended with nulls on the left).
ON join_condition
join_condition is an expression that produces a value of type boolean (similar to a WHERE clause) that indicates which rows are considered to match in a join.
USING ( join_column [, ...] )
A clause of the form USING ( a, b, ... ) is shorthand for ON left_table.a = right_table.a AND left_table.b = right_table.b .... Also, USING implies that only one of each pair of equivalent columns will be included in the join output.
NATURAL
NATURAL is shorthand for a USING list that mentions all columns in both tables with matching names. If there are no common column names, NATURAL is equivalent to ON TRUE.
LATERAL
The LATERAL keyword can precede a sub-SELECT FROM item. This allows the sub-SELECT to reference columns of FROM items that appear before it in the FROM list (without LATERAL, each sub-SELECT is evaluated independently and therefore cannot cross-reference any other FROM item).
LATERAL can also precede a function-call FROM item, but in this case it is a noise word, since the function expression can reference columns of preceding FROM items in any case.
A LATERAL item can appear at the top level of the FROM list, or in a JOIN. In the latter case, it can also reference any items on the left-hand side of the JOIN on which it is the right-hand side.
When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row or set of rows from the FROM items providing the cross-referenced columns, the LATERAL item is evaluated using the column values of that row or row set. The resulting rows are then joined as usual with the rows from which they were computed. This process is repeated for each row or row set from the source table(s) of those columns.
The source table for the columns must be INNER- or LEFT-joined to the LATERAL item, otherwise there is no well-defined set of rows for evaluating the LATERAL item for each row set. Although the construct X RIGHT JOIN LATERAL Y is syntactically valid, it is not actually allowed to reference X within Y.
WHERE Clause
The optional WHERE clause has the form
WHERE condition
where condition is any expression that evaluates to a result of type boolean. Any row that does not satisfy this condition is eliminated from the output. A row satisfies the condition if the expression returns true when the actual row values are substituted for its variable references.
GROUP BY Clause
The optional GROUP BY clause has the form
GROUP BY grouping_element [, ...]
GROUP BY condenses all selected rows that share the same set of values for the grouping expressions into a single row. An expression used in a grouping_element can be an input column name, the name or ordinal number of an output column (SELECT list item), or an arbitrary expression formed from input column values. In case of ambiguity, a GROUP BY name will be interpreted as an input column name rather than an output column name.
If any GROUPING SETS, ROLLUP, or CUBE are present as grouping elements, the GROUP BY clause as a whole defines several independent grouping sets. The effect is equivalent to constructing a UNION ALL between subqueries with grouping sets as their GROUP BY clauses.
Aggregate functions (if used) are computed across all rows that make up each group, producing a single value for each group (if aggregate functions are present but no GROUP BY clause exists, the query is treated as having a single group consisting of all selected rows). The set of rows passed to each aggregate function can be further filtered by attaching a FILTER clause to the aggregate function call. When a FILTER clause is present, only rows matching it are included in the aggregate function's input.
When a GROUP BY clause is present or any aggregate functions are used, SELECT list expressions cannot reference non-grouped columns (unless the reference appears within an aggregate function or is functionally dependent on the grouped columns), because doing so would result in ambiguous values for the non-grouped column. Functional dependency exists if the grouped columns are the primary key (or a subset thereof) of the table containing the non-grouped column.
Keep in mind that all aggregate functions are evaluated before any "scalar" expressions in the HAVING clause or SELECT list. This means that a CASE expression cannot be used to skip evaluation of an aggregate expression.
Currently, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, and FOR KEY SHARE cannot be specified with GROUP BY.
HAVING Clause
The optional HAVING clause has the form
HAVING condition
where condition is the same as specified in the WHERE clause.
HAVING eliminates group rows that do not satisfy the condition. HAVING differs from WHERE: WHERE filters individual rows before GROUP BY is applied, whereas HAVING filters group rows created by GROUP BY. Every column referenced in condition must unambiguously reference a grouping column (unless the reference appears within an aggregate function or the non-grouped column is functionally dependent on the grouped columns).
Even without a GROUP BY clause, the presence of HAVING transforms a query into a grouped query. This is the same situation as when the query contains aggregate functions but no GROUP BY clause. All selected rows are considered to form a single group, and the SELECT list and HAVING clause can only reference table columns within aggregate functions. If the HAVING condition is true, such a query will emit a single row; otherwise, no rows are returned.
Currently, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, and FOR KEY SHARE cannot be specified with HAVING.
WINDOW Clause
The optional WINDOW clause has the form
WINDOW window_name AS ( window_definition ) [, ...]
where window_name is a name that can be referenced from an OVER clause or subsequent window definitions.
window_definition is
[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ]
[, ...] ]
[ frame_clause ]
If an existing_window_name is specified, it must reference an earlier entry in the WINDOW list. The new window copies its partitioning clause and ordering clause (if any) from that entry. In this case, the new window cannot specify its own PARTITION BY clause, and it can only specify an ORDER BY clause if the copied window does not have one. The new window always uses its own frame clause; the copied window need not specify a frame clause.
PARTITION BY list elements are interpreted in the same way as GROUP BY clause elements, except that they are always simple expressions and cannot be output column names or numbers. Another difference is that these expressions can contain aggregate function calls, which are not allowed in regular GROUP BY clauses. They are allowed here because windows appear after grouping and aggregation.
Similarly, ORDER BY list elements are interpreted in the same way as statement-level ORDER BY clause elements, except that the expressions are always treated as simple expressions and cannot be output column names or numbers.
The optional frame_clause defines the window frame for frame-dependent window functions (not all window functions depend on frames).
The window frame is the set of related rows for each row (called the current row) in the query. The frame_clause can be
{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]
where frame_start and frame_end can be
UNBOUNDED PRECEDING
offset PRECEDING
CURRENT ROW
offset FOLLOWING
UNBOUNDED FOLLOWING
and frame_exclusion can be
EXCLUDE CURRENT ROW
EXCLUDE GROUP
EXCLUDE TIES
EXCLUDE NO OTHERS
If frame_end is omitted, it defaults to CURRENT ROW. The restrictions are: frame_start cannot be UNBOUNDED FOLLOWING, frame_end cannot be UNBOUNDED PRECEDING, and the choice of frame_end cannot appear earlier than the choice of frame_start in the list of frame_start and frame_end options above — for example, RANGE BETWEEN CURRENT ROW AND offset PRECEDING is not allowed.
The default frame option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. It sets the frame to start from the beginning of the partition up to the last peer of the current row (rows considered equivalent to the current row by the window's ORDER BY clause; if there is no ORDER BY, all rows are peers). Typically, UNBOUNDED PRECEDING means the frame starts at the first row of the partition, and similarly UNBOUNDED FOLLOWING means the frame ends at the last row of the partition, whether in RANGE, ROWS, or GROUPS mode. In ROWS mode, CURRENT ROW means the frame starts or ends at the current row. In RANGE or GROUPS mode, it means the frame starts or ends at the first or last peer of the current row in the ORDER BY ordering. The meanings of the offset PRECEDING and offset FOLLOWING options vary with the frame mode. In ROWS mode, the offset is an integer indicating that the frame starts or ends that many rows before or after the current row. In GROUPS mode, the offset is an integer indicating that the frame starts or ends that many peer groups before or after the current row's peer group, where a peer group is a group of rows that are equivalent according to the window's ORDER BY clause. In RANGE mode, the use of the offset option requires exactly one ORDER BY column in the window definition. The frame then includes rows whose sort column value is no more than offset less than (for PRECEDING) or greater than (for FOLLOWING) the current row's sort column value. In these cases, the data type of the offset expression depends on the data type of the sort column. For numeric sort columns, it is typically the same type as the sort column, but for datetime sort columns it is interval. In all these cases, the offset value must be non-null and non-negative. Furthermore, although the offset does not have to be a simple constant, it cannot contain variables, aggregate functions, or window functions.
The frame_exclusion option allows excluding rows around the current row from the frame, even if they would be included according to the frame's start and end options. EXCLUDE CURRENT ROW excludes the current row from the frame. EXCLUDE GROUP excludes the current row and its ordering peers from the frame. EXCLUDE TIES excludes any peers of the current row from the frame, but does not exclude the current row itself. EXCLUDE NO OTHERS simply explicitly specifies the default behavior of not excluding the current row or its peers.
Note that if the ORDER BY ordering does not produce a unique ordering of rows, the ROWS mode may produce unpredictable results. The RANGE and GROUPS modes are designed to ensure that rows that are equal in the ORDER BY ordering are treated identically: all rows in a given peer group will either be in the frame or excluded from the frame.
The purpose of the WINDOW clause is to specify the behavior of window functions appearing in the query's SELECT list or ORDER BY clause. These functions can reference WINDOW clause entries by name in their OVER clauses. However, WINDOW clause entries do not have to be referenced. If they are not used in the query, they are simply ignored. Window functions can be used without any WINDOW clause at all, since a window function call can directly specify its window definition in its OVER clause. However, the WINDOW clause can reduce typing when multiple window functions share the same window definition.
Currently, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, and FOR KEY SHARE cannot be specified with WINDOW.
SELECT List
The SELECT list (between the keywords SELECT and FROM) specifies the expressions that form the output rows of the SELECT statement. These expressions can (and typically do) reference columns computed in the FROM clause. As in a table, each output column of SELECT has a name. In a simple SELECT, this name is just used to label the displayed column, but when SELECT is a subquery of a larger query, the larger query treats the name as the column name of the virtual table produced by the subquery. To specify the name to use for an output column, write AS output_name after the column's expression (AS can be omitted when the desired output name does not conflict with any Halo keyword. To avoid conflicts with future added keywords, it is recommended to always write AS or double-quote the output name). If you do not specify a column name, Halo will automatically choose one. If the column's expression is a simple column reference, the chosen name is the same as the column name. In more complex cases using functions or type names, the system may generate names such as ?column?.
An output column's name can be used to reference the column's value in ORDER BY and GROUP BY clauses, but not in WHERE and HAVING clauses (where the expression must be written out).
You can write * in the output list as a shorthand for all columns of the selected rows. You can also write table_name.* as a shorthand for all columns from just that table. In these cases, AS cannot be used to specify new names; the output column names will be the same as the table column names.
Per the SQL standard, expressions in the output list should be evaluated before applying DISTINCT, ORDER BY, or LIMIT.
This must obviously be done when using DISTINCT, otherwise it would be impossible to determine what values are being distinguished. However, in many cases it is more convenient to evaluate ORDER BY and LIMIT first and then evaluate the output expressions, especially if the output list contains any volatile or expensive functions. With this behavior, the function evaluation order is more intuitive and functions will not be evaluated for rows that never appear in the output. As long as the output expressions are not referenced by DISTINCT, ORDER BY, or GROUP BY, Halo actually evaluates the output expressions after sorting and limiting the number of rows (a counterexample is SELECT f(x) FROM tab ORDER BY 1, which obviously must evaluate f(x) before sorting). Output expressions containing set-returning functions are actually evaluated after sorting and before limiting, so that LIMIT can truncate the output from the set-returning functions.
DISTINCT Clause
If SELECT DISTINCT is specified, all duplicate rows are removed from the result set (one row is kept for each group of duplicates). SELECT ALL specifies the opposite behavior: all rows are retained, which is the default.
SELECT DISTINCT ON ( expression [, ...] ) retains only the first row of each set of rows that evaluate as equal on the given expressions. The DISTINCT ON expressions are interpreted using the same rules as ORDER BY (see above). Note that unless ORDER BY is used to ensure the desired row appears first, the "first row" of each set is unpredictable. For example:
SELECT DISTINCT ON (location) location, time, report
FROM weather_reports
ORDER BY location, time DESC;
retrieves the most recent weather report for each location. But if we do not use ORDER BY to force descending order of time values for each location, the time of the report we get for each location may be unpredictable.
The DISTINCT ON expressions must match the leftmost ORDER BY expressions. The ORDER BY clause will typically contain additional expressions that determine the priority of rows within each DISTINCT ON group.
Currently, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, and FOR KEY SHARE cannot be used with DISTINCT.
UNION Clause
The UNION clause has the following form:
select_statement UNION [ ALL | DISTINCT ] select_statement
select_statement is any SELECT statement without ORDER BY, LIMIT, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, and FOR KEY SHARE clauses (if the subexpression is surrounded by parentheses, ORDER BY and LIMIT can be attached to it. Without parentheses, these clauses are applied to the UNION result rather than the right-hand expression).
The UNION operator computes the set union of the rows returned by the involved SELECT statements. A row is in the union if it appears in at least one of the two result sets. The SELECT statements that serve as the two operands of UNION must produce the same number of columns, and the corresponding columns must have compatible data types.
The result of UNION does not contain duplicate rows, unless the ALL option is specified. ALL prevents the elimination of duplicates (therefore, UNION ALL is typically significantly faster than UNION; use ALL whenever possible). DISTINCT can be written to explicitly specify the behavior of eliminating duplicate rows.
Unless parentheses specify the evaluation order, multiple UNION operators in the same SELECT statement are evaluated from left to right.
Currently, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, and FOR KEY SHARE cannot be used in the UNION result or any input of UNION.
INTERSECT Clause
The INTERSECT clause has the following form:
select_statement INTERSECT [ ALL | DISTINCT ] select_statement
select_statement is any SELECT statement without ORDER BY, LIMIT, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, and FOR KEY SHARE clauses.
The INTERSECT operator computes the set intersection of the rows returned by the involved SELECT statements. A row is in the intersection if it appears in both result sets.
The result of INTERSECT does not contain duplicate rows, unless the ALL option is specified. If ALL is specified, a row that appears m times in the left table and n times in the right table will appear min(m,n) times in the result. DISTINCT can be written to explicitly specify the behavior of eliminating duplicate rows.
Unless parentheses specify the evaluation order, multiple INTERSECT operators in the same SELECT statement are evaluated from left to right. INTERSECT has higher precedence than UNION. That is, A UNION B INTERSECT C will be read as A UNION (B INTERSECT C).
Currently, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, and FOR KEY SHARE cannot be used in the INTERSECT result or any input of INTERSECT.
EXCEPT Clause
The EXCEPT clause has the following form:
select_statement EXCEPT [ ALL | DISTINCT ] select_statement
select_statement is any SELECT statement without ORDER BY, LIMIT, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, and FOR KEY SHARE clauses.
The EXCEPT operator computes the set of rows that are in the result of the left SELECT statement but not in the result of the right SELECT statement.
The result of EXCEPT does not contain duplicate rows, unless the ALL option is specified. If ALL is specified, a row that appears m times in the left table and n times in the right table will appear max(m-n,0) times in the result set. DISTINCT can be written to explicitly specify the behavior of eliminating duplicate rows.
Unless parentheses specify the evaluation order, multiple EXCEPT operators in the same SELECT statement are evaluated from left to right. EXCEPT has the same precedence as UNION.
Currently, FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE, and FOR KEY SHARE cannot be used in the EXCEPT result or any input of EXCEPT.
ORDER BY Clause
The optional ORDER BY clause has the following form:
ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ]
[, ...]
The ORDER BY clause causes the result rows to be sorted according to the specified expressions. If two rows are equal according to the leftmost expression, they are compared according to the next expression, and so on. If they are equal according to all specified expressions, the order in which they are returned is implementation-dependent.
Each expression can be the name or ordinal number of an output column (SELECT list item), or it can be any arbitrary expression formed from input column values.
The ordinal number refers to the ordinal (left-to-right) position of the output column. This feature can be used to define an ordering for columns that do not have unique names. It is not strictly necessary, since the AS clause can always be used to assign a name to an output column.
Arbitrary expressions can also be used in the ORDER BY clause, including columns that do not appear in the SELECT output list. Thus, the following statement is valid:
SELECT name FROM distributors ORDER BY code;
A limitation of this feature is that an ORDER BY clause applied to the result of a UNION, INTERSECT, or EXCEPT clause can only specify output column names or ordinal numbers, not expressions.
If an ORDER BY expression is a simple name that matches both an output column name and an input column name, ORDER BY will interpret it as the output column name. This is the opposite of what GROUP BY would choose in the same situation. This inconsistency exists for compatibility with the SQL standard.
The keywords ASC (ascending) or DESC (descending) can be added after any expression in the ORDER BY clause. If neither is specified, ASC is assumed by default. Alternatively, a specific sort operator name can be specified in a USING clause. A sort operator must be a less-than or greater-than member of some B-tree operator family. ASC is typically equivalent to USING < and DESC is typically equivalent to USING > (but the creator of a user-defined data type can define exactly what the default sort order is, and it might correspond to an operator with a different name).
If NULLS LAST is specified, null values are sorted after non-null values; if NULLS FIRST is specified, null values are sorted before non-null values. If neither is specified, the default behavior when ASC is specified or implied is NULLS LAST, and the default behavior when DESC is specified or implied is NULLS FIRST (thus, the default behavior is that null values sort after non-null values). When USING is specified, the default null ordering depends on whether the operator is a less-than or greater-than operator.
Note that ordering options apply only to the expression they follow. For example, ORDER BY x, y DESC is different from ORDER BY x DESC, y DESC.
String data is sorted according to the collation applied to the column being sorted. This can be overridden by including a COLLATE clause in the expression, for example ORDER BY mycolumn COLLATE "en_US".
LIMIT Clause
The LIMIT clause consists of two independent subclauses:
LIMIT { count | ALL }
OFFSET start
The count parameter specifies the maximum number of rows to return, while start specifies the number of rows to skip before returning rows. When both are specified, start rows are skipped before counting the count rows to return.
If the count expression evaluates to NULL, it is treated as LIMIT ALL, i.e., no limit. If start evaluates to NULL, it is treated as OFFSET 0.
SQL:2008 introduced a different syntax to achieve the same result, which Halo also supports:
OFFSET start { ROW | ROWS }
FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }
In this syntax, the standard requires start or count to be a literal constant, a parameter, or a variable name. As a Halo extension, other expressions are also allowed, but they generally need to be enclosed in parentheses to avoid ambiguity. If count is omitted in a FETCH clause, it defaults to 1. The WITH TIES option is used to return any additional rows that tie with the last row in the result set according to the ORDER BY clause; ORDER BY is mandatory in this case. ROW and ROWS, as well as FIRST and NEXT, are noise words that do not affect the behavior of these clauses. Per the standard, if both are present, the OFFSET clause must appear before the FETCH clause. However, Halo is more lenient and allows either order. When using LIMIT, it is good practice to use an ORDER BY clause to constrain the result rows to a unique order. Otherwise, you will get an unpredictable subset of the query result rows — you might request rows 10 through 20, but in what order? Unless ORDER BY is specified, you do not know the order.
The query optimizer considers LIMIT when generating an execution plan, so depending on the LIMIT and OFFSET values you use, you will likely get different plans (and different row orderings). Therefore, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you use ORDER BY to enforce a predictable result order. This is not a bug; it is a necessary consequence of the fact that SQL does not promise to return query results in any particular order (unless ORDER BY is used to constrain the order).
Without an ORDER BY clause to enforce selection of a deterministic subset, repeatedly executing the same LIMIT query may even return different subsets of table rows. Again, this is not a bug; in such a case there is no guarantee of deterministic results.
Locking Clause
FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, and FOR KEY SHARE are locking clauses that affect how SELECT locks rows as they are retrieved from tables.
The general form of a locking clause is:
FOR lock_strength [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ]
where lock_strength can be
UPDATE
NO KEY UPDATE
SHARE
KEY SHARE
To prevent the operation from waiting for other transactions to commit, the NOWAIT or SKIP LOCKED option can be used. With NOWAIT, if the selected rows cannot be locked immediately, the statement reports an error rather than waiting. With SKIP LOCKED, any selected rows that cannot be locked immediately are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general-purpose work, but can be used to avoid lock contention when multiple users access a queue-like table. Note that NOWAIT and SKIP LOCKED apply only to row-level locks — the required ROW SHARE table-level locks are still acquired in the normal manner. If you want to acquire a table-level lock without waiting, you can first use LOCK with NOWAIT.
If specific tables are mentioned in a locking clause, only rows from those tables are locked; any other tables used in the SELECT are simply read as usual. A locking clause without a table list affects all tables used in the statement. If a locking clause is applied to a view or subquery, it affects all tables used in that view or subquery. However, these clauses do not apply to WITH queries referenced by the main query. If you want row locking to occur in a WITH query, you should specify a locking clause within that WITH query.
If different locking behavior needs to be specified for different tables, multiple locking clauses can be written. If the same table is mentioned in more than one locking clause (or is implicitly affected), it will be treated with the strongest locking behavior specified. Similarly, if NOWAIT is specified in any clause affecting a table, the table will be treated as NOWAIT. Otherwise, if SKIP LOCKED is specified in any clause affecting the table, the table will be treated as SKIP LOCKED.
Locking clauses cannot be used in contexts where returned rows cannot be clearly associated with rows in a table. For example, locking clauses cannot be used with aggregates.
When a locking clause appears at the top level of a SELECT query, the rows that are locked are exactly the rows returned by the query.
In the case of a join query, the locked rows are those that contribute to the returned joined rows. Furthermore, rows that satisfy the query conditions as of the query's snapshot will be locked, but if they are subsequently updated after that snapshot and no longer satisfy the query conditions, they will not be returned. If LIMIT is used, locking stops once the number of returned rows satisfies the limit (but note that rows skipped by OFFSET will be locked). Similarly, if a locking clause is used in a cursor query, only rows actually fetched or skipped by the cursor will be locked.
When a locking clause appears in a sub-SELECT, the locked rows are those that the subquery returns to the outer query. The number of locked rows may be fewer than what appears from the subquery's own perspective, because conditions from the outer query may be used to optimize the subquery's execution. For example:
SELECT * FROM (SELECT * FROM mytable FOR UPDATE) ss WHERE col1 = 5;
will only lock rows where col1 = 5 (even though this condition is not written in the subquery).
Earlier releases could not maintain a lock that was upgraded after a subsequent savepoint. For example, this code:
BEGIN;
SELECT * FROM mytable WHERE key = 1 FOR UPDATE;
SAVEPOINT s;
UPDATE mytable SET ... WHERE key = 1;
ROLLBACK TO s;
would not maintain the FOR UPDATE lock after ROLLBACK TO.
TABLE Command
The command
TABLE name
is equivalent to
SELECT * FROM name
It can be used as a top-level command or within complex queries to save space. Only WITH, UNION, INTERSECT, EXCEPT, ORDER BY, LIMIT, OFFSET, FETCH, and FOR locking clauses can be used with TABLE. The WHERE clause and any form of aggregation cannot be used.
Examples
Join the table films with the table distributors:
SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d, films f
WHERE f.did = d.did
title | did | name | date_prod | kind
-------------------+-----+--------------+------------+----------
The Third Man | 101 | British Lion | 1949-12-23 | Drama
The African Queen | 101 | British Lion | 1951-08-11 | Romantic
...
Sum the len column of all films and group the results by kind:
SELECT kind, sum(len) AS total FROM films GROUP BY kind;
kind | total
----------+-------
Action | 07:34
Comedy | 02:58
Drama | 14:28
Musical | 06:42
Romantic | 04:38
Sum the len column of all films, group the results by kind, and display groups with a total length less than 5 hours:
SELECT kind, sum(len) AS total
FROM films
GROUP BY kind
HAVING sum(len) < interval '5 hours';
kind | total
----------+-------
Comedy | 02:58
Romantic | 04:38
The following two examples both sort the results by the contents of the second column (name):
SELECT * FROM distributors ORDER BY name;
SELECT * FROM distributors ORDER BY 2;
did | name
-----+------------------
109 | 20th Century Fox
110 | Bavaria Atelier
101 | British Lion
107 | Columbia
102 | Jean Luc Godard
113 | Luso films
104 | Mosfilm
103 | Paramount
106 | Toho
105 | United Artists
111 | Walt Disney
112 | Warner Bros.
108 | Westward
The next example shows how to obtain the union of the distributors and actors tables, restricting the results to those rows that begin with the letter W in each table. Only distinct rows are desired, so the keyword ALL is omitted.
distributors: actors:
did | name id | name
-----+-------------- ----+----------------
108 | Westward 1 | Woody Allen
111 | Walt Disney 2 | Warren Beatty
112 | Warner Bros. 3 | Walter Matthau
... ...
SELECT distributors.name
FROM distributors
WHERE distributors.name LIKE 'W%'
UNION
SELECT actors.name
FROM actors
WHERE actors.name LIKE 'W%';
name
-------------
Walt Disney
Walter Matthau
Warner Bros.
Warren Beatty
Westward
Woody Allen
This example shows how to use functions in the FROM clause, with and without a column definition list:
CREATE FUNCTION distributors(int) RETURNS SETOF distributors AS $$
SELECT * FROM distributors WHERE did = $1;
$$ LANGUAGE SQL;
1667
SELECT
SELECT * FROM distributors(111);
did | name
-----+-------------
111 | Walt Disney
CREATE FUNCTION distributors_2(int) RETURNS SETOF record AS $$
SELECT * FROM distributors WHERE did = $1;
$$ LANGUAGE SQL;
SELECT * FROM distributors_2(111) AS (f1 int, f2 text);
f1 | f2
-----+-------------
111 | Walt Disney
Here is an example of a function with an added ordinal column:
SELECT * FROM unnest(ARRAY['a','b','c','d','e','f']) WITH ORDINALITY;
unnest | ordinality
--------+----------
a | 1
b | 2
c | 3
d | 4
e | 5
f | 6
(6 rows)
This example shows how to use a simple WITH clause:
WITH t AS (
SELECT random() as x FROM generate_series(1, 3)
)
SELECT * FROM t
UNION ALL
SELECT * FROM t
x
----------------------------
0.534150459803641
0.520092216785997
0.0735620250925422
0.534150459803641
0.520092216785997
0.0735620250925422
Note that the WITH query is only evaluated once, so the two sets have the same three random values.
This example uses WITH RECURSIVE to find all subordinates (direct or indirect) of employee Mary and their distance from Mary, from a table that only shows direct subordinates:
WITH RECURSIVE employee_recursive(distance, employee_name, manager_name) AS (
SELECT 1, employee_name, manager_name
FROM employee
WHERE manager_name = 'Mary'
UNION ALL
SELECT er.distance + 1, e.employee_name, e.manager_name
FROM employee_recursive er, employee e
WHERE er.employee_name = e.manager_name
)
SELECT distance, employee_name FROM employee_recursive;
Note the typical form of a recursive query: an initial condition, followed by UNION, then the recursive part of the query.
Make sure that the recursive part of the query will eventually return no rows, otherwise the query will loop indefinitely.
This example uses LATERAL to apply a set-returning function get_product_names() for each row in the manufacturers table:
SELECT m.name AS mname, pname
FROM manufacturers m, LATERAL get_product_names(m.id) pname;
Manufacturers that currently have no products will not appear in the results, since this is an inner join. If we want to include the names of such manufacturers in the results, we can:
SELECT m.name AS mname, pname
FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true;