postgres_fdw
The postgres_fdw module provides the foreign-data wrapper postgres_fdw, which can be used to access data stored in external Halo servers.
The functionality provided by this module substantially overlaps with that of the older dblink module. However, postgres_fdw provides more transparent and more standards-compliant syntax for accessing remote tables, and can give better performance in many cases.
To prepare for remote access using postgres_fdw:
-
Use CREATE EXTENSION to install the postgres_fdw extension.
-
Use CREATE SERVER to create a foreign server object, representing each remote database you wish to connect to. Specify the connection information, except for user and password, as options for the server object.
-
Use CREATE USER MAPPING to create a user mapping, each representing a database user that you want to allow access to a foreign server. Specify the remote user name and password as the user and password options of the user mapping.
-
Use CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA to create a foreign table for each remote table you want to access. The columns of the foreign table must match the referenced remote table. However, if you specify the correct remote name in the foreign table object's options, you can use different table and/or column names than the remote table.
Now you can simply SELECT from a foreign table to access data stored in its underlying remote table. You can also modify the remote table using INSERT, UPDATE, or DELETE (of course, the remote user specified in your user mapping must have the privileges to do these things).
Note that postgres_fdw currently lacks support for INSERT statements with an ON CONFLICT DO UPDATE clause. However, it does support the ON CONFLICT DO NOTHING clause, and the provided unique index inference specification will be omitted. Also note that postgres_fdw supports row movement invocations executed by UPDATE statements on partitioned tables, but it currently does not handle the case where the remote partition selected for moving the row into is also the UPDATE target partition that will be updated later.
It is generally recommended that the columns of a foreign table be declared with exactly the same data types and collations (if available) as the referenced remote table columns. Although postgres_fdw is now capable of performing data type conversions when needed, strange semantic anomalies may occur when types or collations do not match, since the remote server may interpret WHERE clauses differently than the local server.
Note: A foreign table can be declared with fewer columns than the underlying remote table, or with a different column order. Column matching with the remote table is done by name, not by position.
1. FDW Options for postgres_fdw
1.1. Connection Options
A foreign server using the postgres_fdw foreign-data wrapper can use the same options that libpq accepts in connection strings, except that these options are not allowed or have special handling:
• user, password, and sslpassword (should be specified in the user mapping, or use a service file)
• client_encoding (this is automatically set from the local server's encoding)
• fallback_application_name (always set to postgres_fdw)
• sslkey and sslcert — these can appear in either or both the connection and user mapping. If both are present, the user mapping setting overrides the connection setting.
Only superusers can create or modify user mappings with sslcert or sslkey settings.
Only superusers can connect to foreign servers without password authentication, so you should always specify the password option for user mappings belonging to non-superusers.
Superusers can override this check on a per-user-mapping basis by setting the user mapping option password_required 'false', for example:
ALTER USER MAPPING FOR some_non_superuser SERVER loopback_nopw OPTIONS (ADD password_required 'false');
To prevent unprivileged users from exploiting the authentication rights of the Unix user running the postgres server to escalate to superuser privileges, only superusers can set this option on user mappings.
Care must be taken to ensure that this does not allow the mapped user to connect to the mapped database as a superuser, per CVE-2007-3278 and CVE-2007-6601. Do not set password_required=false on the public role. Remember that the mapped user may use any client certificates, .pgpass, .pg_service.conf, etc. from the Unix home directory of the system user running the postgres server. They can also use any trust relationships granted by authentication methods such as peer or ident authentication.
1.2. Object Name Options
These options are needed when a foreign table is created with a name different from the underlying remote table.
schema_name
This option gives the schema name of the foreign table as used on the remote server, and can be specified for a foreign table. If this option is omitted, the schema name of the foreign table will be used.
table_name
This option gives the table name of the foreign table as used on the remote server, and can be specified for a foreign table. If this option is omitted, the name of the foreign table will be used.
column_name
This option gives the column name as used on the remote server, and can be specified for a column of a foreign table. If this option is omitted, the column's name will be used.
1.3. Cost Estimation Options
postgres_fdw retrieves remote data by executing queries on the remote server, so the ideal cost of scanning a foreign table should be the cost of performing the scan on the remote server plus some communication overhead. A reliable way to obtain this is to ask the remote server and add some communication overhead — but for simple queries, the extra round trip to get a remote estimate is not worthwhile. Therefore, postgres_fdw provides the following options to control how cost estimation is done:
use_remote_estimate
This option controls whether postgres_fdw issues EXPLAIN commands to obtain cost estimates, and can be specified for a foreign table or a foreign server. A foreign table's setting overrides any setting for its server, but only for that table. The default is false.
fdw_startup_cost
This option is a numeric value to be added to the estimated startup cost of all foreign table scans on the server. This represents the overhead of establishing a connection, parsing and planning the query on the remote side, etc. The default is 100.
fdw_tuple_cost
This option is a numeric value used as the per-tuple additional cost for foreign table scans on that server, and can be specified for a foreign server. This represents the overhead of data transfer between servers. You can increase or decrease this value to reflect higher or lower network latency to the remote server. The default is 0.01.
When use_remote_estimate is true, postgres_fdw obtains row count and cost estimates from the remote server, then adds fdw_startup_cost and fdw_tuple_cost to the cost estimate. When use_remote_estimate is false, postgres_fdw performs local row count and cost estimation, and then adds fdw_startup_cost and fdw_tuple_cost to the cost estimate. This local estimation will not be very accurate unless a local copy of the remote table's statistics is available. Running ANALYZE on the foreign table is the way to update local statistics; this performs a scan of the remote table and then calculates and stores statistics as if the table were local. Maintaining local statistics can be a useful way to reduce the pre-query planning overhead for a remote table — but if the remote table is frequently updated, the local statistics will quickly become stale.
1.4. Remote Execution Options
By default, only WHERE clauses that use built-in operators and functions are considered for execution on the remote server.
Clauses involving non-built-in functions will be checked locally after the rows are fetched. If such functions are available on the remote server and can produce the same results as local execution, performance can be improved by sending such WHERE clauses to the remote server for execution. The following option controls this behavior:
extensions
Immutable functions and operators belonging to an extension in this list will be considered for push-down to the remote server for execution. This option can only be specified for a foreign server, not on a per-table basis.
When using the extensions option, the user is responsible for ensuring that the listed extensions exist and are consistent on both the local and remote servers. Otherwise, remote queries may fail or behave unexpectedly.
fetch_size
This option specifies the number of rows that postgres_fdw should fetch in each row-fetching operation. This can be specified for a foreign table or a foreign server. An option specified on a table overrides one specified at the server level. The default is 100.
1.5. Updatability Options
By default, all foreign tables using postgres_fdw are assumed to be updatable.
This can be overridden using the following option:
updatable
This option controls whether postgres_fdw allows foreign tables to be updated using INSERT, UPDATE, and DELETE commands.
It can be specified for a foreign table or a foreign server. A table-level option overrides a server-level option.
The default is true.
Of course, if the remote table is not actually updatable, an error will be raised. The primary use of this option is to allow errors to be thrown locally without querying the remote server. But note that information_schema views will report a postgres_fdw foreign table as updatable (or not) based on this option's setting, without any checks from the remote server.
1.6. Import Options
postgres_fdw can use IMPORT FOREIGN SCHEMA to import foreign table definitions. This command creates foreign table definitions on the local server that match tables or views existing on the remote server. If the remote tables to be imported have columns with user-defined data types, compatible types of the same name must exist on the local server.
Import behavior can be customized with the following options (given in the IMPORT FOREIGN SCHEMA command):
import_collate
This option controls whether column COLLATE options are included in foreign table definitions imported from the foreign server. The default is true. You may need to turn this off if the remote server has a different set of collation names than the local server, which is likely when running on different operating systems.
import_default
This option controls whether column DEFAULT expressions are included in foreign table definitions imported from the foreign server. The default is false. If you enable this option, be careful that expressions may be evaluated differently on the remote and local servers; nextval() commonly causes such issues. If an imported default expression uses a function or operator that does not exist locally, IMPORT will fail entirely.
import_not_null
This option controls whether column NOT NULL constraints are included in foreign table definitions imported from the foreign server. The default is true.
Note: Constraints other than NOT NULL will not be imported from the remote table. Although Halo does support CHECK constraints on foreign tables, they are not automatically imported due to the risk of the local and remote servers evaluating constraint expressions differently. Any such inconsistency in CHECK constraints could lead to hard-to-detect errors in query optimization. If you want to import CHECK constraints, you must do so manually, and you should carefully verify the semantics of each such constraint.
Tables or foreign tables that are partitions of other tables are automatically excluded. Partitioned tables are imported unless they are partitions of other tables. Since all data can be accessed through the partitioned table that is the root of the partition hierarchy, this approach should allow access to all data without creating additional objects.
2. Connection Management
postgres_fdw establishes a connection to a foreign server during the first query that uses a foreign table associated with that foreign server. This connection is kept and reused for subsequent queries in the same session. However, if multiple user identities (user mappings) are used to access the foreign server, a separate connection is established for each user mapping.
3. Transaction Management
During a query that references any remote table on a foreign server, if a remote transaction has not yet been opened in the context of the current local transaction, postgres_fdw will open a transaction on the remote server. When the local transaction commits or aborts, the remote transaction is also committed or aborted. Savepoints are similarly managed by creating corresponding remote savepoints.
When the local transaction uses the SERIALIZABLE isolation level, the remote transaction uses the SERIALIZABLE isolation level; otherwise, it uses the REPEATABLE READ isolation level. If a query performs multiple table scans on the remote server, this choice ensures that it will get snapshot-consistent results for all scans. One consequence is that subsequent queries within a single transaction will see the same data from the remote server, even if concurrent updates have occurred on the remote server due to other activity. This behavior is expected when the local transaction uses SERIALIZABLE or REPEATABLE READ isolation levels, but it may seem odd for a READ COMMITTED local transaction.
Note that postgres_fdw currently does not support preparing remote transactions for two-phase commit.
4. Remote Query Optimization
postgres_fdw attempts to optimize remote queries to reduce the amount of data transferred from the foreign server. This is done by sending WHERE clauses of the query to the remote server for execution, and by not retrieving table columns that are not needed for the current query. To reduce the risk of mis-execution of queries, WHERE clauses will not be sent to the remote server unless the data types, operators, and functions used in the WHERE clause are built-in or belong to an extension listed in the foreign server's extensions option. Operators and functions in these clauses must also be IMMUTABLE. For UPDATE or DELETE queries, if there are no WHERE clauses that cannot be sent to the remote server, no local joins in the query, no local row-level BEFORE or AFTER triggers or stored generated columns on the target table, and no CHECK OPTION constraints from views, postgres_fdw attempts to optimize query execution by sending the entire query to the remote server.
In UPDATE, expressions assigned to target columns can only use built-in data types, IMMUTABLE operators, or IMMUTABLE functions, to reduce the risk of query mis-execution.
When postgres_fdw encounters joins between foreign tables on the same foreign server, it sends the entire join to the foreign server, unless for some reason it considers it more efficient to fetch rows from each table individually, or the involved table references belong to different user mappings. When sending JOIN clauses, it takes the same precautions as described above for WHERE clauses.
The actual query sent to the remote server for execution can be examined using EXPLAIN VERBOSE.
5. Remote Query Execution Environment
In a remote session opened by postgres_fdw, the search_path parameter is set to only pg_catalog, so only built-in objects are visible without schema qualification. This is not an issue for queries generated by postgres_fdw itself, since it always provides such qualifications. However, this could cause problems for functions executed on the remote server through triggers or rules on remote tables. For example, if a remote table is actually a view, any functions used in that view will be executed in this restricted search path. We recommend schema-qualifying all names in such functions, or attaching a SET search_path option to such functions (see CREATE FUNCTION) to establish the search path environment they expect.
postgres_fdw also establishes remote session settings for various parameters:
• TimeZone is set to UTC
• DateStyle is set to ISO
• IntervalStyle is set to postgres
• extra_float_digits is set to 3
These are less problematic than search_path, but can be handled using function SET options if needed.
We do not recommend overriding this behavior by changing session-level settings for these parameters, as this will likely cause postgres_fdw to malfunction.
6. Examples
Here is an example of creating a foreign table with postgres_fdw. First install the extension:
CREATE EXTENSION postgres_fdw;
Then use CREATE SERVER to create a foreign server. In this example, we want to connect to a Halo server located at host 192.83.123.89 listening on port 5432. The database to connect to on the remote server is foreign_db:
test=## CREATE SERVER foreign_server
test-## FOREIGN DATA WRAPPER postgres_fdw
test-## OPTIONS (host '192.83.123.89', port '5432', dbname 'foreign_db');
CREATE SERVER
A user mapping needs to be defined with CREATE USER MAPPING to identify which role to use on the remote server:
CREATE USER MAPPING FOR local_user
SERVER foreign_server
OPTIONS (user 'foreign_user', password 'password');
Now you can create a foreign table using CREATE FOREIGN TABLE. In this example, we want to access a table named some_schema.some_table on the remote server. Its local name is foreign_table:
test=## CREATE FOREIGN TABLE foreign_table (
test(## id integer NOT NULL,
test(## data text
test(## )
test-## SERVER foreign_server
test-## OPTIONS (schema_name 'some_schema', table_name 'some_table');
The column data types and other properties declared in CREATE FOREIGN TABLE must match the actual remote table. Column names must also match, though you can attach the column_name option to individual columns to indicate which column they correspond to on the remote server. In many cases, using IMPORT FOREIGN SCHEMA is better than manually constructing foreign table definitions.