COPY
COPY — Copy data between a file and a table
Synopsis
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
[ WHERE condition ]
COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
TO { 'filename' | PROGRAM 'command' | STDOUT }
[ [ WITH ] ( option [, ...] ) ]
where option can be one of the following:
FORMAT format_name
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null_string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL ( column_name [, ...] )
FORCE_NULL ( column_name [, ...] )
ENCODING 'encoding_name'
Description
COPY moves data between tables and standard file system files. COPY TO copies the contents of a table to a file, while COPY FROM copies data from a file to a table (appending the data to whatever is already in the table). COPY TO can also copy the results of a SELECT query.
If a column list is specified, COPY TO will only copy the data in the specified columns to the file. For COPY FROM, each field in the file will be inserted sequentially into the specified columns. Table columns not listed in the COPY FROM command's column list will receive their default values.
COPY with a filename instructs the server to directly read from or write to a file. The file must be accessible and should be specified from the server's perspective. When PROGRAM is specified, the server executes the given command and reads from the program's standard output or writes to the program's standard input. When STDIN or STDOUT is specified, data is transmitted through the connection between the client and the server.
Parameters
table_name
The name of an existing table (can be schema-qualified).
column_name
An optional list of columns to be copied. If no column list is specified, all columns of the table except generated columns will be copied.
query
A SELECT, VALUES, INSERT, UPDATE, or DELETE command whose results are to be copied. Note that parentheses around the query are required.
For INSERT, UPDATE, and DELETE queries, a RETURNING clause must be provided, and the target relation cannot have conditional rules, ALSO rules, or INSTEAD rules that would expand into multiple statements.
filename
The path name of the input or output file. An input file name can be an absolute or relative path, but an output file name must be an absolute path. Windows users may need to use an E'' string and double any backslashes used in the path name.
PROGRAM
A command to be executed. In COPY FROM, input is read from the command's standard output, while in COPY TO, output is written to the command's standard input. Note that the command is invoked by the shell, so if you need to pass any arguments from untrusted sources to the shell command, you must be careful to strip out characters that may have special meaning to the shell. For security reasons, it is best to use a fixed command string, or at least avoid passing any user input into it.
STDIN
Specifies that input comes from the client application.
STDOUT
Specifies that output goes to the client application.
boolean
Specifies whether the selected option should be turned off or on. You can write TRUE, ON, or 1 to enable the option, or FALSE, OFF, or 0 to disable it. The boolean value can also be omitted, in which case TRUE is assumed.
FORMAT
Selects the data format to be read or written: text, csv (Comma Separated Values), or binary. The default is text.
FREEZE
Requests copying data that has already completed row freezing, as if the copy were performed after running a VACUUM FREEZE command. This is designed for initial data loading performance. Rows will only be frozen if the table being loaded has already been created or truncated in the current subtransaction, no cursors are open in the transaction, and the transaction does not hold an older snapshot. Currently, COPY FREEZE cannot be executed on partitioned tables.
Note that once successfully loaded, all other sessions will be able to see the data immediately. This violates normal MVCC visibility rules, and the user specifying this option should be aware of the potential issues this may cause.
DELIMITER
Specifies the character that separates columns within each row of the file. The default is a tab character in text format and a comma in CSV format. This must be a single one-byte character. This option is not allowed when using binary format.
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format and an unquoted empty string in CSV format. You might prefer an empty string even in text format when you do not want to distinguish null values from empty strings. This option is not allowed when using binary format.
[TABLE]
HEADER
Specifies that the file contains a header line with the name of each column. On output, the first line contains the column names from the table. On input, the first line is ignored. This option is only allowed when using CSV format.
QUOTE
Specifies the quoting character to be used when a data value is quoted. The default is double quotes. This must be a single one-byte character. This option is only allowed when using CSV format.
ESCAPE
Specifies the character that should appear before a data character that matches the QUOTE value. The default is the same as the QUOTE value (so that if the quoting character appears in the data, it is doubled). This must be a single one-byte character. This option is only allowed when using CSV format.
FORCE_QUOTE
Forces quoting to be used for all non-NULL values in each specified column. NULL output is not quoted. If * is specified, non-NULL values in all columns will be quoted. This option is only allowed in COPY TO when using CSV format.
FORCE_NOT_NULL
Do not match the values of the specified columns against the null string. In the default case where the null string is the empty string, this means that empty strings will be read as zero-length strings rather than null values (even if they are not quoted). This option is only allowed in COPY FROM when using CSV format.
FORCE_NULL
Match the values of the specified columns against the null string (even if it has been quoted), and set the value to NULL when a match is found. In the default case where the null string is the empty string, this converts a quoted empty string to NULL. This option is only allowed in COPY FROM when using CSV format.
ENCODING
Specifies that the file is encoded in encoding_name. If this option is omitted, the current client encoding is used. See the Notes below for details.
WHERE
The WHERE clause is optional; its general form is:
WHERE condition
where condition is any expression that evaluates to a boolean type. Any row that does not satisfy this condition will not be inserted into the table. When any variable references are replaced with actual row values, the row satisfies the condition if it returns true.
Currently, subqueries are not allowed in the WHERE expression, and value evaluation does not see any changes made by COPY itself (this is important when the expression contains calls to VOLATILE functions).
Output
On successful completion, a COPY command returns a command tag of the form COPY count, where count is the number of rows copied.
Notes
COPY TO can only be used with plain tables, not views. However, you can write COPY (SELECT * FROM viewname) TO ... to copy the current contents of a view.
COPY FROM can be used with plain tables, foreign tables, partitioned tables, or views that have INSTEAD OF INSERT triggers.
COPY only processes the named table; it does not copy data from or to child tables. For example, COPY table TO will show the same data as SELECT * FROM ONLY table. However, COPY (SELECT * FROM table) TO ... can be used to dump all data in an inheritance hierarchy.
You must have SELECT privilege on the table being read by COPY TO, and INSERT privilege on the table being inserted into by COPY FROM.
Having privilege on the columns listed in the command is sufficient.
If row-level security is enabled on the table, the relevant SELECT policies apply to COPY table TO statements. Currently, COPY FROM is not supported on tables with row-level security. However, equivalent INSERT statements can be used.
The files mentioned in the COPY command are read or written directly by the server (not the client application). Therefore, they must be located on or accessible from the database server's machine (not the client's). They must be accessible and readable or writable. Similarly, commands specified with PROGRAM are executed directly by the server (not the client application). Only database superusers or users granted one of the default roles pg_read_server_files, pg_write_server_files, or pg_execute_server_program are allowed to COPY a file or command, because it allows reading or writing any file the server has privilege access to, or running programs the server has privilege access to.
Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, then reads/stores data in a file accessible to the psql client. Therefore, when using \copy, file accessibility and access rights depend on the client rather than the server.
It is recommended that file names used in COPY always be specified as absolute paths. The server enforces this for COPY TO, but for COPY FROM you may choose to read from a file specified with a relative path. The path will be interpreted relative to the working directory of the server process (not the client), which is typically the cluster's data directory.
Executing a command with PROGRAM may be subject to operating system access control mechanisms such as SELinux.
COPY FROM will invoke any triggers and check constraints on the target table. However, it does not invoke rules.
For identity columns, the COPY FROM command will always write the column values provided in the input data, behaving the same as INSERT's OVERRIDING SYSTEM VALUE option.
COPY input and output are affected by DateStyle. To ensure portability to other systems that may use non-default DateStyle settings, DateStyle should be set to ISO before using COPY TO. It is also a good idea to avoid dumping data with IntervalStyle set to sql_standard, because negative interval values may be misinterpreted by servers with different IntervalStyle settings.
Even though data is read from or written to a file directly by the server without passing through the client, input data is interpreted according to the ENCODING option or the current client encoding, and output data is encoded according to ENCODING or the current client encoding. COPY stops at the first error. This does not cause problems in the case of COPY TO, but in COPY FROM the target table will have already received some rows. These rows will not become visible or accessible, but they still occupy disk space. If an error occurs during a large copy operation, this may waste considerable disk space. You may want to invoke VACUUM to recover the wasted space.
FORCE_NULL and FORCE_NOT_NULL can be used simultaneously on the same column. This causes quoted null value strings to be converted to null values and unquoted null value strings to be converted to empty strings.
File Formats
Text Format
When using the text format, a text file is read or written where each line corresponds to one row in the table. Columns within a row are separated by the delimiter character. Column values themselves are strings produced by the output function or acceptable to the input function for each attribute's data type. The specified null string is used for columns with null values. If any line in the input file contains more or fewer columns than expected, COPY FROM will raise an error.
The end of data can be represented as a single line containing only a backslash and a period (.). When reading from a file, the end-of-data marker is not necessary because the end-of-file indicator is sufficient. It is only needed when copying data with client applications using a pre-3.0 client protocol.
The backslash character () can be used in COPY data to quote characters used as row or column delimiters. In particular, if the following characters appear as part of a column value, they must be preceded by a backslash: the backslash itself, newline, carriage return, and the current delimiter character.
COPY TO returns the specified null string without adding any backslashes. Conversely, COPY FROM matches the input against the null string before removing backslashes. Therefore, a null string (such as \N) will not be confused with the actual data value \N (which would be represented as \N).
COPY FROM recognizes the following special backslash sequences:
| Sequence | Meaning |
|---|---|
| \b | Backspace (ASCII 8) |
| \f | Form feed (ASCII 12) |
| \n | Newline (ASCII 10) |
| \t | Tab (ASCII 9) |
| \v | Vertical tab (ASCII 11) |
| \digits | Backslash followed by one to three octal digits represents the character with that numeric code |
| \xdigits | Backslash plus x followed by one to three hexadecimal digits represents the character with that numeric code |
Currently, COPY TO does not emit octal or hexadecimal backslash sequences, but it does use the other sequences listed above for those control characters.
Any other backslash character not mentioned in the table above will be treated as representing itself. However, note that adding unnecessary backslashes may accidentally produce a string that matches the end-of-data marker (.) or the null string (default \N). These strings are recognized before any other backslash processing is performed.
It is strongly recommended that applications producing COPY data convert data newlines and carriage returns to \n and \r sequences, respectively. Currently, a data carriage return can be represented as a backslash and a carriage return, and a data newline can be represented as a backslash and a newline. However, future releases may not accept these representations. They are also vulnerable to corruption when transferring COPY files between different machines (e.g., from Unix to Windows).
COPY TO terminates each line with a Unix-style newline ("\n"). Servers running on Microsoft Windows output carriage return/newline ("\r\n"), but only for COPY to a server file. For cross-platform consistency, COPY TO STDOUT always sends "\n" regardless of the server platform. COPY FROM can handle lines terminated by newline, carriage return, or carriage return/newline. To reduce the risk from unescaped newlines or carriage returns in the data, COPY FROM will complain if line endings in the output are not entirely consistent.
CSV Format
This format option is used for importing and exporting comma-separated values (CSV) file format used by many other programs (such as spreadsheets). Unlike the escape rules used in the standard text format, it produces and recognizes general CSV escaping mechanisms.
Values in each record are separated by the DELIMITER character. If a value contains the delimiter character, QUOTE character, NULL string, a carriage return, or a newline character, the entire value is prefixed and suffixed with the QUOTE character, and an escape character is placed before each occurrence of the QUOTE or ESCAPE character within the value. FORCE_QUOTE can also be used to force quoting of non-NULL values in specified output columns.
CSV format has no standard way to distinguish NULL values from empty strings. COPY handles this distinction using quoting. NULL is output as the NULL parameter string and is not quoted, while non-NULL values matching the NULL parameter string are quoted. For example, with default settings, NULL is written as an unquoted empty string, while an empty string data value is written with double quotes (""). Value reading follows similar rules. You can use FORCE_NOT_NULL to prevent NULL input comparison for specified columns. You can also use FORCE_NULL to convert quoted null value string data values to NULL.
Because the backslash is not a special character in CSV format, the end-of-data marker . can also appear as a data value.
To avoid any misinterpretation, a . data value appearing as a standalone item on a line is automatically quoted on output, and on input, if quoted, it is not interpreted as an end-of-data marker. If you are loading a file created by another application that has an unquoted column that might have a . value, you may need to quote the value in the input file.
[TABLE]
[TABLE]
[TABLE]
Binary Format
The binary format option causes all data to be stored/read in binary format rather than text format. It is somewhat faster than text and CSV formats, but binary format files are less portable across different machine architectures and different Halo versions. Also, the binary format is very data-type specific. For example, you cannot output binary data from a smallint column and read it into an integer column, even though this is feasible in text format.
The binary file format consists of a file header, zero or more tuples containing row data, and a file trailer. Headers and data are represented in network byte order.
File Header
The file header consists of 15 bytes of fixed fields, followed by a variable-length header extension area.
The fixed fields are:
Signature: an 11-byte sequence PGCOPY\n\377\r\n\0 — note that the zero byte is a necessary part of the signature (this signature is designed to easily detect files that have been corrupted by transfers that cannot correctly handle 8-bit character encodings. The signature would be altered by line-ending translation filters, zero-byte deletion, high-bit deletion, or parity modification).
Flags field:
A 32-bit integer bit mask indicating important aspects of the file format. Bits are numbered from 0 (LSB) to 31 (MSB). Note that this field is stored in network byte order (most significant byte first), as are all integer fields used in this file format. Bits 16-31 are reserved to indicate serious file format issues; a reader should abort if it finds unexpected set bits in this range. Bits 0-15 are reserved for backward-compatible format issues; a reader should simply skip any unexpected set bits in this range. Currently only one flag bit is defined; all others must be zero: Bit 16
If 1, indicates that OIDs are included in the data; if 0, they are not included.
Header extension area length
A 32-bit integer indicating the length in bytes of the remainder of the header, not including itself. Currently, this length is zero, and the first tuple immediately follows. Future changes to the format may allow additional data to be represented in the header. If a reader does not know what to do with the header extension data, it can silently skip it.
The header extension area is expected to contain a sequence of self-explanatory blocks. The flags field is not intended to tell the reader what the extension data is. Detailed design of header extension contents is left to future releases.
This design allows backward-compatible header additions (adding header extension blocks or setting low-order flag bits) and non-backward-compatible changes (setting high-order flag bits to indicate such changes and adding supporting data to the extension area as needed).
Tuples
Each tuple begins with a 16-bit integer count representing the number of fields in the tuple (currently, all tuples in a table should have the same count, but this may not always be true). Then, for each field in the tuple, there is a 32-bit length word followed by that many bytes of field data (the length word does not include itself and can be zero). As a special case, -1 indicates a NULL field value. In the NULL case, no value bytes follow.
There is no alignment padding or any other extra data between fields.
File Trailer
The file trailer consists of a 16-bit integer containing -1. This is easily distinguished from a tuple's field count word.
If a field count word is neither -1 nor the expected number of columns, the reader should report an error. This provides an additional check against certain types of data desynchronization.
Examples
The following example copies a table to the client using a pipe (|) as the field delimiter:
COPY country TO STDOUT (DELIMITER '|');
Copy data from a file into the country table:
COPY country FROM '/usr1/proj/bray/sql/country_data';
Copy only countries whose names start with 'A' to a file:
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/usr1/proj/bray/sql/a_list_countries.copy';
To copy to a compressed file, you can pipe the output to an external compression program:
COPY country TO PROGRAM 'gzip > /usr1/proj/bray/sql/country_data.gz';
Here is data suitable for copying from STDIN into a table:
AF AFGHANISTAN AL ALBANIA DZ ALGERIA ZM ZAMBIA ZW ZIMBABWE
Note that the whitespace on each line is actually a tab character.
Below is the same data in binary format output. The data is displayed after filtering with the Unix tool od -c. The table has three columns; the first column is of type char(2), the second is text, and the third is integer. All rows have null values in the third column.
0000000 P G C O P Y \n 377 r n \0 0 0 0 0 0
0000020 0 0 0 0 003 0 0 0 002 A F 0 0 0 013 A
0000040 F G H A N I S T A N 377 377 377 377 0 003
0000060 0 0 0 002 A L 0 0 0 007 A L B A N I
0000100 A 377 377 377 377 0 003 0 0 0 002 D Z 0 0 0
0000120 007 A L G E R I A 377 377 377 377 0 003 0 0
0000140 0 002 Z M 0 0 0 006 Z A M B I A 377 377
0000160 377 377 0 003 0 0 0 002 Z W 0 0 0 b Z I
0000200 M B A B W E 377 377 377 377 377 377