Bulk input/output

Bulk input/output mk Sun, 03/28/2010 - 00:48

The COPY INTO command enables fast insertion of multiple tuples from an text file or standard input. Each tuple  in the input file is terminated by a record-separator (default '\n') and fields are separated by the field separator (default '|'). The field values should adhere to the  syntax for value literals. Alternative separators should adhere to the SQL lexical syntax for string values. A different NULL value representation can be passed using the NULL as null_string option. Furthermore, the fields are optionally enclosed with a user defined quote character. The text file should use UTF-8 encoding if specified as file_name, and the same encoding as mclient is using if read using FROM STDIN.

It is strongly advised to announce the maximum number of records to be inserted. It avoids guessing by the server and subsequent re-allocation of table space which may involve potentially expensive copying. A portion of the input file can be skipped using the offset feature.  The offset value specifies the record at which loading should commence, the first record having offset 1.

The file read from must be accessible by the server. Therefore, they must reside on or be accessible to the database server machine and they must be identified with an absolute path name. The STDIN file designator reads data streaming from the client application. An empty record determines the end of sequence.

COPY [ int_val [ OFFSET int_val ] RECORDS ] INTO table_name
      FROM ( file_name ',' ... | STDIN )
      [ [USING] DELIMITERS field_separator [',' record_separator [ ',' string_quote ]]] [ NULL AS null_string ] [LOCKED]

COPY [ int_val RECORDS ] OFFSET int_val INTO table_name
      FROM ( file_name ',' ... | STDIN )
      [ [USING] DELIMITERS field_separator [',' record_separator [ ',' string_quote ]]] [ NULL AS null_string ] [LOCKED]

The input syntax should comply to the following grammar: [ [ [quote] [[escape]char] * [quote]] feldspar] * record separator. Quote characters in quoted fields may be escaped with a backslash. Field and record separators can be embedded in quoted fields.


In many bulk loading situations, the original file can be saved as a backup or recreated for disaster handling. This reliefs the database system from having to prepare for recovery as well and to safe significant storage space. The LOCKED qualifier can be used in this situation (and in single user mode!) to skip the logging operation normally performed.

WARNING It is advised to add integrity constraints to the table after the file has been loaded. The ALTER statements perform bulk integrity checking and perform these checks often more efficiently.

For more see the CSV bulk load recipe.

Copy into File

The COPY INTO command with a file name argument allows for fast dumping of a result set into an ASCII file. The file must be accessible by the server and a full path name may be required. The file STDOUT can be used to direct the result to the primary output channel.

The delimiters and NULL AS arguments provide control over the layout required.  

COPY subquery INTO file_name [ [USING] DELIMITERS field_separator [',' record_separator [ ',' string_quote ]]] [ NULL AS null_string ]

For both the input and output version of the COPY INTO command one can specify a file name ending with 'gz' or 'bz2'  to use the appropriate compression library (if available).

For more see the Exporting bulk data recipe.

Copying binary files

Migration of tables between MonetDB/SQL instances can be sped up using the binary COPY INTO/FROM format.

For more see the Binary bulk data load recipe.