A database is often populated with the SQL COPY statement using an ASCII representation of a table with column and record separators, e.g. a CSV file. This scheme is preferred for almost all situations. Its performance is mostly determined by parsing cost of turning ASCII value representation into the native binary one. The MonetDB copy process is highly optimized for multi-core systems, where several threads in parallel handle variable length records efficiently.
When large tables are migrated between MonetDB instances, or when large tables are produced by a trusted external program, a slightly faster loading track can be used instead. This saves rendering of data into ASCII and subsequent parsing of the data being exchanged. For this to work, the user creates the binary version of the BAT storage and 'attach' it to the SQL catalog using a variation of the COPY command.
To illustrate, consider the following snippet:
create table Tmp( i integer, f real, s string);
copy into Tmp from ('path_to_file_i', 'path_to_file_f', 'path_to_file_s');
The file names are complete paths to the columns files. They should reside on the same file system as the database farm and they will replace the content of the table Tmp. The files are copied into place, which means the originals can be garbage collected afterwards.
Each attachment file is produced by a program that writes the binary image of the BAT directly, i.e. a binary dump of an C-array. For the SQL types char (1 byte), tinyint (8-bits), smallint (16-bits), int/integer (32 bits), and bigint (64 bits) correspond with the C types directly; real and double are mapped onto type float and double in C, respectively. For variable length strings, the file should have one C-based string value per line, terminated by a newline, and it is processed without escape character conversion. MonetDB assumes that all files are aligned, i.e. the i-th value in each file corresponds to the i-th record in the table.
All other types (including UTF8 and escaped characters) should be handled using the default COPY INTO language construct.
Note that attaching string columns in this way is not possible in any release up to and including the Aug2011 release. It will be possible in the Dec2011 and later releases.
- Printer-friendly version
- Login to post comments
