Reading using file_loader

Reading data from a delimited text file is possible via the file_loader('file_path') functionality. The file_loader('file_path') function expects a character string literal which specifies the full path and name of the delimited text file and returns a virtual table with data.

Note The text file must be accessable and readable by the MonetDB server process. Also the data in the text file is expected to be in UTF-8 character encoding.

The file_loader('file_path') function automatically tries to find out:

  • what the field and row delimiters are,
  • how many columns the data the text file has,
  • what type of data each column is (boolean or bigint or decimal or double precision or date YYYY-MM-DD or time HH:MM or timestamp or string), and
  • at which row the data starts, so the first row could list the names of the columns.

Examples:

SELECT * FROM file_loader('/home/martin/csvfiles/movies.csv');

SELECT * FROM file_loader('/home/martin/Downloads/mtcars.tsv');

SELECT * FROM file_loader('/home/martin/csvfiles/flights-1m.csv.gz');

-- on MS Windows you can use the R prefix for Raw interpretation of the string
SELECT * FROM file_loader(R'C:\temp\csvfiles\flights-1m.csv.gz');

The file_loader('file_path') function recognises common file name extensions (.csv, .tsv and .psv) for comma, tab and pipe (the | character) separated values files and supports compressed text files (.gz, .lz4, .bz2, .xz).

As a convenience (less typing) it is also possible to omit the file_loader( function name and only specify the file name literal in the from-clause:

SELECT * FROM '/home/martin/csvfiles/movies.csv';

-- on MS Windows you can use the R prefix for Raw interpretation of the string
SELECT * FROM R'C:\temp\csvfiles\flights-1m.csv.gz';

Create table and bulk insert

It is also possible to use the query result data to create a new table and populate it in one go.

Examples:

CREATE TABLE movies_imp (movieid, title, genres) AS
  SELECT * FROM file_loader('/home/martin/csvfiles/movies.csv')
  WITH DATA;

SELECT * FROM movies_imp;


-- or use a local temp table (visible to the current session only and automatically removed at end of the session)
CREATE LOCAL TEMP TABLE movies_tmp (movieid, title, genres) AS
  SELECT * FROM file_loader('/home/martin/csvfiles/movies.csv')
  WITH DATA
  ON COMMIT PRESERVE ROWS;

SELECT * FROM tmp.movies_tmp;

Bulk insert

The query result data can also be used in an insert statement to bulk load the external data into an existing table:

-- the table needs to exist
CREATE TABLE IF NOT EXISTS movies (
  movieid varchar(32) not null,
  title varchar(120) not null,
  genres varchar(80));

-- do the bulk insert
INSERT INTO movies (movieid, title, genres)
  SELECT * FROM file_loader('/home/martin/csvfiles/movies.csv');

SELECT * FROM movies ORDER BY title LIMIT 10;