|Summary:||SQL COPY INTO FROM compressed (bz2,xz,gz) file fails|
|Product:||SQL||Reporter:||Stefan Manegold <Stefan.Manegold>|
|Component:||all||Assignee:||SQL devs <bugs-sql>|
Description Stefan Manegold 2019-08-20 13:56:17 CEST
trying to bulk-load from a compressed (bz2,xz,gz) CSV file using COPY OFFSET 3 INTO "data" FROM '$PWD/data.csv.bz2' DELIMITERS ',','\r\n','' NULL AS ''; fails with "Failed to import table 'data', incomplete record at end of file" The problem occurs with Apr2019_release (Apr2019_3), Apr2019_SP1_release (APR2019_7) and the latest head of the Apr2019 branch (changeset ad472c9f29d3).
Comment 1 Stefan Manegold 2019-08-20 14:03:03 CEST
Please find a test at http://homepages.cwi.nl/~manegold/Bug-6753.tar
Comment 2 Sjoerd Mullender 2019-08-20 14:15:33 CEST
Try delimiter '\n' instead of '\r\n'.
Comment 3 Stefan Manegold 2019-08-20 16:13:51 CEST
Thanks! Indeed, all MonetDB decompression code seems to silently replace '\r\n' (CRLF) by '\n' (LF) during decompression (at least on non-Windows systems, i.e., on systems where '\n' (LF) is the standard end-of-line marker, rather than '\r\n' (CRLF)). The plain (non-compressed) CSV reader, however, seems to not do such replacement. Consequently, when loading from CSV files that do contain '\r\n' (CRLF) line-endings, MonetDB's copy into need to specify delimiter '\r\n' when loading from a plain (uncompressed) CSV file, and delimiter '\n' when loading from a compressed CSV file. Otherwise, when using '\r\n' with compressed files, the error reported here occurs, and when using '\n' on plain (uncompressed) CSV files, '\r' occurs in the loaded data (provided the last column is a clob/string/varchar/char(-like; i.e., non-numerical) column). Please take this as an implicit documentation of this current "inconsistency" in the MonetDB code base.