COPY INTO robust to empty row at end of CSV

Stefan Manegold Stefan.Manegold at cwi.nl
Mon May 19 10:44:01 CEST 2014


No, it's the intended behavior that if an empty line is read MonetDB complains that it (obviously) does not comply with the expected format.
This is independent of specifying number of records to be read or not.
However, by specifying the number of records to be loaded, the trailing lines (if any)
that are in the file but not to be loaded are not read and not checked for format compliance.

E.g.,
with a file F containing
1,2
3,4
<empty line>

and a table T with two integer columns
copy into from 'F' delimiters ',','\n'
copy 3 records from 'F' delimiters ',','\n'
copy 2 offset 1 records from 'F' delimiters ',','\n'
will all fail while
copy 2 records from 'F' delimiters ',','\n'
copy 1 offset 1 records from 'F' delimiters ',','\n'
will (should) work.

S. 

----- Original Message -----
> Thanks, I have not. So that is the intended behavior after specifying a
> number of records? Maybe it is an issue with counting the header in or
> something.
> 
> 
> On Mon, May 19, 2014 at 9:58 AM, Stefan Manegold < Stefan.Manegold at cwi.nl >
> wrote:
> 
> 
> 
> Then I suspect that "records" has the wrong value, i.e., it is also counting
> the empty line.
> Did/could you double-check?
> 
> ----- Original Message -----
> > Thanks, Stefan, on COPY:
> > 
> > The current script issues the following command, and it still runs into the
> > problem with the empty line/row in the end:
> > 
> > copy ='COPY %i OFFSET %i RECORDS INTO %s FROM \'%s\' USING DELIMITERS
> > %r,%r,%r NULL AS \'\' LOCKED;' %
> > (records,offset,tname,f,dialect.delimiter,eol,dialect.quotechar)
> > 
> > 
> > 
> > On Mon, May 19, 2014 at 8:11 AM, Stefan Manegold < Stefan.Manegold at cwi.nl >
> > wrote:
> > 
> > 
> > 
> > 
> > 
> > ----- Original Message -----
> > > Hi all,
> > > 
> > > Hannes Mühleisen (cc'd) generously shared his
> > > homepages.cwi.nl/~hannes/importer.py Python script with me last week,
> > > which
> > > automates the creation of tables from text files: Python's csv-reader
> > > helps
> > > recognize the dialect of the text file as well as grab the header for
> > > column
> > > names and identify the types for the columns, and then a single COPY %i
> > > OFFSET %i RECORDS INTO %s FROM \'%s\' USING DELIMITERS … is issued to
> > > MonetDB (see line 404 of the script).
> > > 
> > > I had some issues with this over the last three days, but now I see the
> > > source of the error I get: There is nothing wrong with Hannes's Python
> > > code
> > > but when COPY INTO turns back to the text file, MonetDB still complains
> > > about the last row containing an "incomplete record" (it is empty) even
> > > if
> > > Python was smart enough not to count that row into the number of records
> > > beforehand.
> > > 
> > > Is there a way to call COPY INTO that is robust to the text file ending
> > > in
> > > an
> > > extra empty row? It is common with some generated CSV or TSV files.
> > 
> > Use COPY n RECORDS ..., where n is the number of non-empty lines in you CSV
> > file.
> > 
> > > By the way, I also have a follow-up about how to merge some of the tables
> > > together. I am happy to work on this, of course, I don't expect anyone to
> > > script this for me, but I am confused about how much I can use SQL
> > > variables
> > > for this (and how) versus how much I need to use another language to
> > > script
> > > the repeated calls to MonetDB about each year for each series of tables
> > > (I
> > > have dozens of dozens). I hope this is a reasonably well-posed question:
> > > http://dba.stackexchange.com/questions/65329/dynamic-sql-merge-partitioned-tables-annual-tables-into-a-single-table-of-all
> > 
> > Not sure whether the WHILE loops works as such, but you could easily do
> > that
> > in your application.
> > 
> > You also might want to consider replacing
> > ALTER TABLE data_ at i ADD COLUMN "year" INTEGER; UPDATE data_ at i SET "year" =
> > @i;
> > by, say,
> > ALTER TABLE data_ at i ADD COLUMN "year" INTEGER DEFAULT @i;
> > and
> > CREATE TABLE data AS SELECT * FROM data_1990 UNION ALL SELECT * FROM
> > data_1991 UNION ALL [...] WITH DATA;
> > by, say,
> > CREATE VIEW data AS SELECT * FROM data_1990 UNION ALL SELECT * FROM
> > data_1991
> > UNION ALL [...];
> > 
> > Best,
> > Stefan
> > 
> > > Thanks for any help,
> > > 
> > > Laszlo
> > > 
> > > _______________________________________________
> > > users-list mailing list
> > > users-list at monetdb.org
> > > https://www.monetdb.org/mailman/listinfo/users-list
> > > 
> > 
> > --
> > | Stefan.Manegold at CWI.nl | DB Architectures (DA) |
> > | www.CWI.nl/~manegold/ | Science Park 123 (L321) |
> > | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> > 
> > _______________________________________________
> > users-list mailing list
> > users-list at monetdb.org
> > https://www.monetdb.org/mailman/listinfo/users-list
> > 
> > 
> > _______________________________________________
> > users-list mailing list
> > users-list at monetdb.org
> > https://www.monetdb.org/mailman/listinfo/users-list
> > 
> 
> --
> | Stefan.Manegold at CWI.nl | DB Architectures (DA) |
> | www.CWI.nl/~manegold/ | Science Park 123 (L321) |
> | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
> 
> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
> 
> 
> _______________________________________________
> users-list mailing list
> users-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
> 

-- 
| Stefan.Manegold at CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |



More information about the users-list mailing list