Stefan,
On the second half of your first email: why would you prefer that I create a
view rather than one new big table (and delete the annual tables afterwards)?
There is little advantage to keeping them stored partitioned by year, no?
Doesn't that defeat the purpose of columnar storage anyway? (Though it is
true that for a long panel dataset most use cases touch only specific years,
so quick selection based on partitions might speed things up.)
And on the code itself: would you still recommend coding these calls up in
Python? Is this much harder (or even impossible) to do using SQL variables?
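For what it's worth, the per-year calls are straightforward to drive from Python. Below is a minimal sketch, assuming the yearly table names data_1990, data_1991, ... used elsewhere in the thread; it only builds the SQL strings, and the print() would be replaced by an execute() on whatever MonetDB client cursor you use:

```python
# Sketch: generate the tag-the-year statements for each yearly table
# from Python instead of SQL variables. Table names data_1990 ... are
# assumed from the thread; swap print() for cursor.execute() with your
# MonetDB client library.

def tag_year_statements(year):
    """SQL to add a "year" column to one yearly table and fill it in."""
    table = "data_%d" % year
    return [
        'ALTER TABLE %s ADD COLUMN "year" INTEGER;' % table,
        'UPDATE %s SET "year" = %d;' % (table, year),
    ]

for year in range(1990, 1993):
    for stmt in tag_year_statements(year):
        print(stmt)
```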
Thanks,
Laszlo
On Mon, May 19, 2014 at 8:11 AM, Stefan Manegold wrote:
----- Original Message -----
> Hi all,
>
> Hannes Mühleisen (cc'd) generously shared his
> homepages.cwi.nl/~hannes/importer.py Python script with me last week, which
> automates the creation of tables from text files: Python's csv reader helps
> recognize the dialect of the text file as well as grab the header for
> column names and identify the types for the columns, and then a single
>
>     COPY %i OFFSET %i RECORDS INTO %s FROM '%s' USING DELIMITERS …
>
> is issued to MonetDB (see line 404 of the script).
>
> I had some issues with this over the last three days, but now I see the
> source of the error I get: there is nothing wrong with Hannes's Python
> code, but when COPY INTO turns back to the text file, MonetDB still
> complains about the last row containing an "incomplete record" (it is
> empty), even though Python was smart enough not to count that row into the
> number of records beforehand.
>
> Is there a way to call COPY INTO that is robust to the text file ending in
> an extra empty row? This is common with some generated CSV or TSV files.
Use COPY n RECORDS ..., where n is the number of non-empty lines in your CSV file.
> By the way, I also have a follow-up about how to merge some of the tables
> together. I am happy to work on this, of course; I don't expect anyone to
> script it for me. But I am confused about how much I can use SQL variables
> for this (and how), versus how much I need another language to script the
> repeated calls to MonetDB for each year of each series of tables (I have
> dozens of dozens). I hope this is a reasonably well-posed question:
>
> http://dba.stackexchange.com/questions/65329/dynamic-sql-merge-partitioned-t...
I am not sure whether the WHILE loop works as such, but you could easily do that in your application.
You also might want to consider replacing

    ALTER TABLE data_@i ADD COLUMN "year" INTEGER;
    UPDATE data_@i SET "year" = @i;

by, say,

    ALTER TABLE data_@i ADD COLUMN "year" INTEGER DEFAULT @i;

and

    CREATE TABLE data AS
      SELECT * FROM data_1990 UNION ALL
      SELECT * FROM data_1991 UNION ALL
      [...]
    WITH DATA;

by, say,

    CREATE VIEW data AS
      SELECT * FROM data_1990 UNION ALL
      SELECT * FROM data_1991 UNION ALL
      [...];
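The two replacements above are easy to generate per year from a script. Here is a minimal Python sketch, again assuming the yearly table names data_1990, data_1991, ... from the thread; it only builds the SQL text:

```python
# Sketch of the two suggestions above, generated per year: add the
# "year" column with a DEFAULT (so no separate UPDATE is needed), and
# expose the merged data as a view instead of materializing a new
# table. Table names data_1990 ... are assumed from the thread.

def add_year_with_default(year):
    """ALTER statement that tags one yearly table in a single step."""
    return ('ALTER TABLE data_%d ADD COLUMN "year" INTEGER DEFAULT %d;'
            % (year, year))

def merged_view(years, name="data"):
    """CREATE VIEW stitching all yearly tables together via UNION ALL."""
    body = " UNION ALL ".join("SELECT * FROM data_%d" % y for y in years)
    return "CREATE VIEW %s AS %s;" % (name, body)
```

A view keeps the yearly tables as the single stored copy of the data, so queries restricted to a few years can still read only those tables, while the view offers the one-big-table interface.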
Best, Stefan
> Thanks for any help,
> Laszlo
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)   |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)  |