COPY INTO robust to empty row at end of CSV
Hi all,

Hannes Mühleisen (cc'd) generously shared his homepages.cwi.nl/~hannes/importer.py Python script with me last week, which automates the creation of tables from text files: Python's csv reader helps recognize the dialect of the text file, grab the header for column names, and identify the types for the columns; then a single COPY %i OFFSET %i RECORDS INTO %s FROM '%s' USING DELIMITERS … is issued to MonetDB (see line 404 of the script).

I had some issues with this over the last three days, but now I see the source of the error I get: there is nothing wrong with Hannes's Python code, but when COPY INTO turns to the text file, MonetDB still complains about the last row containing an "incomplete record" (it is empty), even though Python was smart enough not to count that row into the number of records beforehand.

Is there a way to call COPY INTO that is robust to the text file ending in an extra empty row? It is common with some generated CSV or TSV files.

By the way, I also have a follow-up about how to merge some of the tables together. I am happy to work on this, of course; I don't expect anyone to script this for me. But I am confused about how much I can use SQL variables for this (and how), versus how much I need to use another language to script the repeated calls to MonetDB for each year of each series of tables (I have dozens of dozens). I hope this is a reasonably well-posed question: http://dba.stackexchange.com/questions/65329/dynamic-sql-merge-partitioned-t...

Thanks for any help,
Laszlo
----- Original Message -----
Hannes Mühleisen (cc'd) generously shared his homepages.cwi.nl/~hannes/importer.py Python script with me last week, which automates the creation of tables from text files: Python's csv-reader helps recognize the dialect of the text file as well as grab the header for column names and identify the types for the columns, and then a single COPY %i OFFSET %i RECORDS INTO %s FROM \'%s\' USING DELIMITERS … is issued to MonetDB (see line 404 of the script).
I had some issues with this over the last three days, but now I see the source of the error I get: There is nothing wrong with Hannes's Python code but when COPY INTO turns back to the text file, MonetDB still complains about the last row containing an "incomplete record" (it is empty) even if Python was smart enough not to count that row into the number of records beforehand.
Is there a way to call COPY INTO that is robust to the text file ending in an extra empty row? It is common with some generated CSV or TSV files.
Use COPY n RECORDS ..., where n is the number of non-empty lines in your CSV file.
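One way to obtain that n reliably is to count only the non-empty lines before issuing the COPY. A minimal sketch (an editor's illustration, not code from the thread; the function and file names are hypothetical):

```python
def count_records(path, skip_header=True):
    """Number of non-empty lines in a CSV/TSV file, so that the n passed to
    COPY n RECORDS excludes any trailing empty row (and the header)."""
    with open(path) as fh:
        nonempty = [line for line in fh if line.strip()]
    if skip_header and nonempty:
        return len(nonempty) - 1
    return len(nonempty)
```

With a file containing a header, two data rows, and a trailing empty line, `count_records` returns 2, which is the value to interpolate into `COPY %i RECORDS`.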
By the way, I also have a follow-up about how to merge some of the tables together. I am happy to work on this, of course, I don't expect anyone to script this for me, but I am confused about how much I can use SQL variables for this (and how) versus how much I need to use another language to script the repeated calls to MonetDB about each year for each series of tables (I have dozens of dozens). I hope this is a reasonably well-posed question: http://dba.stackexchange.com/questions/65329/dynamic-sql-merge-partitioned-t...
Not sure whether the WHILE loop works as such, but you could easily do that in your application.

You also might want to consider replacing

ALTER TABLE data_@i ADD COLUMN "year" INTEGER;
UPDATE data_@i SET "year" = @i;

by, say,

ALTER TABLE data_@i ADD COLUMN "year" INTEGER DEFAULT @i;

and

CREATE TABLE data AS SELECT * FROM data_1990 UNION ALL SELECT * FROM data_1991 UNION ALL [...] WITH DATA;

by, say,

CREATE VIEW data AS SELECT * FROM data_1990 UNION ALL SELECT * FROM data_1991 UNION ALL [...];

Best,
Stefan
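Either variant of the UNION ALL statement can be assembled programmatically instead of with SQL variables. A minimal sketch (an editor's illustration, not code from the thread; the function name and the `stub` naming convention are hypothetical):

```python
def build_union_view(name, years, stub="data"):
    # Assemble: CREATE VIEW <name> AS
    #   SELECT * FROM <stub>_<y1> UNION ALL SELECT * FROM <stub>_<y2> ...
    selects = " UNION ALL ".join(
        "SELECT * FROM %s_%d" % (stub, y) for y in years)
    return "CREATE VIEW %s AS %s;" % (name, selects)
```

For example, `build_union_view("data", [1990, 1991])` produces the two-table version of the CREATE VIEW statement above, ready to hand to the server.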
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Thanks, Stefan, on COPY:

The current script issues the following command, and it still runs into the problem with the empty line/row at the end:

copy = 'COPY %i OFFSET %i RECORDS INTO %s FROM \'%s\' USING DELIMITERS %r,%r,%r NULL AS \'\' LOCKED;' % (records, offset, tname, f, dialect.delimiter, eol, dialect.quotechar)
Then I suspect that "records" has the wrong value, i.e., it is also counting the empty line. Did/could you double-check?
Thanks, I have not. So that is the intended behavior after specifying a number of records? Maybe it is an issue with counting the header in or something.
No, it's the intended behavior: if an empty line is read, MonetDB complains that it (obviously) does not comply with the expected format. This is independent of whether a number of records to be read is specified. However, by specifying the number of records to be loaded, the trailing lines (if any) that are in the file but not to be loaded are neither read nor checked for format compliance.

E.g., with a file F containing

1,2
3,4
<empty line>

and a table T with two integer columns,

copy into T from 'F' delimiters ',','\n'
copy 3 records into T from 'F' delimiters ',','\n'
copy 2 offset 1 records into T from 'F' delimiters ',','\n'

will all fail, while

copy 2 records into T from 'F' delimiters ',','\n'
copy 1 offset 1 records into T from 'F' delimiters ',','\n'

will (should) work.

S.
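The worked example above can be modeled in a few lines of Python (an editor's sketch, not from the thread; it assumes OFFSET k skips the first k physical lines, which is the reading consistent with the failing and working cases listed):

```python
def lines_read(lines, n, offset=0):
    # Model of which physical lines COPY n OFFSET k RECORDS examines,
    # under the assumption (taken from the worked example, not the
    # MonetDB manual) that OFFSET skips the first `offset` lines.
    return lines[offset:offset + n]

# The file F above: two records plus a trailing empty line.
f = ["1,2", "3,4", ""]
assert "" in lines_read(f, 3)                # COPY 3 RECORDS hits the empty line
assert "" in lines_read(f, 2, offset=1)      # so does COPY 2 OFFSET 1 RECORDS
assert "" not in lines_read(f, 2)            # COPY 2 RECORDS stops before it
assert "" not in lines_read(f, 1, offset=1)  # COPY 1 OFFSET 1 RECORDS too
```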
Stefan,

On the second half of your first email: why would you prefer me having a view instead of a new big table (and deleting the annual ones afterwards)? There is little advantage to them being stored partitioned by year, no? Doesn't that defeat the purpose of the columnar storage anyway? (Though it is true that, for a long panel dataset, most use cases are specific to some years only, so quick selection based on partitions might speed things up.)

And on the code itself: you would still recommend coding these calls up in Python; this is much harder (impossible) to code using SQL variables?
Thanks,
Laszlo
I think I have a better handle on how and why to code this up in "my application" (an overstatement for my Python script); thanks for bearing with me. If you care, below is my first attempt at this using the Python library.

Sorry about the messy renaming; I received messy partitions from the data sources.

I could not test it, as there is still an issue with "incomplete record at end of file," even if I specify the number of records precisely. Hannes was kind enough to work on the Python script for this, and we do not see the error on Macs, but I am tied to a Windows server for my analysis. You can try the mock data file attached with http://homepages.cwi.nl/~hannes/importer.py . I got the error with Python 2.7.5 (from Anaconda, 64-bit) on Windows Server 2012.

Maybe there is one question I already have on the Python script: can I rename the table so it does not begin with "sys.", or is that part of the architecture?

And by the way, if you can recommend your favorite SQL:2003 tutorial that is compatible with MonetDB, I would be happy to guide my coworkers to it!

Thanks,
Laszlo
# script to merge tables in MonetDB
import re
from monetdb import mapi

server = mapi.Server()
server.connect(hostname="localhost", port=50000, username="monetdb",
               password="monetdb", database="dbfarm", language="sql")

def tablemerge(stub, yearlist):
    for year in yearlist:
        server.cmd('ALTER TABLE %s_%d ADD COLUMN "year" INTEGER DEFAULT %d;'
                   % (stub, year, year))
        newstub = re.sub(r'sys.ds_chocker_lev_', r'', stub)
        if year == yearlist[0]:
            unioncall = 'CREATE TABLE %s AS SELECT * FROM %s_%d ' % (newstub, stub, year)
        else:
            unioncall += 'UNION ALL SELECT * FROM %s_%d ' % (stub, year)
    # MonetDB expects WITH DATA on CREATE TABLE ... AS SELECT
    unioncall += 'WITH DATA;'
    server.cmd(unioncall)
    for year in yearlist:
        server.cmd('DROP TABLE %s_%d;' % (stub, year))
    print '%s done.' % stub

def treatmentmerge(stub, stringlist):
    for file in stringlist:
        if file == stringlist[0]:
            unioncall = 'CREATE TABLE %s AS SELECT * FROM %s%s ' % (stub, stub, file)
        else:
            unioncall += 'UNION ALL SELECT * FROM %s%s ' % (stub, file)
    unioncall += 'WITH DATA;'
    server.cmd(unioncall)
    for file in stringlist:
        server.cmd('DROP TABLE %s%s;' % (stub, file))
    print '%s done.' % stub

def kurumerge(st, yearlist):
    stub = 'ku' + str(st)
    for year in yearlist:
        server.cmd('ALTER TABLE sys.ds_chocker_lev_kuru%d_%s ADD COLUMN "year" INTEGER DEFAULT %d;'
                   % (year, stub, year))
        if year == yearlist[0]:
            unioncall = 'CREATE TABLE %s AS SELECT * FROM sys.ds_chocker_lev_kuru%d_%s ' % (stub, year, stub)
        else:
            # source tables are named sys.ds_chocker_lev_kuru<year>_<stub>
            unioncall += 'UNION ALL SELECT * FROM sys.ds_chocker_lev_kuru%d_%s ' % (year, stub)
    unioncall += 'WITH DATA;'
    server.cmd(unioncall)
    for year in yearlist:
        server.cmd('DROP TABLE sys.ds_chocker_lev_kuru%d_%s;' % (year, stub))
    print '%s done.' % stub

def drugsmerge():
    for year in xrange(2005, 2013):
        for month in xrange(1, 13):
            if year == 2005 and month < 7:  # the series starts in July 2005
                continue
            server.cmd('ALTER TABLE lmed_%d_mon%d ADD COLUMN "year" INTEGER DEFAULT %d;'
                       % (year, month, year))
            server.cmd('ALTER TABLE lmed_%d_mon%d ADD COLUMN "month" INTEGER DEFAULT %d;'
                       % (year, month, month))
            if year == 2005 and month == 7:
                unioncall = 'CREATE TABLE lmed AS SELECT * FROM lmed_%d_mon%d ' % (year, month)
            else:
                unioncall += 'UNION ALL SELECT * FROM lmed_%d_mon%d ' % (year, month)
    unioncall += 'WITH DATA;'
    server.cmd(unioncall)
    for year in xrange(2005, 2013):
        for month in xrange(1, 13):
            if year == 2005 and month < 7:
                continue
            server.cmd('DROP TABLE lmed_%d_mon%d;' % (year, month))
    print 'lmed done.'

# use with xrange(start, end + 1)
for stub in ['civandr']:
    tablemerge('sys.ds_chocker_lev_%s' % stub, xrange(1998, 2013))
for stub in ['civil']:
    tablemerge('sys.ds_chocker_lev_%s' % stub, xrange(1968, 2013))
for stub in ['iot']:
    tablemerge('sys.ds_chocker_lev_%s' % stub, xrange(1990, 2013))
for stub in ['fas']:
    tablemerge('sys.ds_chocker_lev_%s' % stub, xrange(2000, 2013))
for stub in ['form']:
    tablemerge('sys.ds_chocker_lev_%s' % stub, xrange(1999, 2008))
for stub in ['jobb']:
    tablemerge('sys.ds_chocker_lev_%s' % stub, xrange(1985, 2012))
for stub in ['lisaftg', 'lisaindivid']:
    tablemerge('sys.ds_chocker_lev_%s' % stub, xrange(1990, 2012))
for st in [21, 26, 31, 33]:
    kurumerge(st, xrange(1999, 2008))
for st in [20, 22, 25, 32, 40, 50, 55]:
    # xrange objects cannot be concatenated; build an explicit list
    kurumerge(st, list(xrange(1999, 2000)) + list(xrange(2003, 2008)))
for st in [34, 35, 52, 56]:
    kurumerge(st, xrange(2003, 2008))
drugsmerge()
treatmentmerge('sys.oppenvard', ['2011', '2012', '20012005', '20062010'])
treatmentmerge('sys.slutenvard', ['2011', '2012', '19871996', '1997', '19982004', '20052010'])
participants (2)
- László Sándor
- Stefan Manegold