[MonetDB-users] Redundant data
Hello All,
I am very new to MonetDB and I am very pleased with its performance at our university.
Basically, I have data which is stored on our Unix filesystem.
/research_data/life/YYYY/MM/DD/species (there are over 100 species files)
I then created a very large CSV file by traversing this data. The file is about 2 GB.
The table I created looks like this:
    CREATE TABLE t1 (t TIMESTAMP, species VARCHAR(10), weight FLOAT, color VARCHAR(10));
The file species.csv looks like this:
    weight, color
    3,red
    7,green
    4,blue
I have a script that traverses the filesystem and creates one big CSV file, for example:
    2008-04-01, cat,3,red
    2008-04-01, cat,7,green
    2008-04-01, cat,4,blue
This works fine, but if I run the COPY INTO operation again, it inserts the same rows a second time. Is there any way to avoid this?
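For readers reconstructing the export step described above, here is a minimal Python sketch of such a traversal script. It is not Mag's actual script: it assumes (hypothetically) that each day directory ROOT/YYYY/MM/DD holds one file per species, named after the species (an optional .csv extension is dropped), with a "weight, color" header line. The names ROOT, iter_rows and write_csv are illustrative. Unlike the example above, it writes no space after the date.

```python
import csv
import os
import sys
from typing import Iterator, List, TextIO

# Hypothetical root; layout assumed to be ROOT/YYYY/MM/DD/<species file>
ROOT = "/research_data/life"


def iter_rows(root: str) -> Iterator[List[str]]:
    """Yield [date, species, weight, color] for every species file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk visits days in order
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if len(parts) != 3:
            continue  # only YYYY/MM/DD directories hold species files
        date = "-".join(parts)  # e.g. 2008/04/01 -> 2008-04-01
        for name in sorted(filenames):
            species = os.path.splitext(name)[0]  # "cat" or "cat.csv" -> "cat"
            with open(os.path.join(dirpath, name), newline="") as f:
                reader = csv.reader(f)
                next(reader, None)  # skip the "weight, color" header line
                for row in reader:
                    if not row:
                        continue  # ignore blank lines
                    weight, color = row  # assumes exactly two fields per line
                    yield [date, species, weight.strip(), color.strip()]


def write_csv(root: str, out: TextIO) -> int:
    """Write all rows as one CSV stream; return the number of rows written."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in iter_rows(root):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    write_csv(sys.argv[1] if len(sys.argv) > 1 else ROOT, sys.stdout)
```

Run, for example, as `python export.py /research_data/life > all.csv` (the file name is hypothetical). Re-running it emits the same rows again, so the script by itself does nothing to stop a second load from duplicating data.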
Hello Mag,
What version of MonetDB are you using? On what kind of machine/OS?
Use primary keys in your table; this will trap duplicate insertions.
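A minimal sketch of what a primary key does on a repeated load, using Python's built-in sqlite3 module purely as a stand-in for MonetDB so the example runs anywhere; it is not MonetDB's API. The composite key (t, species, color) is a hypothetical choice, since the thread does not say which columns identify a measurement. Whether MonetDB's COPY INTO likewise rejects the whole load on the first violation should be checked against the MonetDB documentation.

```python
import sqlite3

# SQLite stands in for MonetDB here; the key columns are a hypothetical choice.
SCHEMA = """
CREATE TABLE t1 (
    t       TIMESTAMP,
    species VARCHAR(10),
    weight  FLOAT,
    color   VARCHAR(10),
    PRIMARY KEY (t, species, color)
)
"""

ROWS = [
    ("2008-04-01", "cat", 3.0, "red"),
    ("2008-04-01", "cat", 7.0, "green"),
    ("2008-04-01", "cat", 4.0, "blue"),
]


def load(conn: sqlite3.Connection, rows) -> bool:
    """Insert a batch in one transaction; True if committed, False if rejected."""
    try:
        with conn:  # commits on success, rolls back the whole batch on error
            conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:  # raised on a primary-key violation
        return False


def main():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    first = load(conn, ROWS)   # initial load succeeds
    second = load(conn, ROWS)  # re-running the same load violates the key
    count = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
    conn.close()
    return first, second, count


if __name__ == "__main__":
    print(main())  # (True, False, 3)
```

The key makes a re-run fail instead of silently adding duplicates; it does not skip the rows already present and load only the new ones, so incremental loads still need to be limited to new data.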
The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your production scanning environment may not be a perfect world - but thanks to Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700 Series Scanner you'll get full speed at 300 dpi even with all image processing features enabled. http://p.sf.net/sfu/kodak-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Thanks for the quick response.
I am using MonetDB version 5 (Feb 2009 release) on 64-bit Linux.
If I use a primary key, will inserts take longer? I am talking about close to a billion records.
On Sat, May 9, 2009 at 9:29 AM, Martin Kersten <Martin.Kersten@cwi.nl> wrote:
use primary keys in your table.... this will trap duplicate insertion.
participants (2)
- Mag Gam
- Martin Kersten