[MonetDB-users] Redundant data

Mag Gam

9 May 2009

1:19 p.m.

Hello All,

I am very new to MonetDB and I am very pleased with its performance at our university.

Basically, I have data which is stored on our Unix filesystem.

/research_data/life/YYYY/MM/DD/species (there are over 100 species files)

I then created a very large CSV file by traversing this data. The file is about 2GB.

The table I created looks like this:

    CREATE TABLE t1 (
        t       TIMESTAMP,
        species VARCHAR(10),
        weight  FLOAT,
        color   VARCHAR(10)
    );

The file species.csv looks like this:

    weight, color
    3,red
    7,green
    4,blue

I have a script that traverses the filesystem and creates one big CSV file, for example:

    2008-04-01, cat,3,red
    2008-04-01, cat,7,green
    2008-04-01, cat,4,blue
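The actual script is not shown in the thread. A minimal Python sketch of such a traversal, assuming the layout /research_data/life/YYYY/MM/DD/<species> where each species file starts with a `weight, color` header row; the function name `build_csv` and the output file name are hypothetical. It writes rows such as `2008-04-01,cat,3,red`.

```python
import csv
from pathlib import Path


def build_csv(root: Path, out_path: Path) -> int:
    """Hypothetical helper: walk root/YYYY/MM/DD/<species> and write one
    'date,species,weight,color' row per data line. Returns rows written."""
    rows = 0
    with out_path.open("w", newline="") as out:
        writer = csv.writer(out)
        # sorted() keeps the output order identical between runs
        for species_file in sorted(root.glob("*/*/*/*")):
            if not species_file.is_file():
                continue
            yyyy, mm, dd = species_file.parts[-4:-1]  # date comes from the path
            date = f"{yyyy}-{mm}-{dd}"
            species = species_file.stem  # file name without any extension
            with species_file.open(newline="") as f:
                reader = csv.reader(f, skipinitialspace=True)
                next(reader, None)  # skip the 'weight, color' header row
                for record in reader:
                    if not record:
                        continue  # ignore blank lines
                    weight, color = record
                    writer.writerow([date, species, weight, color])
                    rows += 1
    return rows
```

Usage: `build_csv(Path("/research_data/life"), Path("all_species.csv"))` returns the number of rows written to the combined file.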

This works fine, but if I run the COPY operation again, it loads the same rows a second time and the table ends up with duplicate data. Is there any way to avoid this?


Martin Kersten

9 May 2009

1:29 p.m.

Mag Gam wrote:

> Hello All,

Hello Mag,

What version of MonetDB are you using, and on what kind of machine/OS?

...

> This works fine, but if I run the COPY operation again, it loads the same rows a second time and the table ends up with duplicate data. Is there any way to avoid this?

Use primary keys in your table; this will trap duplicate insertion.
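To illustrate the idea, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for MonetDB. SQLite's constraint handling is not identical to MonetDB's, and this sketch does not model how MonetDB's COPY reacts to a key violation. It assumes that all four columns together identify a row, which the thread does not state; in practice, key on whichever columns are actually unique in your data. The helper `load` is hypothetical.

```python
import sqlite3

ROWS = [
    ("2008-04-01", "cat", 3.0, "red"),
    ("2008-04-01", "cat", 7.0, "green"),
    ("2008-04-01", "cat", 4.0, "blue"),
]


def load(conn: sqlite3.Connection, rows) -> int:
    """Hypothetical loader: insert rows one at a time and return how many
    were rejected by the primary key."""
    rejected = 0
    for row in rows:
        try:
            with conn:  # commit on success, roll back this insert on error
                conn.execute("INSERT INTO t1 VALUES (?, ?, ?, ?)", row)
        except sqlite3.IntegrityError:
            rejected += 1  # key violation: this row is already stored
    return rejected


conn = sqlite3.connect(":memory:")
# Assumed key: all four columns together.
conn.execute(
    """CREATE TABLE t1 (
           t       TIMESTAMP,
           species VARCHAR(10),
           weight  FLOAT,
           color   VARCHAR(10),
           PRIMARY KEY (t, species, weight, color)
       )"""
)
first = load(conn, ROWS)   # initial load: nothing rejected
second = load(conn, ROWS)  # re-running the load: every row is a duplicate
count = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
print(first, second, count)  # prints: 0 3 3
```

Row-by-row inserts keep the example short; they are not a substitute for a bulk load of a 2GB file.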


The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your production scanning environment may not be a perfect world - but thanks to Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700 Series Scanner you'll get full speed at 300 dpi even with all image processing features enabled. http://p.sf.net/sfu/kodak-com _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Mag Gam

2:48 p.m.

Thanks for the quick response.

I am using version 5 (Feb 2009 release) on 64-bit Linux.

If I use a primary key, will inserts take longer? I am talking about close to a billion records.
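One way to approach this question is to measure the cost on a sample before loading everything. Below is a rough timing harness that again uses sqlite3 as a stand-in. Its numbers say nothing reliable about MonetDB, so the same with/without-key comparison would need to be repeated against MonetDB on a representative sample. The helpers `make_rows` and `time_load`, the synthetic data, and the choice of key columns are all hypothetical.

```python
import random
import sqlite3
import time

COLUMNS = "t TIMESTAMP, species VARCHAR(10), weight FLOAT, color VARCHAR(10)"
DDL_PLAIN = f"CREATE TABLE t1 ({COLUMNS})"
# Hypothetical key: all four columns together.
DDL_PK = f"CREATE TABLE t1 ({COLUMNS}, PRIMARY KEY (t, species, weight, color))"


def make_rows(n: int, seed: int = 0):
    """Generate n distinct synthetic rows (the running index keeps weights unique)."""
    rng = random.Random(seed)
    colors = ["red", "green", "blue"]
    return [
        (f"2008-04-{1 + i % 28:02d}", "cat", float(i), rng.choice(colors))
        for i in range(n)
    ]


def time_load(ddl: str, rows) -> float:
    """Create t1 with the given DDL in a fresh in-memory DB and time one batch insert."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        start = time.perf_counter()
        with conn:  # a single transaction for the whole batch
            conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", rows)
        return time.perf_counter() - start
    finally:
        conn.close()


if __name__ == "__main__":
    rows = make_rows(100_000)
    print(f"no key:   {time_load(DDL_PLAIN, rows):.3f} s")
    print(f"with key: {time_load(DDL_PK, rows):.3f} s")
```

Using the same rows for both runs keeps the comparison fair. Only the presence of the key differs.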

On Sat, May 9, 2009 at 9:29 AM, Martin Kersten <Martin.Kersten@cwi.nl> wrote:

...
