[MonetDB-users] billions rows ?

Martin Kersten Martin.Kersten at cwi.nl
Sun Sep 28 14:23:07 CEST 2008


Hi Eric,

Can you please indicate which version of MonetDB you are using?

The largest table loaded at our site contained 13,000,000,000 elements
(on a dual quad-core system with 64 GB RAM and 6 TB disk).
It requires the Current version of MonetDB from the repository, because
there were some problems in both loading and counting.
To load such a large table, you had better turn off (foreign) key
checking if you can, because those checks are notoriously
expensive in the current code base. A solution is under development.

For such large databases we used a Linux platform with
128GB swap space.

The increased load is curious. If you use the Current version,
then the queries may internally run in parallel, causing all your
cores to become active. They are probably waiting for I/O.

A general remark: you are using strings to represent the genotype
and rely on string comparison. Given the way a DBMS handles this,
it is often significantly slower than integer-based search.
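
To illustrate the remark above, here is a minimal Python sketch of mapping
genotype strings to integer codes before searching. The helper name
encode_column and the sample values are hypothetical, and this is not how
MonetDB works internally; it only shows the idea of comparing small
integers instead of strings.

```python
# Hypothetical helper: map each distinct genotype string to a small integer.
def encode_column(values):
    codes = {}       # genotype string -> integer code
    encoded = []     # the column rewritten as integer codes
    for v in values:
        if v not in codes:
            codes[v] = len(codes)   # next unused code
        encoded.append(codes[v])
    return codes, encoded

# Made-up sample data standing in for the alleles column.
alleles = ["A A", "A G", "G G", "A A", "A G"]
codes, encoded = encode_column(alleles)

# Look up the code once, then compare integers row by row.
target = codes["A A"]
matches = [i for i, c in enumerate(encoded) if c == target]
print(matches)
```

In a real schema the mapping would live in a small lookup table and the
large table would store only the integer code.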

Alternatively, for the adventurous, a column-store approach can be
considered where the genotypes are split into 3-4 columns, which are
joined back when needed.
Then each column is an array of ca. 370 MB, which fits into memory.
The selection itself would create roughly 600 MB of intermediates.
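
The 370 MB figure can be checked with a little arithmetic. The sketch
below assumes the 370 million rows from the test load and one byte per
single-character column; the names and the one-byte assumption are
illustrative and not taken from MonetDB's storage format.

```python
ROWS = 370_000_000      # rows loaded in the test described in this thread
BYTES_PER_CHAR = 1      # assumption: one byte per single-character column

def column_bytes(rows, width):
    """Size in bytes of a fixed-width column with `rows` entries."""
    return rows * width

# One single-character column, expressed in MB (10^6 bytes).
one_column_mb = column_bytes(ROWS, BYTES_PER_CHAR) / 1_000_000

# All four columns, if the genotype is split into 4 characters.
four_columns_mb = 4 * one_column_mb

print(one_column_mb, four_columns_mb)
```

Under these assumptions even four such columns together stay well below
the 4 GB of RAM in the test machine.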

Good luck, and keep us informed of the progress,
Martin

Lefteris wrote:
> Hi Eric,
>
> thank you for trying out MonetDB. What happens in your case is that
> you run out of memory. Your query becomes I/O bound rather than CPU bound,
> hence the low average CPU load. For this amount of data you will need more
> RAM, and MonetDB will then handle your workload faster.
>
> There are no user-defined indices in MonetDB; however, depending on your
> query load, you may choose to order your data on the columns that most
> of your predicates apply to.
>
> CPU resources cannot be restricted (as far as I know) from inside
> MonetDB, but you can always do that through your OS. By the way, are you
> using Windows or Linux?
>
> I hope I gave you a couple of hints.
>
> Kind regards,
>
> lefteris
>
> On Sun, Sep 28, 2008 at 12:38 PM, eric Gtep <eric.gtep at gmail.com> wrote:
>   
>> Hello,
>>
>> I'm testing MonetDB for storing large collections of genotypes.
>> In a few words, a genotype is a small piece of information (about 3 or
>> 4 chars) related to an individual and a DNA marker.
>> So a genotype looks like the contents of a spreadsheet cell. Sometimes we
>> need to access them by individual (across markers), sometimes by DNA
>> marker (across individuals).
>> Today genotyping technologies can produce 600, 000,0000 genotypes per run.
>>
>> Is MonetDB able to efficiently manage tables with several billion rows?
>>
>> Do you have any example of an application with a lot (> 1 billion) of rows
>> in a single table?
>>
>> I've successfully compiled and installed MonetDB/SQL on a Dell PE2950 with 2
>> quad-core Intel Xeon 2.66 GHz CPUs and 4 GB RAM.
>> I've created a very basic table to store genotypes:
>>
>> CREATE TABLE genotypes (
>>     ind     char(10),
>>     mark    char(10),
>>     alleles char(3)
>> );
>>
>> Afterwards I populated this table with the "copy into table" statement.
>> About 370 million rows were loaded in 7 minutes.
>> I haven't defined any index.
>>
>> From mclient I sent the query below:
>>
>> select * from genotypes where alleles = 'A A';
>>
>> Immediately the server froze, and after about ten minutes the Unix w
>> command showed: load average 16!
>> I stopped the query.
>> Could you explain to me what has happened?
>> Is this behaviour normal?
>> Is it possible to restrict the CPU resources allowed to MonetDB?
>>
>> Thank you in advance for your advice and your help.
>>
>> Eric
>>
>> -------------------------------------------------------------------------
>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>> Build the coolest Linux based applications with Moblin SDK & win great prizes
>> Grand prize is a trip for two to an Open Source event anywhere in the world
>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>> _______________________________________________
>> MonetDB-users mailing list
>> MonetDB-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/monetdb-users
>>
>>     
>