[MonetDB-users] billions rows ?

Thomas Briggs tom at briggs.cx
Sun Sep 28 21:49:56 CEST 2008


   Out of curiosity... what was the 13B row dataset you loaded?  It
doesn't happen to be publicly available, does it? :)

   -Tom


On Sun, Sep 28, 2008 at 8:23 AM, Martin Kersten <Martin.Kersten at cwi.nl> wrote:
> Hi Eric,
>
> Please can you indicate what version of MonetDB you are using.
>
> The largest table loaded at our site contained 13.000.000.000 elements
> (on a system with two quad-core CPUs, 64 GB RAM, and 6 TB of disk).
> It requires the Current version of MonetDB from the repository, because
> there were some problems in both loading and counting.
> To load such a large table, you had better turn off (foreign-) key
> checking if you can, because those checks are notoriously
> expensive in the current code base. A solution is under development.
>
> For such large databases we used a Linux platform with
> 128GB swap space.
>
> The increased load is curious. If you use the Current version,
> then the queries may internally run in parallel, causing all your
> cores to become active. They are probably waiting for I/O.
>
> A general remark: you are using strings to represent the genotype
> and rely on string comparison. Given the way a DBMS handles this,
> it is often significantly worse than integer-based search.
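
A minimal sketch of what an integer-based representation could look like, assuming a small lookup table that maps each distinct allele string to a code. The names `allele_codes`, `genotypes_coded`, and `allele_code`, and the sample codes, are hypothetical and do not come from the thread.

```sql
-- Hypothetical lookup table: one small integer code per distinct allele string.
CREATE TABLE allele_codes (
    code    SMALLINT PRIMARY KEY,
    alleles CHAR(3)
);

INSERT INTO allele_codes VALUES (1, 'A A');
INSERT INTO allele_codes VALUES (2, 'A C');
INSERT INTO allele_codes VALUES (3, 'C C');

-- Hypothetical variant of the genotypes table that stores the code
-- instead of the 3-character string.
CREATE TABLE genotypes_coded (
    ind         CHAR(10),
    mark        CHAR(10),
    allele_code SMALLINT
);

-- The selection now compares integers rather than strings.
SELECT ind, mark
FROM genotypes_coded
WHERE allele_code = 1;   -- 1 stands for 'A A' in allele_codes
```

The original string can still be recovered by joining `genotypes_coded` to `allele_codes` on the code when it is needed for display.
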
>
> Alternatively, for the adventurous, a column-store approach
> where the genotypes are split into 3-4 columns can be considered,
> joining them back when needed.
> Each column is then an array of ca. 370 MB, which fits into memory.
> The selection itself would create roughly 600 MB of intermediates.
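
One possible reading of the split suggested above is to store each character of the 3-character genotype in its own column. With about 370 million rows and one byte per character, each such column comes to roughly 370 MB, which matches the figure quoted. The sketch below assumes that reading; the names `genotypes_split`, `c1`, `c2`, and `c3` are hypothetical.

```sql
-- Hypothetical layout: one column per character of the genotype string.
CREATE TABLE genotypes_split (
    ind  CHAR(10),
    mark CHAR(10),
    c1   CHAR(1),   -- first character, e.g. 'A'
    c2   CHAR(1),   -- second character, e.g. the separator ' '
    c3   CHAR(1)    -- third character, e.g. 'A'
);

-- Equivalent of the predicate alleles = 'A A', evaluated on the
-- single-character columns.
SELECT ind, mark, c1, c2, c3
FROM genotypes_split
WHERE c1 = 'A' AND c2 = ' ' AND c3 = 'A';
```

If the full string is needed in the result, the three columns could be reassembled with the SQL concatenation operator `||`.
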
>
> Good luck, and keep us informed of your progress,
> Martin
> Lefteris wrote:
>> Hi Eric,
>>
>> Thank you for trying out MonetDB. What happens in your case is that
>> you run out of memory. Your query becomes I/O-bound rather than CPU-bound,
>> hence the low average CPU load. For this amount of data you will need more
>> RAM, and MonetDB will then handle your workload faster.
>>
>> There are no user-defined indices in MonetDB; however, depending on your
>> query load, you may choose to order your data on the columns that most
>> of your predicates apply to.
>>
>> CPU resources cannot be restricted (as far as I know) from inside
>> MonetDB, but you can always do that through your OS. By the way, are you
>> using Windows or Linux?
>>
>> I hope I gave you a couple of hints.
>>
>> Kind regards,
>>
>> lefteris
>>
>> On Sun, Sep 28, 2008 at 12:38 PM, eric Gtep <eric.gtep at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I'm testing MonetDB for storing large collections of genotypes.
>>> In a few words, a genotype is a small piece of information (about 3 or
>>> 4 characters) related to an individual and a DNA marker.
>>> So a genotype looks like the contents of a spreadsheet cell. Sometimes we
>>> need to access them by individual (across markers), sometimes by DNA
>>> marker (across individuals).
>>> Today's genotyping technologies yield 600, 000,0000 genotypes per run.
>>>
>>> Is MonetDB able to efficiently manage tables with several billion rows?
>>>
>>> Do you have any example of an application with many (> 1 billion) rows
>>> in a single table?
>>>
>>> I've successfully compiled and installed MonetDB/SQL on a Dell PE2950
>>> with two quad-core Intel Xeon 2.66 GHz CPUs and 4 GB RAM.
>>> I've created a very basic table to store genotypes:
>>>
>>> CREATE TABLE genotypes (
>>>     ind     CHAR(10),  -- individual identifier
>>>     mark    CHAR(10),  -- DNA marker identifier
>>>     alleles CHAR(3)    -- genotype, e.g. 'A A'
>>> );
>>>
>>> I then populated this table with the "copy into table" statement.
>>> About 370 million rows were loaded in 7 minutes.
>>> I haven't defined any index.
>>>
>>> From mclient I sent the query below:
>>>
>>> SELECT * FROM genotypes WHERE alleles = 'A A';
>>>
>>> Immediately the server froze, and after about ten minutes the Unix
>>> w command showed a load average of 16!
>>> I stopped the query.
>>> Could you explain to me what happened?
>>> Is this behaviour normal?
>>> Is it possible to restrict the CPU resources available to MonetDB?
>>>
>>> Thank you in advance for your advice and help.
>>>
>>> Eric
>>>
>>> -------------------------------------------------------------------------
>>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>>> Build the coolest Linux based applications with Moblin SDK & win great prizes
>>> Grand prize is a trip for two to an Open Source event anywhere in the world
>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>> _______________________________________________
>>> MonetDB-users mailing list
>>> MonetDB-users at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/monetdb-users
>>>
>>>



