[Monetdb-developers] Unable to shred 1gb xml file (OS: Not enough space)

Stefan Manegold Stefan.Manegold at cwi.nl
Tue Aug 1 12:03:56 CEST 2006


Hi Klarinda,

[first of all, our apologies for the late reply --- it's summer and vacation
time... ;-)]

Concerning your problem: it looks as if loading the 1GB document requires
more (virtual) memory (and/or disk space) than is available on your system.
The error indicates that MonetDB/XQuery fails to allocate 442499072 bytes
(422 MB), while 794361856 bytes (757 MB) of your machine's virtual memory
(512 MB physical memory plus the size of your Windows installation's "page
file") are already used by MonetDB. Most probably, your virtual memory is
less than 422 MB + 757 MB = 1179 MB, right?
It could also be that your hard disk (at least the partition used by
MonetDB) has less than 422 MB free at the time that MonetDB/XQuery tries to
allocate the extra 422 MB.
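The arithmetic above can be reproduced directly from the error message. Below is a minimal Python sketch that extracts the requested size and the current virtual-memory usage from the `GDKmmap` failure line and adds them up. The helper names `parse_gdkmmap` and `vm_needed_mb` are hypothetical, and the regular expression is an assumption fitted to this one message, not a documented MonetDB log format.

```python
import re

MB = 1024 * 1024  # the figures in this mail use 1 MB = 2**20 bytes

# Failure line copied verbatim from the console output quoted below.
LINE = ("!GDKmmap(442499072) fail => BBPtrim(enter) "
        "usage[mem=101297568,vm=794361856]")

# Assumption: pattern fitted to this one message only, not to any
# documented MonetDB log grammar.
PATTERN = re.compile(r"GDKmmap\((\d+)\).*usage\[mem=(\d+),vm=(\d+)\]")


def parse_gdkmmap(line):
    """Return (requested, mem, vm) in bytes from a GDKmmap failure line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a GDKmmap failure line")
    requested, mem, vm = (int(g) for g in m.groups())
    return requested, mem, vm


def vm_needed_mb(line):
    """Virtual memory (MB, rounded down) the failed allocation would need in total."""
    requested, _mem, vm = parse_gdkmmap(line)
    return (requested + vm) // MB


if __name__ == "__main__":
    requested, mem, vm = parse_gdkmmap(LINE)
    print(f"requested {requested // MB} MB, already in use {vm // MB} MB, "
          f"needed about {vm_needed_mb(LINE)} MB of virtual memory")
```

Running it prints 422 MB requested, 757 MB already in use, and roughly 1179 MB in total. If the system's virtual memory (RAM plus page file) is below that figure, this would be consistent with the failure.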

That said, it remains open whether MonetDB/XQuery indeed requires all this
memory to load the 1GB document, or whether a bug makes it request more
memory than it should.

To test this, we'd need your document --- the mere size of the serialized
document is not enough information for us to estimate how big the internal
data structures will/must be; we also need to know (among other things) how
many nodes the document has, what its structure looks like, etc.
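Figures like these can be gathered without loading the whole document into memory. As a hedged sketch, a streaming SAX pass in Python could report element, attribute and text-node counts plus the maximum nesting depth. The class and function names are hypothetical. The counts only approximate XML data-model nodes (comments and processing instructions are ignored, and a text run split by a comment counts once), and they say nothing definitive about MonetDB/XQuery's internal storage layout.

```python
import sys
import xml.sax


class StatsHandler(xml.sax.ContentHandler):
    """Collects rough node statistics while streaming through an XML document."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.attributes = 0
        self.text_nodes = 0
        self.depth = 0
        self.max_depth = 0
        self._pending_text = False  # non-whitespace text seen since the last tag?

    def _flush_text(self):
        # A run of character data between two tags counts as one text node.
        if self._pending_text:
            self.text_nodes += 1
            self._pending_text = False

    def startElement(self, name, attrs):
        self._flush_text()
        self.elements += 1
        self.attributes += len(attrs)
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        self._flush_text()
        self.depth -= 1

    def characters(self, content):
        # SAX may deliver one text node in several chunks; mark it only once.
        if content.strip():
            self._pending_text = True

    def summary(self):
        return {"elements": self.elements, "attributes": self.attributes,
                "text_nodes": self.text_nodes, "max_depth": self.max_depth}


def xml_stats(source):
    """source: a file name or binary file object; memory use stays roughly constant."""
    handler = StatsHandler()
    xml.sax.parse(source, handler)
    return handler.summary()


if __name__ == "__main__":
    print(xml_stats(sys.argv[1]))
```

For example, `python xml_stats.py DC1000catalog.xml` (script name assumed) prints a small dictionary that could be sent along with the document.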

I tried to generate your document myself in order to analyse the problem,
but (while working fine for Text-Centric documents) the XBench/ToXgene
generator fails for me with Document-Centric documents (at least with the
"large" and "huge" ones):
========
$ perl ./xdbgen.pl
 -----------------------------------
|   XBench Database Generator v1.0  |
| (c)2002 by University of Waterloo |
 -----------------------------------

Database Class: [1]TC/SD [2]TC/MD [3]DC/SD [4]DC/MD
Please choose database class (any other key to exit): 3

Database Size: [1]Small [2]Normal [3]Large [4]Huge
Please choose database size (default is Normal): 3

Generating template templates/DCSD.tsl==>templates/newDCSD.tsl

Generating TPC-W titles/lastnames ...
sh: wgen/tpcw: cannot execute binary file
sh: wgen/tpcw: cannot execute binary file

ToXgene 1.1a - (c) 2001 by University of Toronto and IBM Corporation

 ***** Parsing template: Done!

Generating 250000 elements in items: Done!
Reading list titles from input/titles.xml:
java.lang.ArrayIndexOutOfBoundsException: 1
        at genes.lists.ToxListParser.endElement(ToxListParser.java:125)
        at org.apache.xerces.parsers.SAXParser.endElement(SAXParser.java:1403)
        at org.apache.xerces.validators.common.XMLValidator.callEndElement(XMLValidator.java:1480)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1204)
        at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1122)
        at genes.lists.ToxListParser.parse(ToxListParser.java:53)
        at genes.lists.ToxList.readFromFile(ToxList.java:354)
        at genes.lists.ToxList.generate(ToxList.java:230)
        at toxgene.ToXgene.main(ToXgene.java:160)

***** UNRECOGNIZED ERROR: An unrecognized error has occurred.
Please report the following debug information to toxgene-bugs at cs.toronto.edu
Please include the template that causes this problem.

java.lang.NullPointerException
        at genes.lists.ToxList.readFromFile(ToxList.java:419)
        at genes.lists.ToxList.generate(ToxList.java:230)
        at toxgene.ToXgene.main(ToXgene.java:160)
Total elapsed time: 18759ms.
Parsing time: 835ms.
List processing time: -1154356332715ms.
Done.
========

Hence, could you please send me the (compressed!) document by email
(stefan.manegold at cwi.nl), or put it somewhere where I could download it
from?
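Any standard tool (gzip, zip) will do for compressing the document. Purely as an illustration, here is a minimal Python sketch that gzips the file in fixed-size chunks, so the 1GB input never has to fit in memory at once. The function name `compress` and the 1 MB chunk size are assumptions, not anything MonetDB provides.

```python
import gzip
import shutil
import sys


def compress(src, dst=None, chunk_size=1 << 20):
    """Gzip `src` into `dst` (default: src + ".gz") by streaming fixed-size chunks."""
    dst = dst or src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=9) as fout:
        # copyfileobj holds at most chunk_size bytes in memory at a time.
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst


if __name__ == "__main__":
    print(compress(sys.argv[1]))
```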

One final question (for now): Which version of MonetDB/XQuery are you using?

Kind Regards,

Stefan

On Mon, Jul 24, 2006 at 05:43:33PM +0800, kla gw wrote:
> Hi,
> 
> I tried to use MonetDB/XQuery to shred a 1gb xml file, but it failed.
> 
> Following is the error message:
> MonetDB>shred_doc("D:xbench/output/DC1000catalog.xml", "DC1000catalog.xml");
> !ERROR: MT_mmap: MapViewOfFile(6b4, 2, 0, 0, 442499072, 0) failed
> !OS: Not enough space
> !GDKmmap(442499072) fail => BBPtrim(enter) usage[mem=101297568,vm=794361856]
> 
> I use windows XP Professional version 2002 service pack 2, Pentium 4
> CPU, 2.40GHz, 512 MB of RAM.
> 
> Previously I tried to shred 100mb xml file, and it took 18.532 sec.
> For this 1 gb file, I left it overnight so I don't know how long it
> takes till the error message occurs.
> 
> Can you please help me to solve this problem?
> 
> Regards,
> 
> Klarinda
> 
> 
> Below is the complete error message:
> MonetDB>shred_doc("D:xbench/output/DC1000catalog.xml", "DC1000catalog.xml");
> !ERROR: MT_mmap: MapViewOfFile(6b4, 2, 0, 0, 442499072, 0) failed
> !OS: Not enough space
> !GDKmmap(442499072) fail => BBPtrim(enter) usage[mem=101297568,vm=794361856]
> #
> !mallinfo.arena = 15613828
> !mallinfo.ordblks = 46134
> !mallinfo.smblks = 15492
> !mallinfo.hblkhd = 0
> !mallinfo.hblks = 0
> !mallinfo.usmblks = 13718408
> !mallinfo.fsmblks = 899720
> !mallinfo.uordblks = 950740
> !mallinfo.fordblks = 44960
> #BBPTRIM_ENTER: memsize=101297568,vmsize=794361856
> #BBPTRIM: memtarget=0 vmtarget=1073741824
> #TRIMSCAN: mem=0 vm=1, start=1, limit=1
> #TRIMSCAN:      145030          0=tmp_35        (#0)
> #TRIMSCAN:      145059          1=tmp_36        (#0)
> #TRIMSCAN:      145088          2=tmp_37        (#0)
> #TRIMSCAN:      145146          3=tmp_41        (#0)
> #TRIMSCAN:      149075          4=doc_query     (#0)
> #TRIMSCAN:      149092          5=doc_sema      (#0)
> #TRIMSCAN:      155215          6=tmp_374       (#0)
> #TRIMSCAN:      155218          7=prop_pre_39   (#0)
> #TRIMSCAN:      157895          8=tmp_533       (#0)
> #TRIMSCAN:      157898          9=prop_pre_310  (#0)
> #TRIMSCAN: end at 1 (size=628)
> #TRIMSELECT: dirty = 0
> #TRIMSELECT: candidate=tmp_35 BAT*=03D6A230
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=145030,145030,145030)
> #TRIMSELECT: keep tmp_35 [224,0] bytes [224,0] dirty target(mem=0 vm=1073741824)
> #TRIMSELECT: candidate=tmp_36 BAT*=03D66E60
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=145059,145059,145059)
> #TRIMSELECT: keep tmp_36 [224,0] bytes [224,0] dirty target(mem=0 vm=1073741824)
> #TRIMSELECT: candidate=tmp_37 BAT*=058080B0
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=145088,145088,145088)
> #TRIMSELECT: keep tmp_37 [224,0] bytes [224,0] dirty target(mem=0 vm=1073741824)
> #TRIMSELECT: candidate=tmp_41 BAT*=058066F0
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=145146,145146,145146)
> #TRIMSELECT: keep tmp_41 [224,0] bytes [224,0] dirty target(mem=0 vm=1073741824)
> #TRIMSELECT: candidate=doc_query BAT*=057F9D10
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=149075,149075,149075)
> #TRIMSELECT: keep doc_query [224,0] bytes [224,0] dirty target(mem=0 vm=1073741824)
> #TRIMSELECT: candidate=doc_sema BAT*=05845370
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=149092,149092,149092)
> #TRIMSELECT: keep doc_sema [224,0] bytes [224,0] dirty target(mem=0 vm=1073741824)
> #TRIMSELECT: candidate=tmp_374 BAT*=03D722F0
> #            (cnt=0, mode=4096, refs=0, wait=0, parent=0, lastused=155215,155215,155215)
> #TRIMSELECT: keep tmp_374 [224,0] bytes [0,0] dirty target(mem=0 vm=1073741824)
> #TRIMSELECT: candidate=prop_pre_39 BAT*=05845CD0
> #            (cnt=0, mode=4096, refs=0, wait=0, parent=0, lastused=155218,155218,155218)
> #TRIMSELECT: keep prop_pre_39 [224,0] bytes [0,0] dirty target(mem=0 vm=1073741824)
> #TRIMSELECT: candidate=tmp_533 BAT*=0583C598
> #            (cnt=0, mode=4096, refs=0, wait=0, parent=0, lastused=157895,157895,157895)
> #TRIMSELECT: keep tmp_533 [224,0] bytes [0,0] dirty target(mem=0 vm=1073741824)
> #TRIMSELECT: candidate=prop_pre_310 BAT*=057F7E70
> #            (cnt=0, mode=4096, refs=0, wait=0, parent=0, lastused=157898,157898,157898)
> #TRIMSELECT: keep prop_pre_310 [224,0] bytes [0,0] dirty target(mem=0 vm=1073741824)
> #TRIMSELECT: end
> #TRIMSELECT: dirty = 1
> #TRIMSELECT: candidate=tmp_35 BAT*=03D6A230
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=145030,145030,145030)
> #TRIMSELECT: delete tmp_35 from trimlist (does not match trim needs)
> #TRIMSELECT: candidate=tmp_36 BAT*=03D66E60
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=145059,145059,145059)
> #TRIMSELECT: delete tmp_36 from trimlist (does not match trim needs)
> #TRIMSELECT: candidate=tmp_37 BAT*=058080B0
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=145088,145088,145088)
> #TRIMSELECT: delete tmp_37 from trimlist (does not match trim needs)
> #TRIMSELECT: candidate=tmp_41 BAT*=058066F0
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=145146,145146,145146)
> #TRIMSELECT: delete tmp_41 from trimlist (does not match trim needs)
> #TRIMSELECT: candidate=doc_query BAT*=057F9D10
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=149075,149075,149075)
> #TRIMSELECT: delete doc_query from trimlist (does not match trim needs)
> #TRIMSELECT: candidate=doc_sema BAT*=05845370
> #            (cnt=0, mode=1024, refs=0, wait=0, parent=0, lastused=149092,149092,149092)
> #TRIMSELECT: delete doc_sema from trimlist (does not match trim needs)
> #TRIMSELECT: candidate=tmp_374 BAT*=03D722F0
> #            (cnt=0, mode=4096, refs=0, wait=0, parent=0, lastused=155215,155215,155215)
> #TRIMSELECT: delete tmp_374 from trimlist (does not match trim needs)
> #TRIMSELECT: candidate=prop_pre_39 BAT*=05845CD0
> #            (cnt=0, mode=4096, refs=0, wait=0, parent=0, lastused=155218,155218,155218)
> #TRIMSELECT: delete prop_pre_39 from trimlist (does not match trim needs)
> #TRIMSELECT: candidate=tmp_533 BAT*=0583C598
> #            (cnt=0, mode=4096, refs=0, wait=0, parent=0, lastused=157895,157895,157895)
> #TRIMSELECT: delete tmp_533 from trimlist (does not match trim needs)
> #TRIMSELECT: candidate=prop_pre_310 BAT*=057F7E70
> #            (cnt=0, mode=4096, refs=0, wait=0, parent=0, lastused=157898,157898,157898)
> #TRIMSELECT: delete prop_pre_310 from trimlist (does not match trim needs)
> #TRIMSELECT: end
> #BBPTRIM: no more unload candidates!
> #BBPTRIM_EXIT: memsize=95140356,vmsize=794361856
> !GDKmmap(442499072) fail => BBPtrim(ready) usage[mem=101297568,vm=794361856]
> #
> !mallinfo.arena = 15613828
> !mallinfo.ordblks = 46134
> !mallinfo.smblks = 15492
> !mallinfo.hblkhd = 0
> !mallinfo.hblks = 0
> !mallinfo.usmblks = 13718408
> !mallinfo.fsmblks = 899720
> !mallinfo.uordblks = 950740
> !mallinfo.fordblks = 44960
> !ERROR: MT_mmap: MapViewOfFile(6b0, 2, 0, 0, 442499072, 0) failed
> !OS: Not enough space
> !ERROR: GDKload: cannot mmap(): name=05\552, ext=theap.priv
> !OS: Not enough space
> !ERROR: GDKload failed: name=05\552, ext=theap.priv
> !ERROR: shredder.mx:append_str2bat: APPEND-STR[PROP_TEXT](final foxes since the silent, quick realms should breach never sheaves--ruthless, daring waters beneath the close asymptotes c), BUNappend fails
> !ERROR: CMDshred2bats: operation failed.
> MonetDB>
> 

-- 
| Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |
