[Monetdb-developers] Fixing the update issue for Large XML documents

Peter Boncz P.Boncz at cwi.nl
Thu Oct 11 13:07:52 CEST 2007


Thanks Stefan,

This is much better. I have started looking. 

If I find that something is indeed wrong here, a bug should be opened at
SourceForge: go to http://sf.net/projects/monetdb and choose Tracker ->
Bugs.

I am currently importing the dataset -- the shredding has already taken an hour.
This is due to many hash collisions on the attributes that have a purely numeric
value. Your document has no text nodes, but lots of those attributes. I had
already noted that the MonetDB hash function is very fragile in this domain.
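
To illustrate the kind of problem meant here, a minimal sketch follows. It does
NOT use the MonetDB hash function; byte_sum_hash is a hypothetical, deliberately
weak hash, and 32-bit FNV-1a is only a better-mixing reference point. The key
set (decimal strings 0..99999) and the bucket count are likewise assumptions
chosen for illustration.

```python
def byte_sum_hash(key: str, buckets: int) -> int:
    """Hypothetical weak hash: add up the bytes, then pick a bucket."""
    return sum(key.encode("ascii")) % buckets


def fnv1a_hash(key: str, buckets: int) -> int:
    """32-bit FNV-1a, used here only as a better-mixing reference."""
    h = 0x811C9DC5                      # FNV offset basis
    for b in key.encode("ascii"):
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF   # FNV prime, keep 32 bits
    return h % buckets


def buckets_used(hash_fn, keys, buckets):
    """Number of distinct buckets hit; fewer means more collisions."""
    return len({hash_fn(k, buckets) for k in keys})


# Purely numeric attribute values, as in the document under discussion.
KEYS = [str(n) for n in range(100_000)]
BUCKETS = 1 << 16

weak = buckets_used(byte_sum_hash, KEYS, BUCKETS)
good = buckets_used(fnv1a_hash, KEYS, BUCKETS)
print(f"byte-sum hash: {weak} of {BUCKETS} buckets used")
print(f"FNV-1a hash:   {good} of {BUCKETS} buckets used")
```

Digit strings only use ten distinct byte values, so a hash that mixes poorly
maps huge numbers of them to the same few buckets, and every insert then walks
a long collision chain.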

I think I will open a bug report on that one. The bad thing is that fixing it
will alter our binary repository format, but that has been done before.

I will keep you posted about my progress in reproducing your remap error. I am
trying with a 64-bit build and 64-bit oids, so there should be no
scalability problems. One difference is that I will be using Fedora Core, not
Gentoo.
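
A back-of-the-envelope sketch of why the 64-bit setup matters, using row counts
copied from the ls() listing quoted below for document 1000000000. The column
widths are assumptions, not taken from this thread: oid = 8 bytes (64-bit
oids), int = 4, chr = 1, and void columns treated as virtual (0 bytes). String
heaps and hash indexes are ignored, so this is a lower bound at best.

```python
# Assumed fixed widths in bytes per value (see the caveats above).
WIDTH = {"oid": 8, "int": 4, "chr": 1, "void": 0}

# (name, tail type, count) as shown in the ls() listing for 1000000000_*.
COLUMNS = [
    ("attr_own",  "oid",  747750437),
    ("attr_prop", "oid",  747750437),
    ("attr_qn",   "oid",  747750437),
    ("nid_rid",   "void", 701587412),
    ("rid_kind",  "chr",  701587412),
    ("rid_level", "chr",  701587412),
    ("rid_nid",   "void", 701587412),
    ("rid_prop",  "oid",  701587412),
    ("rid_size",  "int",  701587412),
]


def footprint_bytes(columns, width=WIDTH):
    """Sum of count * width over all listed columns."""
    return sum(width[ttype] * count for _, ttype, count in columns)


total = footprint_bytes(COLUMNS)
print(f"{total / 2**30:.1f} GiB of fixed-width tail data")
print("fits in a 32-bit address space:", total < 2**32)
```

Under these assumptions the fixed-width columns of a single copy of the
document already exceed what a 32-bit process can map.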

Peter


> -----Original Message-----
> From: Stefan de Konink [mailto:skinkie at xs4all.nl] 
> Sent: Thursday, October 11, 2007 10:19 AM
> To: P.Boncz at cwi.nl
> Cc: monetdb-developers at lists.sourceforge.net
> Subject: Re: Monetdb-developers Digest, Vol 17, Issue 6 (was: 
> Fixing the update issue for Large XML documents)
> 
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> Dear Peter,
> 
> 
> Peter Boncz schreef:
> > Thanks for your question, but I regret to inform you that your way of
> > communicating technical problems severely falls short of what is
> > required by technical etiquette:
> >
> > (1) ambiguous software version description: you cannot be running
> > MonetDB4 and MonetDB5 at the same time. Even more, I want to know the
> > exact version number of the software, whether it was a 32-bit or
> > 64-bit build (and if so, whether it used 64-bit or 32-bit oids).
> 
> Why can't I run MonetDB4 and MonetDB5 at the same time, for XML and SQL
> respectively? In this case I'm reporting a bug for MonetDB4, with the
> pathfinder module.
> 
> Monet Database Server V4.18.2
> Compiled for x86_64-unknown-linux-gnu/64bit with 64bit OIDs;
> 
> > (2) major omissions in your platform description. I presume you have
> > installed an operating system on your Xeon. Would you care to tell us
> > which operating system that is?
> 
> Gentoo, Linux, xen01 2.6.20-xen-r3 #3 SMP Sun Oct 7 05:22:20 CEST 2007
> x86_64 Intel(R) Xeon(R) CPU L5320 @ 1.86GHz GenuineIntel GNU/Linux
> 
> (Before further questions: it is the host machine, not a virtual server)
> 
> > (3) ill-described reproduction path. Your bug report suggests being
> > about updates, but as far as I read your email, I doubt you did any
> > updates yet. This again is crucial information.
> 
> - Download
>   http://mirror.openstreetmap.nl/planet/planet-071003.osm.bz2
> - Decompress the document.
> - Start the MonetDB4 server with the Pathfinder module.
> - Add this document to the database server with 5 percent space for
>   updates.
> - Requesting any operation on this document results in the failure
>   reported.
> 
> This error doesn't happen if the document was added without the ability
> to update the document.
> 
> > The error you see is the "remap" call failing -- the most probable
> > cause of error is VM space shortage. For MonetDB to work on 20GB size
> > datasets you must use a 64-bit operating system and MonetDB binary,
> > and maybe even 64-bit oids (because your volume of text nodes is
> > likely to be significant).
> 
> This doesn't explain the fact that the read-only version works.
> 
> Also, here is my current ls() output:
> 
> MonetDB>ls();
> #-----------------------------------------------------------------------------------------------------------------------------------------#
> # name                          htype   ttype           count
>   heat    dirty           status  kind    refcnt  lrefcnt           # name
> # str                           str     str             lng
>   int     str             str     str     int     int               # type
> #-----------------------------------------------------------------------------------------------------------------------------------------#
> [ "1000000000_attr_own",          "void", "oid",          747750437,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_attr_prop",         "void", "oid",          747750437,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_attr_qn",           "void", "oid",          747750437,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_frag_root",         "oid",  "oid",          1,
>   0,      "clean",        "disk", "pers", 0,      1               ]
> [ "1000000000_map_pid",           "void", "void",         42822,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_nid_rid",           "void", "void",         701587412,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_prop_com",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_prop_ins",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_prop_text",         "void", "str",          3,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_prop_tgt",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_prop_val",          "void", "str",          118077679,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_qn_histogram",      "void", "lng",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_qn_loc",            "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_qn_nid",            "oid",  "oid",          271876931,
>   0,      "clean",        "disk", "pers", 0,      1               ]
> [ "1000000000_qn_prefix",         "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_qn_prefix_uri_loc", "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_qn_uri",            "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_qn_uri_loc",        "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_rid_kind",          "void", "chr",          701587412,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_rid_level",         "void", "chr",          701587412,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_rid_nid",           "void", "void",         701587412,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_rid_prop",          "void", "oid",          701587412,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_rid_size",          "void", "int",          701587412,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000000_vx_hsh_nid",        "int",  "oid",          402481184,
>   0,      "clean",        "disk", "pers", 0,      1               ]
> [ "1000000001_attr_own",          "void", "oid",          754314985,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_attr_prop",         "void", "oid",          754314985,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_attr_qn",           "void", "oid",          754314985,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_frag_root",         "oid",  "oid",          1,
>   0,      "clean",        "disk", "pers", 0,      1               ]
> [ "1000000001_map_pid",           "void", "oid",          45450,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_nid_rid",           "void", "oid",          707424150,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_prop_com",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_prop_ins",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_prop_text",         "void", "str",          3,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_prop_tgt",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_prop_val",          "void", "str",          119784146,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_qn_histogram",      "void", "lng",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_qn_loc",            "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_qn_prefix",         "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_qn_prefix_uri_loc", "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_qn_uri",            "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_qn_uri_loc",        "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_rid_kind",          "void", "chr",          744652800,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_rid_level",         "void", "chr",          744652800,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_rid_nid",           "void", "oid",          744652800,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_rid_prop",          "void", "oid",          744652800,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000001_rid_size",          "void", "int",          744652800,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_attr_own",          "void", "oid",          754314985,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_attr_prop",         "void", "oid",          754314985,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_attr_qn",           "void", "oid",          754314985,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_frag_root",         "oid",  "oid",          1,
>   0,      "clean",        "disk", "pers", 0,      1               ]
> [ "1000000002_map_pid",           "void", "void",         43178,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_nid_rid",           "void", "void",         707424150,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_prop_com",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_prop_ins",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_prop_text",         "void", "str",          3,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_prop_tgt",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_prop_val",          "void", "str",          119784146,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_qn_histogram",      "void", "lng",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_qn_loc",            "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_qn_nid",            "oid",  "oid",          273809205,
>   0,      "clean",        "disk", "pers", 0,      1               ]
> [ "1000000002_qn_prefix",         "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_qn_prefix_uri_loc", "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_qn_uri",            "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_qn_uri_loc",        "void", "str",          19,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_rid_kind",          "void", "chr",          707424150,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_rid_level",         "void", "chr",          707424150,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_rid_nid",           "void", "void",         707424150,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_rid_prop",          "void", "oid",          707424150,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_rid_size",          "void", "int",          707424150,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000002_vx_hsh_nid",        "int",  "oid",          407415839,
>   0,      "clean",        "disk", "pers", 0,      1               ]
> [ "1000000003_attr_own",          "void", "oid",          5870236,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_attr_prop",         "void", "oid",          5870236,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_attr_qn",           "void", "oid",          5870236,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_frag_root",         "oid",  "oid",          1,
>   0,      "clean",        "disk", "pers", 0,      1               ]
> [ "1000000003_map_pid",           "void", "oid",          351,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_nid_rid",           "void", "oid",          5455263,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_prop_com",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_prop_ins",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_prop_text",         "void", "str",          3,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_prop_tgt",          "void", "str",          0,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_prop_val",          "void", "str",          1814631,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_qn_histogram",      "void", "lng",          16,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_qn_loc",            "void", "str",          16,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_qn_prefix",         "void", "str",          16,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_qn_prefix_uri_loc", "void", "str",          16,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_qn_uri",            "void", "str",          16,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_qn_uri_loc",        "void", "str",          16,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_rid_kind",          "void", "chr",          5750784,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_rid_level",         "void", "chr",          5750784,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_rid_nid",           "void", "oid",          5750784,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_rid_prop",          "void", "oid",          5750784,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "1000000003_rid_size",          "void", "int",          5750784,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "collection_name",              "oid",  "str",          4,
>   0,      "clean",        "load", "pers", 0,      2               ]
> [ "collection_size",              "oid",  "lng",          4,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "doc_collection",               "oid",  "oid",          6,
>   0,      "clean",        "load", "pers", 0,      2               ]
> [ "doc_location",                 "oid",  "str",          4,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "doc_name",                     "oid",  "str",          4,
>   0,      "clean",        "load", "pers", 0,      2               ]
> [ "doc_timestamp",                "oid",  "timestamp",    4,
>   0,      "clean",        "disk", "pers", 0,      2               ]
> [ "uri_lifetime",                 "str",  "lng",          1,
>   0,      "clean",        "load", "pers", 0,      2               ]
> [ "xquery_catalog",               "int",  "str",          86,
>   0,      "clean",        "load", "pers", 1,      1               ]
> [ "xquery_seqs",                  "int",  "lng",          1,
>   0,      "clean",        "load", "pers", 1,      2               ]
> [ "xquery_snapshots",             "int",  "int",          0,
>   0,      "clean",        "load", "pers", 1,      2               ]
> 
> 
> Yours Sincerely,
> 
> Stefan de Konink
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.7 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFHDdyGYH1+F2Rqwn0RCkg5AJwNn3V6AoWnIpbB30lJLyzSnTnFtQCgkPON
> GWp/yNGobkgpBQCR7SIb14Y=
> =MqLA
> -----END PGP SIGNATURE-----




