[Monetdb-developers] Fixing the update issue for Large XML documents

Stefan de Konink skinkie at xs4all.nl
Thu Oct 11 17:45:30 CEST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi Peter,


Peter Boncz wrote:
> In case I find that there is something wrong here, a bug should be opened at
> sourceforge. That is in http://sf.net/projects/monetdb, choosing Tracker ->
> Bugs.

That was already done, though only with a 'small' amount of information:
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468


> I am currently importing the dataset -- the shredding is already taking an hour.
> This is due to many hash collisions on the attributes that have a purely numeric
> value. Your document has no text nodes, but lots of those attributes. I already
> noted that the MonetDB hash function is very fragile in this domain. 

Without enough GBs of RAM, this took me at least 6 hours. The total
amount was around 40 GB in read-only mode.
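
To illustrate why collisions on purely numeric attribute values can hurt
this much, here is a minimal sketch. It does NOT use the MonetDB hash
function: weak_hash is a deliberately poor, hypothetical hash (a sum of
character codes), and FNV-1a is only a stand-in for a better-mixing one.
The bucket count and key range are arbitrary assumptions as well.

```python
from collections import Counter


def weak_hash(s: str, buckets: int) -> int:
    # Hypothetical weak hash, NOT MonetDB's: sums the character codes.
    # Numeric strings use only ten distinct characters, so the sums
    # fall into a very narrow range and many keys share a bucket.
    return sum(ord(c) for c in s) % buckets


def fnv1a_hash(s: str, buckets: int) -> int:
    # 32-bit FNV-1a, used here only as a better-mixing comparison.
    h = 0x811C9DC5
    for b in s.encode("utf-8"):
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h % buckets


def longest_chain(hash_fn, keys, buckets: int) -> int:
    # Size of the fullest bucket, i.e. the worst-case collision chain.
    counts = Counter(hash_fn(k, buckets) for k in keys)
    return max(counts.values())


# Purely numeric attribute values, as in the document discussed here.
keys = [str(n) for n in range(100000)]
buckets = 1 << 16

weak_max = longest_chain(weak_hash, keys, buckets)
fnv_max = longest_chain(fnv1a_hash, keys, buckets)
print("longest chain, weak hash:", weak_max)
print("longest chain, FNV-1a:  ", fnv_max)
```

With the weak hash, every lookup or insert that lands in a long chain
has to walk it, so inserting many numeric values degrades towards
quadratic time. That is the general kind of effect described above.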


> I think I will open a bug report on that one. Bad thing is that fixing it will
> alter our binary repository format. But that has been done before.

I'm sorry for breaking your binary format ;)


> Will keep you posted about my progress in reproducing your remap error. I am
> trying with 64-bits compilation and 64-bits oids, so there should be no
> scalability problems.. Thing is I will be using fedora core, not gentoo.

I never believed in distribution differences. If the hashing is fragile,
I presume fixing it should have general priority. Finding out what the
breaking point is would be interesting too. If your machine is slower
than mine, I could offer a virtual machine with Fedora Core.


Stefan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHDkUZYH1+F2Rqwn0RCgARAJ99ImeLEC5mhY49whPN1pbgMkf6xQCgg8h/
Wz6uyuG9AXnSKilfwX0PrvU=
=jnFE
-----END PGP SIGNATURE-----



