[Monetdb-developers] New serialization routine
rittinge at in.tum.de
Fri May 12 11:23:31 CEST 2006
In the current version serialization takes 10 seconds for a 10 MB file
(http://www-db.in.tum.de/~rittinge/files/mixed.xml) which contains only
slightly more nodes (517448) than the auction XML file of the same size
(472681). The difference, however, is the number of attribute nodes, which
is much higher in mixed.xml (517447 vs. 38266).
While the serialization of doc("auction10MB.xml") takes 2.72 seconds,
doc("mixed.xml") requires 10.34 seconds!! Removing the attributes in
mixed.xml (by deleting all entries from the attr_own bat) brings the
serialization time down to 4.6 seconds.
Some more calculations reveal that for the auction file we are able to
serialize about 187,000 nodes/sec (nodes = elements+textnodes+attributes),
while for mixed.xml the serialization function (with and without
attributes) was only able to generate about 100,000 to 112,000 nodes/sec.
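The throughput figures can be re-derived from the node counts and timings quoted above. The short Python sketch below does the arithmetic; it assumes that the counts 472681 and 517448 exclude attribute nodes, so attributes are added on top, and the variable names are mine, not from serialize.mx.

```python
# Node counts and timings as quoted in this message.
# Assumption: 472681 and 517448 count elements + text nodes only,
# so attribute nodes are added separately.
auction_nodes = 472681 + 38266        # auction10MB.xml, with attributes
mixed_nodes = 517448 + 517447         # mixed.xml, with attributes
mixed_no_attr_nodes = 517448          # mixed.xml after emptying attr_own


def throughput(nodes, seconds):
    """Return the number of nodes serialized per second."""
    return nodes / seconds


auction_rate = throughput(auction_nodes, 2.72)
mixed_rate = throughput(mixed_nodes, 10.34)
mixed_no_attr_rate = throughput(mixed_no_attr_nodes, 4.6)

for label, rate in [
    ("auction10MB.xml", auction_rate),
    ("mixed.xml", mixed_rate),
    ("mixed.xml without attributes", mixed_no_attr_rate),
]:
    print(f"{label}: {rate / 1000:.1f}k nodes/sec")
```

Under that assumption the rates come out in the range of a hundred to two hundred thousand nodes per second, i.e. the numbers above are best read as thousands of nodes per second.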
In all 3 cases the serialization seems way too slow... (at least to me).
As in principle the serialization is only a dump of the tables (even in
table order) and other approaches are much faster, we should be able to
come up with an appropriately fast serialization routine.
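To illustrate what "a dump of the tables in table order" could look like, here is a minimal Python sketch of a single sequential pass over a node table. The table layout (level, kind, value), the attribute map keyed by owner row, and all names are hypothetical and chosen for illustration only; this is not the Pathfinder/MonetDB encoding and not the code in serialize.mx.

```python
import io

# Hypothetical flat node table in document order: (level, kind, value).
NODES = [
    (0, "elem", "site"),
    (1, "elem", "item"),
    (2, "text", "chair"),
    (1, "elem", "item"),
    (2, "text", "desk"),
]
# Hypothetical attribute table: owner row index -> list of (name, value).
ATTRS = {1: [("id", "i1")], 3: [("id", "i2")]}


def escape(text):
    """Escape the characters that are special in XML content/attributes."""
    return (text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace('"', "&quot;"))


def serialize(nodes, attrs):
    """Serialize the node table in one sequential scan, no random access."""
    out = io.StringIO()
    open_elems = []  # stack of (level, name) for elements not yet closed
    for row, (level, kind, value) in enumerate(nodes):
        # Close every open element that is not an ancestor of this node.
        while open_elems and open_elems[-1][0] >= level:
            out.write("</" + open_elems.pop()[1] + ">")
        if kind == "elem":
            out.write("<" + value)
            for name, val in attrs.get(row, ()):
                out.write(" " + name + '="' + escape(val) + '"')
            out.write(">")
            open_elems.append((level, value))
        else:
            out.write(escape(value))
    # Close whatever is still open at the end of the table.
    while open_elems:
        out.write("</" + open_elems.pop()[1] + ">")
    return out.getvalue()


print(serialize(NODES, ATTRS))
```

The point of the sketch is only that each row is visited once, in order, with a small stack to emit closing tags; whether the real tables allow exactly this depends on the encoding and on the indirections introduced by the update facility.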
I see at least three possibilities to increase the performance:
* serialize.mx uses about 50 function calls until one node without
attributes is serialized
* serialize.mx collects the nodes using random access with a lot of
* serialize.mx uses stream_printf which is probably slower than a fixed
size print routine like stream_write.
In the following two weeks (before the feature freeze) I will try to
come up with a new prototypical print routine that should handle the
serialization more efficiently. This prototype will be a proof of
concept and will probably fall back to the current routine whenever some
conditions are not fulfilled (e.g. wrong serialization type or multiple
Let's see whether we can get a faster serialization routine for the
average query result!
I will probably need help with some GDK internals, with changes introduced
by the update facility (e.g. indirections), and with understanding parts of
the current serialize.mx. I hope for your support :-)
More information on the new serialization routine will be available at
the pathfinder wiki.
Technische Universität München (Germany)