Renovation short list

In February 2011 the core development team decided to freeze the MonetDB 4/Pathfinder/XQuery track and to disentangle the two software strands.

This major change in release management triggered a reconsideration of the remaining MonetDB/SQL stack and of more than 17 years of software growth. Many major and minor tweaks had been added over that time, which made the kernel code hard to extend, maintain, and improve.

The decision was taken to make a concerted effort to perform a major code cleanup and to realise some items that had been on our list for a long time. Most notable is the propagation of the pure columnar approach taken in the SQL front-end all the way down to MAL and the kernel. This was expected to lead to a more compact code base. It will affect the functionality offered to users of MAL.

The second major step was to largely abandon the Mx package, which had helped generate the large number of slightly differing implementations of the kernel routines.
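To make the idea of "slightly differing implementations" concrete, here is a minimal sketch of how one routine body can be stamped out for several column types using only the C preprocessor. It is an illustration under stated assumptions, not MonetDB code: the macro name DEFINE_COLUMN_SUM and the generated functions sum_int and sum_double are hypothetical, and the actual replacement for Mx may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical template macro (not taken from MonetDB): each invocation
 * expands to one typed summation routine named sum_<TYPE>. */
#define DEFINE_COLUMN_SUM(TYPE, ACC)                              \
    static inline ACC sum_##TYPE(const TYPE *col, size_t n)       \
    {                                                             \
        ACC total = 0;                                            \
        assert(col != NULL || n == 0);                            \
        for (size_t i = 0; i < n; i++)                            \
            total += (ACC) col[i];                                \
        return total;                                             \
    }

/* Two variants that differ only in element and accumulator type. */
DEFINE_COLUMN_SUM(int, long)      /* defines sum_int */
DEFINE_COLUMN_SUM(double, double) /* defines sum_double */
```

Calling sum_int(values, count) on an int array and sum_double(values, count) on a double array then uses two separately compiled loops that share a single source definition.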

The task touches all the code:

module     files   Kloc   new Kloc
gdk           33     36
monetdb5     202    152        128
sql          149     79
clients      117    35K

Experience gathered in the first pass shows that roughly one day per file is needed to convert it completely.

The activities envisioned are organized by priority:

  • A: must do in the first round to reach a functionally stable point
    • Move to a pure columnar kernel MonetDB:Headless MonetDB:MAL
      • gdk primitives all work on columns MonetDB:Template MonetDB:GDK
      • join signature changes to [oid,any][oid,any]
      • only use the semantics of u*select and friends
    • remove set operations, semantic properties
    • BATs of BATs are to be removed MonetDB:MAL
    • remove integer after oid literal marker(@)
    • remove BUN type
    • Revive ASSERT defense code lines (BATcheck)
    • remove --enable-oid32
  • B: must do to finalize the code cleanup process
    • remove BATloop, BUNhloc, BUNtloc... and friends
    • always loop over C-arrays
    • remove wrd type
    • BAT vs BATiterator
    • Exploit varheap cardinality knowledge
    • split BATpropcheck
    • min-max properties
    • add properties: set linear, trivial, all
    • distinct count property
    • improved exception handling in GDK
    • BAT descriptor defines
    • property setting template/macro
    • Exception return handling / longjmp?
  • C: nice to have as soon as possible
    • Heaps : private vs shared to improve mmap reuse
    • BBPtrim removed/reconsidered; move to upper layer
    • ???NIL representation
    • BAT descriptor recycling
    • Mx is useful for template generation; can we live without it?
    • environment control for databases/dbfarm/sessions...
    • OID compression to n-wide
    • constant columns seqbase(start,incr)
    • hash-revisited using variable width OID knowledge
    • hash-persistency for special cases
    • aggregation is a dual issue, non-reducing/reducing group by
    • Testing and tests
    • Arbitrary grouping of columns based on shared knowledge (e.g. alignment)
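
Several of the items above ("gdk primitives all work on columns", "always loop over C-arrays", results shaped as [oid,any]) can be illustrated with a small sketch. It assumes a column whose head is a dense, virtual sequence of object identifiers starting at a seqbase, so that only the tail values are stored in a plain C array. The type oid and the function select_range_int are hypothetical stand-ins chosen for this example; they are not the GDK API.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t oid; /* hypothetical stand-in for an object identifier */

/* Range select over a plain C array (illustrative, not the GDK API).
 * The head of the column is not stored: the oid of position i is
 * seqbase + i. The oids of all values in [lo, hi] are written to
 * result, which must have room for n entries. Returns the hit count. */
static inline size_t
select_range_int(const int *tail, size_t n, oid seqbase,
                 int lo, int hi, oid *result)
{
    size_t hits = 0;

    assert(tail != NULL || n == 0);
    assert(result != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tail[i] >= lo && tail[i] <= hi)
            result[hits++] = seqbase + i;
    }
    return hits;
}
```

The loop touches only the value array and a running index, with no per-element iterator or head lookup, which is the kind of tight inner loop the "always loop over C-arrays" item appears to aim for.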