question: efficient use of partitioned tables
sandorl at gmail.com
Mon May 26 01:17:29 CEST 2014
This might be less of MonetDB-specific question about features than about
RDBMS (or just SQL) in general, but I would still appreciate your advice
in these matters.
I have data with a few "cross-sectional" tables (e.g. the parents of an
individual) and many longitudinal/panel tables in annual chunks (e.g. tax
records for each year separately).
I wonder if it is a good idea to merge/union the annual tables into long
The benefits seems to be a much cleaner structure for the consistent
columns, correspondingly easier code to write, fewer loops, tables names,
lines of code. Also, fast joins on individual IDs and years across the
tables without intermediary steps.
However, the downsides are that not all columns are consistent across
years, and there are simply some that appear only in certain years. Storing
missing values for other years for these variables is a waste of disk space
and memory. The merged tables could be too big anyway — if it matters in
MonetDB (think of tables ~70 GB in some cases, while my server has 128 GB
RAM), while many use cases would focus on a subset of years anyway.
Do you have any thoughts about this? Maybe more concrete questions would be:
— How cumbersome would it be (for a novice!) to code joining data from the
annual tables separately, for multiple series of tables? (E.g. union all
tax returns for 1997-2008, then join all residential data for these 12
years year-by-year, then employment records year-by-year etc.)
— How much slower would this *run* than simply selecting years from the
— Shall I be worried about the longitudinal tables having the size on the
same order of magnitude as my physical memory, or MonetDB handles it
equally efficiently (it would need to access many of the small annual
tables anyway, that also takes memory).
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the users-list