The MonetDB team at CWI/MonetDB BV is pleased to announce the
Apr2019 feature release of the MonetDB suite of programs.
More information about MonetDB can be found on our website at
<https://www.monetdb.org/>.
For details on this release, please see the release notes at
<https://www.monetdb.org/Downloads/ReleaseNotes>.
As usual, the download location is <https://dev.monetdb.org/downloads/>.
Apr 2019 feature release (11.33.3)
MonetDB5 Server
* Changed the way blobs are ordered. Before, shorter blobs came
before longer, now the contents is compared first.
* Removed function blob.tostring() since it cannot guarantee that the
resulting string is properly encoded in UTF-8.
* Removed the type "sqlblob", changed the external format of the type
"blob" to be like what "sqlblob" used to be. In other words, the
"blob" type is now SQL-compatible.
* The MAL functions algebra.sort and algebra.firstn now have a new,
extra second-to-last argument, nilslast:bit, which indicates where
NIL values are to be sorted: at the beginning, or at the end. The
old behavior is when for algebra.sort, the value of nilslast is
equal to the value or the reverse:bit parameter, for algebra.firstn
when the value of nilslast is opposite to the value of the asc:bit
parameter.
Build Environment
* We now use the C11 atomic operations if available, although
libatomic_ops is still used if present and not disabled on the
configure command line.
* We now also build the Python 3 integration packages for Fedora,
Debian, Ubuntu, and Windows (not for Debian 8, Ubuntu 14.04, or
EPEL 6 and 7 which are too old).
SELinux Support
* There is one tunable parameter, mserver5_can_read_home, which can
be set using "setsebool -P mserver5_can_read_home=true" to allow an
mserver5 process started by monetdbd under the control of systemd
to read files in users' home directories.
Merovingian
* Added a new database property named `profilerbeatfreq`. Its value,
<freq>, should be an integer and, if it is set the command
`profilerstart` will call stethoscope with this value as the
argument to -b. This will instruct stethoscope to collect heartbeat
events from the MonetDB server every <freq> milliseconds. Please
note that no type checking is done, and if <freq> is not an
integer, stethoscope will silently ignore it.
Client Package
* The mclient program can now be quit by typing quit or exit on the
command-line. This makes it easier for novice users to quit the
program if they do not know that the standard command to quit is
\q.
Mapi Library
* Removed function mapi_cache_shuffle.
* Removed function mapi_stream_query.
Stream Library
* The functions mnstr_write{Sht,Int,Lng,Hge} and their Array variants
now swap bytes of the written values if the stream is set to swap
bytes (i.e. big-endian on a little-endian machine and v.v.).
* Removed defines ST_ASCII, ST_BIN, ST_READ, ST_WRITE from stream.h.
* Changed function mnstr_set_byteorder(stream, char) to
mnstr_set_bigendian(stream, bool) where the second argument should
be `true' for specifying that the stream is bigendian. This sets an
internal flag whether or not to swap bytes which can be retrieved
with mnstr_get_swapbytes(stream).
* Changed function `int mnstr_byteorder(stream)' to `bool
mnstr_get_swapbytes(stream)'; it now returns whether bytes are
swapped.
* Changed function `int mnstr_type(stream)' to `bool
mnstr_isbinary(stream)'.
MonetDB Common
* During processing, we now try to maintain the smallest and largest
values that occur in a BAT. This information is not saved across
server restarts.
* Added an extra argument of type `bool' to the atom to string and
string to atom methods to indicate whether the string is for
internal use (casting the value to or from a string) of external
use (to be printed).
* Implemented a nilslast option for BATfirstn. If set, NILs come last
in the ordering that BATfirstn simulates, so non-NIL values are
preferentially returned. The old behavior can be obtained by
setting nilslast to !asc(ending).
* Implemented a nilslast option for BATsort. This option should be
equal to the reverse option for stable sort (it is not implemented
for stable sort), but can be different from reverse for non-stable
sort. The functions BATsort and GDKqsort have extra parameters, the
function GDKqsort_rev has been removed (superseded by GDKqsort with
the new `reverse' parameter).
* The BUNtail, BUNtvar, BUNtloc, and BUNtpos macros (and Tloc and
Tpos) now return a `void *' instead of a `char *'.
* Function PROPdestroy now takes a BAT* parameter.
* A whole bunch of functions now take a bool argument instead of an
int argument. Some functions now return a bool instead of an int.
In all these cases, the int was used as a Boolean value.
SQL Frontend
* Implemented two-argument least/greatest functions. If one argument
is NULL, the functions return the other value, otherwise they
return the least/greatest of the two arguments.
* Implemented SQL2003 natural logarithm function: ln(num_expr).
* Function octet_length now also accepts a BLOB argument and returns
the length of the BLOB (i.e. the same as length when called with a
BLOB argument).
* Implemented length(blob) and comparison between blobs.
* Implemented a two argument function log which calculates the
logarithm of the first argument using the second argument as base.
* Next to the functions log (natural logarithm) and log10 (base 10
logarithm), there is now also log2 (base 2 logarithm).
* Removed unused global SQL variable "history".
* Allow usage of custom expressions in GROUP BY and PARTITION BY
clauses. The same expressions can be used in the projection phase
as long as it is exactly the same used in the grouping clause
(textual lookup is performed). As an example the query: SELECT
col1*2 FROM t1 GROUP BY col1*2; is correct, while SELECT
sin(col1+5) FROM t1 GROUP BY col1*2; is wrong.
* Allow usage of WITH clauses in insert, update, delete and merge
statements.
* Implemented merge statements from SQL:2003 standard. Using a source
relation R, a target table T is updating depending on the result of
the merge. In case of a match, the table T's row is either updated
or deleted with R's row. In a non-match case, R's row is inserted
into T. The grammar is the follows: > MERGE INTO target [ [AS]
ident ] USING source ON search_condition { WHEN MATCHED [ AND
search_condition ] THEN { UPDATE SET assignment_list | DELETE } } |
{ WHEN NOT MATCHED [ AND search_condition ] THEN INSERT [
column_list ] [ { DEFAULT VALUES | VALUES row_values } ] }
* Added possibility to change the schema of a table with the syntax:
> ALTER TABLE [ IF EXISTS ] qname SET SCHEMA ident
* Added optional alias option for target table in update and delete
statements: (e.g. UPDATE target AS alias SET col1=1 WHERE
alias.col2 < 5)
* Improved and extended storagemodel functionality. Changes include:
* Added views sys."tablestorage" and sys."schemastorage" for easy and
quick assesment of aggregated storage usage per table or per
schema.
* Excluded listing system tables in views sys."storage",
sys."tablestorage" and sys."schemastorage". You can still use
sys."storage"() to query storage of system tables and columns.
* In procedure sys.storagemodelinit() when populating table
sys.storagemodelinput it now excludes system tables as system
tables are not useful to be modeled for storagesize by application
users. It now also computes and populates the atomwidth column more
correctly for variable size data types (strings, blobs) when the
table has representative data in those columns.
* System function sys.columnsize() has been corrected for types:
tinyint, real, date, time, timetz, sec_interval, month_interval,
decimal, uuid, mbr, char, clob, json, url, blob, geometry and
geometrya. For variable size data types (varchar, char, clob, json,
url, blob, geometry, geometrya) it now returns the columnsize
excluding the variable heapsize. The heapsize is retrievable via
sys.heapsize().
* System function sys.heapsize() has been corrected for all variable
size data types: varchar, char, clob, json, url, blob, geometry and
geometrya.
* System function sys.imprintsize() has been corrected. The
imprintsize depends on the width (1 or 2 or 4 or 8 or 16 bytes) of
the data type. Also instead of 12% it now uses 20%, which is a
better estimate.
* System function sys.storagemodel() has been removed as it outputs
the same data as view sys.storagemodel. Use view sys.storagemodel
instead.
* Corrected views sys.storagemodel and sys.tablestoragemodel by
returning a computed orderidxsize when the count has been changed
in the sys.storagemodelinput table. Also the views now return data
ordered by schema, table and column names.
* Extended view sys.tablestoragemodel with column: "storages".
Besides columns also keys (primary, foreign and unique keys) and
indexes (ordered, imprints) use storage, so the "storages" count
can be higher than the number of columns per table.
* Corrected the data type of columns "schema", "table", "column",
"type", "mode" and location in table sys.storagemodelinput and
functions sys."storage"(), sys."storage"(sname),
sys."storage"(sname, tname) and sys."storage"(sname, tname, cname)
from string into varchar(1024). Consequently also the views based
on the table or functions will be reporting varchar(1024) as column
meta data instead of clob for those columns. This allows faster
querying and reporting by generic SQL programs which treat clob
querying different from varchar columns.
* Extended windowing functions catalog with SQL standard semantics.
Standard aggregation functions can now be used in windowing
functions: (avg,max,min,sum,prod,count). Other windowing specific
functions were also implemented: (percent_rank,cume_dist,ntile,
lag,lead,first_value,last_value,nth_value).
* The standard frame specification was implemented for aggregation
functions as well as first_value, last_value and nth_value
functions. The available frames are rows, range and groups. Bounds
can be unbounded (partition limit), the current row, a fixed number
of rows (constant), or variable (column as input). (e.g SELECT
COUNT(col1) OVER (PARTITION BY col2 ORDER BY col3 RANGE BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW) FROM t1;).
* Added WINDOW keyword which is optionally provided after the FROM
clause, with window specifications used in the projection (e.g
SELECT SUM(col1) OVER w1, LAST_VALUE(col2) OVER w2 FROM t1 WINDOW
w1 AS (ROWS BETWEEN 5 PRECEDING AND 0 FOLLOWING), w2 AS (w1);).
* Our previous partitioning implementation didn’t impose order in the
input. With this reexamination, partitioning now imposes ascending
order by default, thus pairing with the industry standard
implementation.
* Implemented X'...' style binary string literals.
* Implemented U&'...' Unicode character string literals and U&"..."
Unicode delimited identifiers, including UESCAPE. For the string
literals, you can have U&'...' '...' '...' UESCAPE '...' where the
escape must be as single character and the other '...' strings are
also Unicode character string literals. For now, these latter
strings also undergo C-style backslash interpretation.
* Implemented PostgreSQL-like E'...' strings. The strings can contain
C-style backslash escapes. The old format strings '...' currently
still also accept C-style escapes, but that feature will be removed
in a future release.
* Implemented the NULLS FIRST and NULLS LAST option to ORDER BY. The
default is NULLS FIRST for ASC(ending) and NULLS LAST for
DESC(ending).
* Added possibility to rename SQL schemas, tables and columns with
ALTER statements: > ALTER SCHEMA [ IF EXISTS ] ident RENAME TO
ident > ALTER TABLE [ IF EXISTS ] qname RENAME TO ident > ALTER
TABLE [ IF EXISTS ] qname RENAME [ COLUMN ] ident TO ident Also
added optional IF EXISTS clause to other existing ALTER TABLE
statements.
* The sql SAMPLE syntax is extended to include an optional integral
SEED parameter that sets the seed for the internal random number
generator in the sample algorithm. When the seed is set, the user
can obtain a reproducible sample set from a static dataset with the
same sample query. The new syntax is SELECT ... FROM ... WHERE ...
SAMPLE <expr> [SEED <integer>]
* Implemented a version of COPY INTO that reads/writes files from/to
the client instead of doing it in the server. This has the
advantage that COPY INTO is then no longer restricted to only the
"super user" monetdb, nor only to absolute file names. The syntax
to have the server communicate with the client for file content is
COPY INTO table FROM file ON CLIENT ...; and COPY query INTO file
ON CLIENT ...;. This also works for COPY BINARY INTO. There is also
the possibility to specify that files are to be read/written by the
server by using ON SERVER. This is also the default when ON CLIENT
or ON SERVER is not specified.
* The sys.functions table now has an extra column called "system" of
type BOOLEAN whose value is TRUE for system functions (i.e.
functions that should not be dumped). The table sys.systemfunctions
has been changed to a view and is now officially deprecated.
* Extended merge tables with partitioning using a predicate: > CREATE
MERGE TABLE [ IF NOT EXISTS ] table_name (... columns ...) [
PARTITION BY { RANGE | VALUES } { ON '(' column_name ')' | USING
'(' expression ')' } ] The partitioning can occur by range or list
of values using one of the table's columns or an expression. The
domain of the partitioning scheme will be covered by each child
table added with alter statements depending if the table is
partitioned by range or list of values: > ALTER TABLE
merge_table_name { ADD | SET } TABLE child_table_name AS PARTITION
IN '(' expression [ ',' ... ] ')' [ WITH NULL ] > ALTER TABLE
merge_table_name { ADD | SET } TABLE child_table_name AS PARTITION
BETWEEN { RANGE MINVALUE | expression } AND { RANGE MAXVALUE |
expression } [ WITH NULL ] > ALTER TABLE merge_table_name { ADD |
SET } TABLE child_table_name AS PARTITION WITH NULL When updating a
single partition range/list of values, the SET keyword should be
used instead of ADD. INSERT, UPDATE and DELETE statements are
possible on partitioned tables with corresponding validation on
each partition domain. The system tables sys.table_partitions,
sys.range_partitions and sys.value_partitions were added to store
meta-information about each existing partition.
Bug Fixes
* 2403: stream: http read support in streams
* 2416: GDK: file/dir creation mask
* 2478: SQL: rename columns
* 2496: SQL: implement greatest / least
* 3384: Auxiliary window functions
* 3416: log() SQL function only supports natural logarithm
* 3448: Make lbatomic_ops optional if installed
* 3520: Add support for scalar function LN(num)
* 3530: sigabort on window aggr function
* 3567: Add support for: ALTER TABLE [schema1.]oldtablename RENAME
[TO] [schema2.]newtablename
* 3743: DELETE FROM merge table not supported.
* 3832: Cannot use expressions in GROUP BY clause
* 3946: Expose C log2 function to SQL
* 4056: mapi_timeout is not implemented
* 6181: comments cause error with return in function
* 6326: Eliminate unused UNION tables
* 6347: Mserver returns with "memory exhausted in" error if query
size exceeds 2.5MB
* 6475: Remove unused SQL global variable "history"
* 6588: Table aliasing are not supported in SQL update queries
* 6591: Name mangling in the SQL catalog results in ambiguity
* 6608: Sqlitelogictest error message in group by with coalesce
* 6609: Rename Schema in MonetDB
* 6636: sys.queue() extension: allow DBA to see all running queries
* 6647: Add suport to Python 3 on Windows
* 6669: COPY [xxx RECORDS] INTO foo FROM STDIN ... doesn't work
without specifying nr of to be copied records
* 6671: Error when running user function on merge table with remote
part
* 6674: R UDF with Date type fails to convert RAPI
* 6676: Max data length of 2048 for column sys._tables.query is too
small for the actual data (2811 for view sys.ids)
* 6678: Binding NULL parameter to parametrized query results in
syntax error on execution
* 6680: Copy cannot open CSV file if systemctl enabled
* 6684: Inserting into a replica table crashes the server
* 6685: adding a view to a merge table gives unexpected behaviour
* 6690: Unable to fetch empty string with ODBC driver
* 6696: Re-use of MAL explain plans with complex type coercions
Hi Dan,
On 12-09-19 11:01, developers-list-request(a)monetdb.org wrote:
> Send developers-list mailing list submissions to
> developers-list(a)monetdb.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://www.monetdb.org/mailman/listinfo/developers-list
> or, via email, send a message with subject or body 'help' to
> developers-list-request(a)monetdb.org
>
> You can reach the person managing the list at
> developers-list-owner(a)monetdb.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of developers-list digest..."
>
>
> Today's Topics:
>
> 1. Re: ALTER TABLE ALTER COLUMN SET STORAGE (Daniel Zvinca)
> 2. Re: ALTER TABLE ALTER COLUMN SET STORAGE (Daniel Zvinca)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 11 Sep 2019 13:53:27 +0300
> From: Daniel Zvinca <daniel.zvinca(a)logbis.com>
> To: "Communication channel for developers of the MonetDB suite."
> <developers-list(a)monetdb.org>
> Subject: Re: ALTER TABLE ALTER COLUMN SET STORAGE
> Message-ID:
> <CALxHsPEdkfOUKjy-xV5GDYHFry=SAst5qzZM5rk8JvFHqk1FNQ(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Thank you so much for your answer, Aris.
>
> The good news is that compression is considered and is going to be part of
> MonetDB in two releases or so (one year).
First release is probably much closer to half a year.
>
> I will try to see if I can use the current code in a custom build, I am
> quite curious how that will affect performance.
> To be honest I don't expect performance issues, nowadays even on SSD era,
> it is still faster to read compressed data and decompress in memory if the
> right ratio is there, of course. And I do expect decent compression ratio
> on most of the data.
The idea is that between every memory fetch you have quite some CPU
cycles at your disposal which can be used to perform either
(potentially) more expensive algebraic operations directly on compressed
data or to just decompress data. This is the general principle that
implies that at least lightweight compression can be used to accelerate
queries.
>
> However, at this stage the MOSAIC's dual compressed - uncompressed
> storage will obviously not give me the gain I need. Yet, it is interesting
> to understand at least how query performance might look in the future.
Again this feature is still under discussion. But I don't think it is
unlikely that it will enter into Mosaic.
>
> Bad news is of course the compression feature is going to happen in ... one
> year. But if it comes also with support for compressing in memory results
> (I know, it wasnt promised), it might worth to wait.
We're working it ☺. More remarks below on your other email...
>
> Best regards,
>
>
>
>
>
>
> On Wed, Sep 11, 2019 at 12:18 PM aris <aris.koning(a)monetdbsolutions.com>
> wrote:
>
>>
>>
>> -------- Forwarded Message --------
>> Subject: Re: ALTER TABLE ALTER COLUMN SET STORAGE
>> Date: Tue, 10 Sep 2019 15:15:09 +0200
>> From: aris <aris.koning(a)monetdbsolutions.com>
>> <aris.koning(a)monetdbsolutions.com>
>> To: developers-list-request(a)monetdb.org
>>
>> Hi Daniel,
>>
>> On 10-09-19 12:00, developers-list-request(a)monetdb.org wrote:
>>
>> Send developers-list mailing list submissions to
>> developers-list(a)monetdb.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://www.monetdb.org/mailman/listinfo/developers-list
>> or, via email, send a message with subject or body 'help' to
>> developers-list-request(a)monetdb.org
>>
>> You can reach the person managing the list at
>> developers-list-owner(a)monetdb.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of developers-list digest..."
>>
>>
>> Today's Topics:
>>
>> 1. ALTER TABLE ALTER COLUMN SET STORAGE (Daniel Zvinca)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 10 Sep 2019 12:40:22 +0300
>> From: Daniel Zvinca <daniel.zvinca(a)logbis.com> <daniel.zvinca(a)logbis.com>
>> To: developers-list(a)monetdb.org
>> Subject: ALTER TABLE ALTER COLUMN SET STORAGE
>> Message-ID:
>> <CALxHsPHEEExozCgdp_srxQm2tFtFOqGOGNqBTw2UhTBwWZu93g(a)mail.gmail.com>
>> <CALxHsPHEEExozCgdp_srxQm2tFtFOqGOGNqBTw2UhTBwWZu93g(a)mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hello,
>>
>> I am interested to find out more about ALTER TABLE ALTER COLUMN SET STORAGE
>> feature and how is that related to compression.
>>
>> As far as I understood this is related to an active development branch,
>> called MOSAIC which was never merged with any of the previous MonetDB
>> versions. Obviously, compression is an important feature columnar databases
>> are providing for data storage and manipulation. A module like MOSAIC that
>> seems to allow several compression techniques, would be an interesting
>> option.
>>
>> Yes, compression a.k.a. Mosaic is a going to be a new feature in MonetDB.
>> Although the feature won't be included in the upcoming November release.
>> Most likely, you can expect the feature in the first release after the
>> November release. But Mosaic is a somewhat big undertaking. our current
>> road map is probably covering multiple future MonetDB releases before all
>> envisioned compression features are available in MonetDB. The first
>> milestone in the current road map is to apply a single compression
>> technique on an entire column. But one of the next milestone is to
>> partition a column into variable-sized compression blocks. Within each
>> block a particular compression is applied.
>>
>>
>> First question I have: Can MOSAIC extension be used (sources added and
>> custom compiled) with success for any of its proposed codecs with any of
>> the newest versions (Apr2019 +). I mean without affecting any of embedded,
>> capi, rapi and pyapi modules, which all exchange data with external
>> libraries.
>>
>> If by this you mean you want to import the mosaic module as an external
>> library into an existing release out of the box, then the answer is no.
>> There are some slight modifications in the GDK layer to accommodate the
>> Mosaic module. And to interact with it from SQL, there are also some code
>> changes in the SQL layer. But besides those dependencies, I don't expect
>> any issue with the particular (x)api frameworks. But nothing is guaranteed
>> obviously. It sounds like you want to hack-back port it into custom builds
>> of earlier releases. I wouldn't give it a zero change of success but I do
>> wish you much luck :)
>>
>>
>> A quick read of MOSAIC code made me understand that this compression can be
>> applied only on readonly PERSISTENT columns. That means that I would loose
>> the major benefit of compression that I mostly need during importing stage.
>> Sure I can imagine a controlled batching import scheme that would append
>> data to tables and when it reaches certain threshold table is made
>> readonly, then compressed, then added to a merged table, but this looks
>> quite of a scenario. Am I wrong, can MOSAIC be used in a different
>> scenario?
>>
>> Your observation about the joint life cycle of a Mosaic structure and its
>> original column file is correct: currently Mosaic adds a compressed
>> representation next to the existing uncompressed column. For the first
>> milestone on the Mosaic road map we want to successfully apply compression
>> on READ-ONLY pre-existing columns where the purpose of compression is to
>> potentially accelerate analytical queries on these columns. However we are
>> still looking into potentially freeing the uncompressed column once a
>> compressed Mosaic heap is available. This would accommodate compression for
>> the more traditional sake of limiting memory- and/or disk footprint.
>>
>>
>> I can understand reasons behind compressing only PERSISTENT bats, yet I am
>> wondering if TRANSIENT bats can also benefit from it especially for 1.
>> result building stage (server-client or embedded version) or 2. for remote
>> connections when data is transferred for merging operations.
>>
>> Regarding to above question, are there any chances that you would consider
>> keeping compressed results in memory? Sure I can use instead disk
>> temporary tables for subsequent manipulation, but for performance reasons
>> in memory compressed results would be way faster. Actually, when embedded
>> version provides a result set, it stays valid till the user releases it,
>> why not to be able to also use that for possible subsequent SQL operations
>> that do not fit into a CTE scenario. That would provide superior
>> flexibility and memory management to CTE mechanism. Temporary results can
>> be developed in steps, they can be accessed directly at any time as
>> convenient as temporary views in CTE, but without the burden of possible
>> temporary bats that are not released till one CTE ends.
>>
>> I think it is an interesting idea. But I think it is part of a more
>> general goal/problem of how to handle updates on compressed data. There are
>> internal discussions on this topic. But whatever the outcome, this will be
>> only relevant for a much later milestone on the road map.
>>
>>
>>
>> Thank you,
>> Dan
>>
>>
>> Hope it helps.
>>
>> Kind regards,
>>
>> Aris
>>
>>
>>
-------- Forwarded Message --------
Subject: Re: ALTER TABLE ALTER COLUMN SET STORAGE
Date: Tue, 10 Sep 2019 15:15:09 +0200
From: aris <aris.koning(a)monetdbsolutions.com>
To: developers-list-request(a)monetdb.org
Hi Daniel,
On 10-09-19 12:00, developers-list-request(a)monetdb.org wrote:
> Send developers-list mailing list submissions to
> developers-list(a)monetdb.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://www.monetdb.org/mailman/listinfo/developers-list
> or, via email, send a message with subject or body 'help' to
> developers-list-request(a)monetdb.org
>
> You can reach the person managing the list at
> developers-list-owner(a)monetdb.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of developers-list digest..."
>
>
> Today's Topics:
>
> 1. ALTER TABLE ALTER COLUMN SET STORAGE (Daniel Zvinca)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 10 Sep 2019 12:40:22 +0300
> From: Daniel Zvinca <daniel.zvinca(a)logbis.com>
> To: developers-list(a)monetdb.org
> Subject: ALTER TABLE ALTER COLUMN SET STORAGE
> Message-ID:
> <CALxHsPHEEExozCgdp_srxQm2tFtFOqGOGNqBTw2UhTBwWZu93g(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
>
> I am interested to find out more about ALTER TABLE ALTER COLUMN SET
> STORAGE
> feature and how is that related to compression.
>
> As far as I understood this is related to an active development branch,
> called MOSAIC which was never merged with any of the previous MonetDB
> versions. Obviously, compression is an important feature columnar
> databases
> are providing for data storage and manipulation. A module like MOSAIC that
> seems to allow several compression techniques, would be an interesting
> option.
Yes, compression a.k.a. Mosaic is a going to be a new feature in
MonetDB. Although the feature won't be included in the upcoming November
release. Most likely, you can expect the feature in the first release
after the November release. But Mosaic is a somewhat big undertaking.
our current road map is probably covering multiple future MonetDB
releases before all envisioned compression features are available in
MonetDB. The first milestone in the current road map is to apply a
single compression technique on an entire column. But one of the next
milestone is to partition a column into variable-sized compression
blocks. Within each block a particular compression is applied.
>
> First question I have: Can MOSAIC extension be used (sources added and
> custom compiled) with success for any of its proposed codecs with any of
> the newest versions (Apr2019 +). I mean without affecting any of embedded,
> capi, rapi and pyapi modules, which all exchange data with external
> libraries.
If by this you mean you want to import the mosaic module as an external
library into an existing release out of the box, then the answer is no.
There are some slight modifications in the GDK layer to accommodate the
Mosaic module. And to interact with it from SQL, there are also some
code changes in the SQL layer. But besides those dependencies, I don't
expect any issue with the particular (x)api frameworks. But nothing is
guaranteed obviously. It sounds like you want to hack-back port it into
custom builds of earlier releases. I wouldn't give it a zero change of
success but I do wish you much luck :)
>
> A quick read of MOSAIC code made me understand that this compression
> can be
> applied only on readonly PERSISTENT columns. That means that I would loose
> the major benefit of compression that I mostly need during importing
> stage.
> Sure I can imagine a controlled batching import scheme that would append
> data to tables and when it reaches certain threshold table is made
> readonly, then compressed, then added to a merged table, but this looks
> quite of a scenario. Am I wrong, can MOSAIC be used in a different
> scenario?
Your observation about the joint life cycle of a Mosaic structure and
its original column file is correct: currently Mosaic adds a compressed
representation next to the existing uncompressed column. For the first
milestone on the Mosaic road map we want to successfully apply
compression on READ-ONLY pre-existing columns where the purpose of
compression is to potentially accelerate analytical queries on these
columns. However we are still looking into potentially freeing the
uncompressed column once a compressed Mosaic heap is available. This
would accommodate compression for the more traditional sake of limiting
memory- and/or disk footprint.
>
> I can understand reasons behind compressing only PERSISTENT bats, yet I am
> wondering if TRANSIENT bats can also benefit from it especially for 1.
> result building stage (server-client or embedded version) or 2. for remote
> connections when data is transferred for merging operations.
>
> Regarding to above question, are there any chances that you would consider
> keeping compressed results in memory? Sure I can use instead disk
> temporary tables for subsequent manipulation, but for performance reasons
> in memory compressed results would be way faster. Actually, when embedded
> version provides a result set, it stays valid till the user releases it,
> why not to be able to also use that for possible subsequent SQL operations
> that do not fit into a CTE scenario. That would provide superior
> flexibility and memory management to CTE mechanism. Temporary results can
> be developed in steps, they can be accessed directly at any time as
> convenient as temporary views in CTE, but without the burden of possible
> temporary bats that are not released till one CTE ends.
I think it is an interesting idea. But I think it is part of a more
general goal/problem of how to handle updates on compressed data. There
are internal discussions on this topic. But whatever the outcome, this
will be only relevant for a much later milestone on the road map.
>
>
> Thank you,
> Dan
Hope it helps.
Kind regards,
Aris
>
Hello,
I am interested to find out more about ALTER TABLE ALTER COLUMN SET STORAGE
feature and how is that related to compression.
As far as I understood this is related to an active development branch,
called MOSAIC which was never merged with any of the previous MonetDB
versions. Obviously, compression is an important feature columnar databases
are providing for data storage and manipulation. A module like MOSAIC that
seems to allow several compression techniques, would be an interesting
option.
First question I have: Can MOSAIC extension be used (sources added and
custom compiled) with success for any of its proposed codecs with any of
the newest versions (Apr2019 +). I mean without affecting any of embedded,
capi, rapi and pyapi modules, which all exchange data with external
libraries.
A quick read of MOSAIC code made me understand that this compression can be
applied only on readonly PERSISTENT columns. That means that I would loose
the major benefit of compression that I mostly need during importing stage.
Sure I can imagine a controlled batching import scheme that would append
data to tables and when it reaches certain threshold table is made
readonly, then compressed, then added to a merged table, but this looks
quite of a scenario. Am I wrong, can MOSAIC be used in a different scenario?
I can understand reasons behind compressing only PERSISTENT bats, yet I am
wondering if TRANSIENT bats can also benefit from it especially for 1.
result building stage (server-client or embedded version) or 2. for remote
connections when data is transferred for merging operations.
Regarding to above question, are there any chances that you would consider
keeping compressed results in memory? Sure I can use instead disk
temporary tables for subsequent manipulation, but for performance reasons
in memory compressed results would be way faster. Actually, when embedded
version provides a result set, it stays valid till the user releases it,
why not to be able to also use that for possible subsequent SQL operations
that do not fit into a CTE scenario. That would provide superior
flexibility and memory management to CTE mechanism. Temporary results can
be developed in steps, they can be accessed directly at any time as
convenient as temporary views in CTE, but without the burden of possible
temporary bats that are not released till one CTE ends.
Thank you,
Dan
I have created a new stable branch for the next feature release. This
branch was created in order to stabilize the code before the feature
release.
From now on *only* bug fixes may go to this new stable branch. I
reserve the right to undo any changes that do not comply with this
rule. I also reserve the right to admit changes that are not strictly
bug fixes.
The name for the branch is Nov2019. So in each of your stable
checkout directories you can do
hg pull
hg update -rNov2019
to update to this new branch.
--
Sjoerd Mullender
The MonetDB team at CWI/MonetDB BV is pleased to announce the
Apr2019-SP1 bugfix release of the MonetDB suite of programs.
More information about MonetDB can be found on our website at
<https://www.monetdb.org/>.
For details on this release, please see the release notes at
<https://www.monetdb.org/Downloads/ReleaseNotes>.
As usual, the download location is <https://dev.monetdb.org/downloads/>.
Apr 2019-SP1 bugfix release (11.33.11)
Build Environment
* Removed restriction on using combinations of --enable-assert,
--enable-debug, and --enable-optimize. --enable-debug adds a -g (or
-even -g3) option, --enable-debug=gdb adds a -ggdb3 flag (for GCC);
--enable-optimize adds a bunch of optimization flags;
--disable-debug (or --enable-debug=no) removes any -g flags;
--disable-optimize removes any -O flags; --enable-optimize=auto
leaves the optimization flags untouched.
MonetDB Common
* We now look at the limits imposed by cgroups and the setrlimit
system call to initialize some internal values related to how much
(virtual) memory we think is available.
SQL Frontend
* Many fixes for using the server in a highly concurrent setting with
many clients doing read-only queries mixed with clients doing
updating queries.
* Changed the internal representation of RANGE MINVALUE and RANGE
MAXVALUE in MERGE TABLE partitions. Before, the limits were
represented by the smallest and largest value of the domain of the
column, now these are represented by a NULL value. This has the
added benefit that if you use TO RANGE MAXVALUE, the largest value
of the domain is included in the partition (before it wasn't).
Also, these new limits now work for any ordered type, so including
VARCHAR.
Bug Fixes
* 6576: Sqlitelogictest aritmetic expressions with negative numbers
handling
* 6697: Duplicate expressions not eliminated with long CASE statement
* 6701: When changing the schema name of a table, referencing rows
from sys.columns, sys.keys and more tables are not removed
* 6703: SQL optimizer enters loop and goes into stack overflow
* 6706: prepare doesn't recognize merge statement
* 6712: Where clause with cast ignores sub-select
* 6713: COPY INTO FROM 'file.xz' does not work
* 6714: Assertion failure in rel_select.c for correlated subquery in
aggregation query (with group by and having and order by and limit)
* 6715: Assertion failure in rel_bin.c for MERGE INTO command
* 6716: COPY INTO does not load UTF8 encoded text
* 6718: Can't copy table into another table with constraints
(mkey.bulk_rotate_xor_hash)
* 6720: Compilation issues on gcc 9.1.1 [-Werror=stringop-truncation]
* 6721: Possibly incorrect call to pcre_exec
* 6725: Prepare statement on non-existing function crashes
* 6728: SELECT DISTINCT not removing duplicates
* 6729: Creating a table with duplicate column name should not be
possible
* 6730: sys.stop or sys.pause a INSERT query triggers "BATproject:
does not match always"
* 6736: Unexpected query result with merge tables and view after
upgrade
* 6738: issue with window functions and hugeint type coercion
* 6741: report no such table when execute “alter table tb1 rename to
tb2” according to using mapi
* 6744: CREATE SEQUENCE seq; fails
* 6748: MonetDB "forgets" columns after ALTER TABLE statements
* 6749: mserver5 restart aborts/segfaults after dropping column with
constraint
* 6751: ODBC driver(11.33.3) Seg Faults when "fn ucase" is used in
SQL