Jul2021 (11.41) (LTS)

The Jul2021 documentation can be found here.

LONG TERM SUPPORT VERSIONS

For more information on long term support releases, visit MonetDB Solutions.

Jul2021-SP15 (11.41.47)

Bug Fixes

7254: Commit with deletions is very slow

Jul2021-SP14 (11.41.45)

Bug Fixes

7501: files remain in backup causing problems at restart
7526: deadlock, causing new connections to hang indefinitely
7531: loading more than 2147483647 rows gives issue
7546: monetdbd leaks file descriptors when starting mserver5

Jul2021-SP13 (11.41.43)

MonetDB Common

Fixed a regression where bats weren’t always cleaned up when they weren’t needed anymore. In particular, after a DELETE FROM table query without a WHERE clause (which deletes all rows from the table), the bats for the table get replaced by new ones, and the old, now unused, bats weren’t removed from the database.

Jul2021-SP12 (11.41.41)

MonetDB Common

Introduced options wal_max_dropped, wal_max_file_age and wal_max_file_size that control the write-ahead log file rotation.
Fixed a (rare) race condition between copying a bat (COLcopy) and updates happening in parallel to that same bat. This may only be an actual problem with string bats, and then only in very particular circumstances.

Jul2021-SP11 (11.41.39)

Do a lot more error checking, mostly for allocation failures. More is still needed, though.

MonetDB Common

When saving the SQL catalog during a low-level commit, we should only save the part of the catalog that corresponds to the part of the write-ahead log that has been processed. What we did was save more, which resulted in the catalog containing references to tables and columns whose disk presence is otherwise only in the write-ahead log.
A bug was fixed where the administration of which bats were in use was interpreted incorrectly during startup, causing problems later. One symptom that has been observed was failure to start up with a message that the catalog tables could not be loaded.
When memory is tight, it could happen that a bat was backed up so that it could be saved. It was then possible that after commit the backup was not removed, so that after a restart the old backup was used instead of the committed copy. This was fixed.
Fixed a number of data races (race conditions).
Fixed a reference counting problem when a BAT could not be loaded, e.g. because of resource limitations.
Only check for virtual memory limits when creating or growing bats, not for general memory allocations. There is (still) too much code that doesn’t properly handle failing allocations, so we need to avoid those as much as possible. This has mostly an effect if there are virtual memory size restrictions imposed by cgroups (memory.swap.max in cgroups v2, memory.memsw.limit_in_bytes in cgroups v1).
The low-level commit turned out to always commit every persistent bat in the system. There is no need for that, it should only commit bats that were changed. This has now been fixed.
Warnings and informational messages are now sent to stdout instead of stderr, which means that monetdbd will now log them with the tag MSG instead of ERR.

MonetDB5 Server

If the server is sent a SIGUSR1 signal, it prints out some useful information to the standard output. When run under monetdbd, this output will appear in the merovingian.log file.
There is now a new option --set tablet_threads=N to limit the number of threads used for a COPY INTO from CSV file query. This option can also be set for a specific database with the monetdb command, using the ncopyintothreads property.

Merovingian

The command ‘monetdb snapshot write …’ caused a crash of the monetdb program. This has been fixed.

Bug Fixes

7410: SIGSEGV cause database corruption

OPEN SOURCE RELEASES

Jul2021-SP10 Bugfix Release (11.41.33)

MonetDB Common

Fixed parsing of the BBP.dir files when BAT ids grow larger than 2**24 (i.e. 100000000 in octal).
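The boundary quoted above can be checked with a few lines of Python. This is not the BBP.dir parser, only a check of the arithmetic; the helper name parse_octal_id is hypothetical, and the sketch assumes that ids are written as strings of octal digits.

```python
def parse_octal_id(field: str) -> int:
    """Hypothetical helper: read a BAT id written as octal digits."""
    return int(field, 8)


# 2**24 is the first id that needs nine octal digits.
assert parse_octal_id("100000000") == 2**24 == 16777216

# The largest id that still fits in eight octal digits.
assert parse_octal_id("77777777") == 2**24 - 1

print(oct(2**24))  # Python's own octal rendering of the same value
```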

MonetDB5 Server

A bug was fixed where data from a client context was freed after the context was closed. This meant that the data being freed could belong to the next user of the context (a next client that just connected), leading to chaos (i.e. crashes).

SQL Frontend

When creating a hot snapshot, allow other clients to proceed, even with updating queries.

Jul2021-SP9 Bugfix Release (11.41.31)

MonetDB Common

When processing the WAL, if a to-be-destroyed object cannot be found, don’t stop, but keep processing the rest of the WAL.
A race condition was fixed where certain write-ahead log messages could get intermingled, resulting in a corrupted WAL file.
If opening of a file failed when it was supposed to get memory mapped, an incorrect value was returned to indicate the failure, causing crashes later on. This has been fixed.
When saving a bat failed for some reason during a low-level commit, this was logged in the log file, but the error was then subsequently ignored, possibly leading to files that are too short or even missing.
The write-ahead log (WAL) is now rotated a bit more efficiently by doing multiple log files in one go (i.e. in one low-level transaction).
Fixed a race condition that could lead to a bat being added to the SQL catalog but not being made persistent, causing a subsequent restart of the system to fail (and crash).
Fixed a race condition where a hash could have been created on a bat using the old bat count while in another thread the bat count got updated. This would make the hash be based on too small a size, causing failures later on.
When extending a bat failed, the capacity had been updated already and was therefore too large. This could then later cause a crash. This has been fixed by only updating the capacity if the extend succeeded.
A bug was fixed when dealing with copy-on-write memory maps. These can occur for some bats used by the write-ahead log code when they grow large enough.

MonetDB5 Server

Client connections are cleaned up better so that we get fewer instances of clients that cannot connect.
Fix a bug where the MAL optimizer would use the starttime of the previous query to determine whether a query timeout occurred.

SQL Frontend

Increased the size of a variable counting the number of changes made to the database (e.g. in case more than 2 billion rows are added to a table).
Improved cleanup after failures such as failed memory allocations.
An insert into a table from which a column was dropped in a parallel transaction was incorrectly not flagged as a transaction conflict.
Added some error checking to prevent crashes. Errors would mainly occur under memory pressure.
Fixed cleanup after a failed allocation where the data being cleaned up was uninitialized but still used as pointers to memory that also had to be freed.
A bug was fixed when optimizing combining of range select subexpressions.
If there was an error in one of the special commands to the server (e.g. setting the reply size for result sets), the server could get into an infinite loop. This has been fixed.
Fixed a double cleanup after a failed allocation in COPY INTO. The double cleanup could cause a crash due to a race condition it enabled.
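The first item in this list concerns a change counter that was too narrow. As an illustration only, and assuming the old counter was a signed 32-bit integer and the new one a signed 64-bit integer (the notes do not state either type), the following Python sketch shows how the narrow counter wraps around once it passes 2147483647. The function names are hypothetical.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2147483647, just over 2 billion


def add_rows_32(counter: int, rows: int) -> int:
    """Add to a counter kept in a signed 32-bit integer (wraps on overflow)."""
    return ctypes.c_int32(counter + rows).value


def add_rows_64(counter: int, rows: int) -> int:
    """Add to a counter kept in a signed 64-bit integer."""
    return ctypes.c_int64(counter + rows).value


# One more row than fits: the 32-bit counter wraps to a negative value,
# while the 64-bit counter keeps counting.
print(add_rows_32(INT32_MAX, 1))
print(add_rows_64(INT32_MAX, 1))
```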

Merovingian

Stop logging references to monetdbd’s logfile in said logfile.

Jul2021-SP8 Bugfix Release (11.41.27)

MonetDB Common

A bug was fixed when upgrading a database from the Oct2020 releases (11.39.X) or older when the write-ahead log (WAL) was not empty and contained instructions to create new tables.
Avoid logging of failure to backup files that didn’t need to be backed up in the first place.
Avoid an attempt to access a file when the database is in memory.

SQL Frontend

Fixed a busy loop in the code that applies the write-ahead log when there are log files that cannot yet be cleaned due to active transactions. This loop can become nasty when mserver5 is exiting.

Merovingian

In certain cases (when an mserver5 process exits right after producing a message) the log message was logged over and over again, causing monetdbd to use 100% CPU. This has been fixed.

Jul2021-SP7 Bugfix Release (11.41.25)

MonetDB Common

When destroying a bat, make sure there are no files left over in the BACKUP directory since they can cause problems when the bat id gets reused.
Fixed an off-by-one error in the logger which caused older log files to stick around longer in the write-ahead log than necessary.
When an empty BAT is committed, skip writing (and synchronizing to disk) the heap (tail and theap) files and write 0 for their sizes to the BBP.dir file. When reading the BBP.dir file, if an empty BAT is encountered, set the sizes of those files to 0. This fixes potential issues during startup of the server (BBPcheckbats reporting errors).
Make sure heap files of transient bats get deleted when the bat is destroyed. If the bat was a partial view (sharing the vheap but not the tail), the tail file wasn’t deleted.
Various changes were made to satisfy newer compilers.
The batDirtydesc and batDirtyflushed Boolean values have been deprecated and are no longer used. They were both holdovers from long ago.
Various race conditions (data races) have been fixed.
All accesses to the BACKUP directory need to be protected by the same lock. The lock already existed (GDKtmLock), but wasn’t used consistently. This is now fixed. Hopefully this makes the hot snapshot code more reliable.

MonetDB5 Server

Various race conditions (data races) have been fixed.

Merovingian

When multiple identical messages are written to the log, write the first one, and combine subsequent ones in a single message.
Fixed a leak where the log file wasn’t closed when it was reopened after a log rotation (SIGHUP signal).
Try to deal more gracefully with “inherited” mserver5 processes. This includes not complaining about an “impossible state”, and allowing such processes to be stopped by the monetdbd process.
When a transient failure occurs during processing of a new connection to the monetdbd server, sleep for half a second so that if the transient failure occurs again, the log file doesn’t get swamped with error messages.
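The combining of identical log messages described in the first item of this list can be pictured with a small Python sketch. It is not monetdbd's code: the function name and the wording of the summary line are assumptions made for the example.

```python
from itertools import groupby


def combine_repeats(messages):
    """Keep the first message of each run of identical messages and
    replace the rest of the run with a single summary line."""
    out = []
    for text, run in groupby(messages):
        count = sum(1 for _ in run)
        out.append(text)
        if count > 1:
            # The summary wording is made up for this sketch.
            out.append(f"message repeated {count - 1} times: {text}")
    return out


log = ["db1 started", "read error", "read error", "read error", "db1 stopped"]
for line in combine_repeats(log):
    print(line)
```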

Bug Fixes

Jul2021-SP6 Bugfix Release (11.41.23)

Bug Fixes

Jul2021-SP5 Bugfix Release (11.41.21)

MonetDB Common

Fixed a race condition which could cause too large a size to be written for a .theap file to the BBP.dir file after the correctly sized file had been saved to disk.
We now ignore the size and capacity columns in the BBP.dir file. These values are essential during run time, but not useful in the on-disk image of the database.

Merovingian

Disabled logging into merovingian.log of the following info message types: “proxying client <host>:<port> for database ‘<dbname>’ to <url>” and “target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying”. These messages were written to the log file at each connection. In most cases this information is not used. Disabling them reduces the log file size.

Bug Fixes

Jul2021-SP4 Bugfix Release (11.41.19)

Bug Fixes

7267: Update after delete does not update some rows

Jul2021-SP3 Bugfix Release (11.41.15)

MonetDB Common

Fixed race condition during backup of BATs.
Fixed append to BATs of type msk (bit mask).
Fix to WAL logger when a BAT gets replaced within a transaction.

SQL Frontend

Add number of rows affected by output statements into the total rowcount.
Fix to MAL code generation.

Bug Fixes

7225: Invalid memory access when extending a BAT during appends
7228: COMMIT: transaction is aborted because of concurrency conflicts, will ROLLBACK instead

Jul2021-SP2 Bugfix Release (11.41.13)

Client Package

Dumping the database now also dumps the read-only and insert-only states of tables.

MonetDB Common

Sometimes when the server was restarted, it wouldn’t start anymore due to an error from BBPcheckbats. We finally found and fixed a (hopefully “the”) cause of this problem.

SQL Frontend

Number parsing for SQL was fixed. If a number was immediately followed by letters (i.e. without a space), the number was accepted and the alphanumeric string starting with the letter was interpreted as an alias (if aliases were allowed in that position).
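To make the fixed behaviour concrete, here is a minimal Python sketch of a scanner rule that rejects a number directly followed by a letter. It is not MonetDB's scanner: the function name is hypothetical, and the sketch handles plain integer literals only (no decimals or exponents).

```python
import re

_INTEGER = re.compile(r"\d+")


def read_integer(text: str):
    """Return (value, rest) for an integer literal at the start of text.

    Raises ValueError when the digits are immediately followed by a
    letter or underscore, instead of silently treating the trailing
    word as an alias.
    """
    match = _INTEGER.match(text)
    if match is None:
        raise ValueError("no integer literal at start of input")
    rest = text[match.end():]
    if rest[:1].isalpha() or rest[:1] == "_":
        raise ValueError(f"unexpected {rest[0]!r} directly after number")
    return int(match.group()), rest


print(read_integer("42 abc"))  # a space separates number and alias: accepted
try:
    read_integer("42abc")      # no space: rejected
except ValueError as exc:
    print("rejected:", exc)
```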

Bug Fixes

7163: Multiple sql.mvc() invocations in the same query
7167: sys.shutdown() problems
7184: Insert into query blocks all other queries
7185: GROUPING SETS on groups with aliases provided in the SELECT returns empty result
7186: data files created with COPY SELECT .. INTO ‘file.csv’ fail to be loaded using COPY INTO .. FROM ‘file.csv’ when double quoted string data contains the field values delimiter character
7191: [MonetDBe] monetdbe_cleanup_statement() with bound NULLs on variable-sized types bug
7196: BATproject2: does not match always
7198: Suboptimal query plan for query containing JSON access filter and two negative string comparisons
7200: PRIMARY KEY unique constraint is violated with concurrent inserts
7206: Python UDF fails when returning an empty table as a dictionary

Jul2021-SP1 Bugfix Release (11.41.11)

MonetDB Common

Some deadlock and race condition issues were fixed.
Handling of the list of free bats has been improved, leading to less thread contention.
A problem was fixed where the server wouldn’t start with a message from BBPcheckbats about files being too small. The issue was not that the file was too small, but that BBPcheckbats was looking at the wrong file.
An issue was fixed where a “short read” error was produced when memory was getting tight.
When appending to a string bat, we made an optimization where the string heap was sometimes copied completely to avoid having to insert strings individually. This copying was still done too eagerly, so now the string heap is copied less frequently. In particular, when appending to an empty bat, the string heap is now not always copied whole.

SQL Frontend

If the server has been idle for a while with no active clients, the write-ahead log is now rotated.
A problem was fixed where files belonging to bats that had been deleted internally were not cleaned up, leading to a growing database (dbfarm) directory.
A leak was fixed where extra bats were created but never cleaned up, each taking up several kilobytes of memory.
[This feature was already released in Jul2021 (11.41.5), but the ChangeLog was missing] Grant indirect privileges. With “GRANT SELECT ON <my_view> TO <another_user>” and “GRANT EXECUTE ON FUNCTION <my_func> TO <another_user>”, one can grant access to “my_view” and “my_func” to another user who does not have access to the underlying database objects (e.g. tables, views) used in “my_view” and “my_func”. The grantee will only be able to access data revealed by “my_view” or conduct operations provided by “my_func”.
Improved error reporting in COPY INTO by giving the line number (starting with one) for the row in which an error was found. In particular, the sys.rejects() table now lists the line number of the CSV file on which the record started in which an error was found.
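The difference between a record number and the line on which a record starts matters when quoted fields contain newlines. The Python sketch below illustrates the idea with the standard csv module; it is not how COPY INTO or sys.rejects() is implemented, and the function name and the "wrong field count" error are assumptions made for the example.

```python
import csv
import io


def find_rejects(text: str, expected_fields: int):
    """Return (start_line, record) for every record with a wrong field count.

    start_line is the 1-based line of the CSV text on which the record began.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    rejects = []
    start_line = 1  # line numbers start at one
    for record in reader:
        if len(record) != expected_fields:
            rejects.append((start_line, record))
        # reader.line_num is the number of physical lines consumed so far,
        # so the next record starts on the line after it.
        start_line = reader.line_num + 1
    return rejects


# The second record spans lines 2 and 3, so the bad third record starts on line 4.
sample = '1,"a"\n2,"multi\nline"\n3\n4,"d"\n'
print(find_rejects(sample, expected_fields=2))
```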

Bug Fixes

7140: SQL Query Plan Non Optimal with View
7165: ‘JOINIDX: missing ‘.’’ when running distributed join query on merged remote tables
7172: Unexpected query result with merge tables
7173: If truncate is in transaction then after restart of MonetDB the table is empty
7178: Remote Table Throws Error - createExceptionInternal: !ERROR: SQLException:RAstatement2:42000!The number of projections don’t match between the generated plan and the expected one: 1 != 1200

Jul2021 Feature Release (11.41.5)

Client Package

The MonetDB stethoscope has been removed. There is now a separate package available with PIP (monetdb_stethoscope) or an RPM or DEB package (stethoscope) from the monetdb.org repository.

Mapi Library

Add optional MAPI header field which can be used to immediately set reply size, autocommit, time zone and some other options, see mapi.h. This makes client connection setup faster. Support has been added to mapilib, pymonetdb and the jdbc driver.

ODBC Driver

A typo that made the SQLSpecialColumns function unusable was fixed.

MonetDB Common

A bug in the grouping code has been fixed.
Hash indexes are no longer maintained at all cost: if the number of distinct values is too small compared to the total number of values, the index is dropped instead of being maintained during updates.
A new type, called msk, was introduced. This is a bit mask type. In a bat with type msk, each row occupies a single bit, so 8 rows are stored in a single byte. There is no NULL value for this type.
The function of the BAT iterator (type BATiter, function bat_iterator) has been expanded. The iterator now contains more information about the BAT, and it contains a pointer to the heaps (theap and tvheap) that are stable, at least in the sense that they will remain available even when parallel threads update the BAT and cause those heaps to grow (and therefore possibly move in memory). A call to bat_iterator must now be accompanied by a call to bat_iterator_end.
Implemented function BUNreplacemultiincr to replace multiple values in a BAT in one go, starting at a given position.
Implemented new function BUNreplacemulti to replace multiple values in a BAT in one go, at the given positions.
Removed function BUNinplace, just use BUNreplace, and check whether the BAT argument is of type TYPE_void before calling if you don’t want to materialize.
Implemented a function BUNappendmulti which appends an array of values to a BAT. It is a generalization of the function BUNappend.
Changed the interface of the atom read function. It now requires an extra pointer to a size_t value that gives the current size of the destination buffer, and when that buffer is too small, it receives the size of the reallocated buffer that is large enough. In any case, and as before, the return value is a pointer to the destination buffer.
Environment variables (sys.env()) must be UTF-8, but since they can contain file names which may not be UTF-8, there is now a mechanism to store the original values outside of sys.env() and store %-escaped (similar to URL escaping) values in the environment. The key must still be UTF-8.
We now save the location of the min and max values when known.
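The storage layout of the msk type introduced in this list (one bit per row, eight rows per byte) can be sketched in Python as follows. The function names are hypothetical, and the bit order within a byte (least significant bit first) is an assumption of this sketch, not something the notes specify.

```python
def pack_mask(rows):
    """Pack a sequence of booleans into bytes, one bit per row."""
    packed = bytearray((len(rows) + 7) // 8)  # 8 rows fit in each byte
    for i, flag in enumerate(rows):
        if flag:
            # Assumed bit order: row 0 is the least significant bit.
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def get_bit(packed: bytes, i: int) -> bool:
    """Read row i back from the packed representation."""
    return bool((packed[i // 8] >> (i % 8)) & 1)


rows = [True, False, True, False, False, False, False, False, True]
packed = pack_mask(rows)
print(len(rows), "rows stored in", len(packed), "bytes")
assert [get_bit(packed, i) for i in range(len(rows))] == rows
```

There is no third state in such a layout, which matches the remark that the type has no NULL value.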

MonetDB5 Server

When using the --in-memory option, mserver5 will run completely in memory, i.e. not create a database on disk. The server can still be connected to using the name of the in-memory database. This name is “in-memory”.
By using the option “--dbextra=in-memory”, mserver5 can be instructed to keep transient BATs completely in memory.

SQL Frontend

The system view sys.ids has been updated to include some more system IDs.
The sys.storage() function now only returns meta data, i.e. data that can be calculated without access to the column contents.
Since STREAM tables support is removed, left over STREAM tables are dropped from the catalog.
Fix a warning emitted by some implementations of the tar(1) command when unpacking hot snapshot files.
Support reading the concatenation of compressed files as a single compressed file.
COPY BINARY overhaul. Allow control over binary endianness using COPY [ (BIG | LITTLE | NATIVE) ENDIAN] BINARY syntax. Defaults to NATIVE. Strings are now \0 terminated rather than \n. Support for BOOL, TINYINT, SMALLINT, INT, LARGEINT, HUGEINT, with their respective “INTMIN” values as the NULL representation; 32 and 64 bit FLOAT/REAL, with NaN as the NULL representation; VARCHAR/TEXT, JSON and URL with \x80 as the NULL representation; UUID as fixed width 16 byte binary values, with (by default) all zeroes as the NULL representation; temporal type structs as defined in copybinary.h with any invalid value as the NULL representation.
In the Jul2021 release the storage and transaction layers have undergone major changes. The goal of these changes is robust performance under inserts, updates and deletes, and lower transaction startup costs, allowing faster (small) queries. Where the old transaction layer duplicated a lot of data structures during startup, the new layer shares the same tree. Object timestamps guarantee the isolation of objects. On the storage side, timestamps also indicate whether a row is visible (deleted or valid) to a transaction. The changes also slightly alter the perceived transactional behavior. The new implementation uses structures shared among all transactions, which do not allow multiple changes of the same object. We therefore follow the principle that the first writer wins, i.e., if a transaction creates a table with name ’table_name’ and another transaction concurrently does the same, the later of the two will fail with a concurrency conflict error message (even if the first writer never commits). We expect most users not to notice this change, as such schema changes aren’t usually done concurrently.
There is now a function sys.current_sessionid() to return the session ID of the current session. This ID corresponds with the sessionid in the sys.queue() result.
Merge statements could not produce correct results on complex join conditions, so a renovation was made. As a consequence, subqueries now have to be disabled on merge join conditions.
Preserve in-query comments.
Use of CTEs inside UPDATE and DELETE statements is now more restricted. Previously they could be used without any extra specification in the query (e.g. with “v1”(“c1”) as (…) delete from “t” where “t”.“c1” = “v1”.“c1”), but this did not conform to the SQL standard. In order to use them, they must be specified in the FROM clause in UPDATE statements or inside a subquery.
Added a ‘schema path’ property to users, specifying a list of schemas to be searched to find SQL objects such as tables and functions. The scoping rules have been updated to support this feature, and SQL objects are now looked up in the following order: 1. On occasions with multiple tables (e.g. add foreign key constraint, add table to a merge table), the child will be searched for in the parent’s schema. 2. For tables only, declared tables on the stack. 3. The ’tmp’ schema if not listed on the ‘schema path’. 4. The session’s current schema. 5. Each schema from the ‘schema path’ in order. 6. The ‘sys’ schema if not listed on the ‘schema path’. Whenever the full path is specified, i.e. “schema”.“object”, only the explicit schema is searched.
The ALTER USER x SCHEMA PATH y; statement was added to update the schema path, and the [SCHEMA PATH string] syntax was added to the CREATE USER statement. The schema path must be a single string where each schema must be between double quotes and separated with a single comma, e.g. ‘“sch1”,“sch2”’. For every created user, if the schema path is not given, ‘“sys”’ will be the default schema path.
Changes in the schema path won’t be reflected on currently connected users, therefore they have to re-connect to see the change. Non existent schemas on the path will be ignored.
Leftover STREAM table definitions from the Datacell extension were removed from the parser. They no longer had any effect.
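The schema path format and the lookup order described in this list can be modelled with a short Python sketch. It is a simplification under stated assumptions, not the server's resolver: the function names are hypothetical, only steps 3 to 6 of the order are modelled (the parent-schema and declared-table steps depend on query context), schema names are assumed to contain no double quotes, and dropping duplicates from the resulting order is a choice made for this sketch.

```python
import re


def parse_schema_path(path: str):
    """Split a schema path string such as '"sch1","sch2"' into names."""
    return re.findall(r'"([^"]*)"', path)


def format_schema_path(schemas):
    """Build the single-string form: double-quoted names joined by commas."""
    return ",".join(f'"{name}"' for name in schemas)


def search_order(path: str, current_schema: str):
    """Schemas in the order they would be searched (steps 3 to 6 only)."""
    listed = parse_schema_path(path)
    order = []
    if "tmp" not in listed:
        order.append("tmp")        # step 3
    order.append(current_schema)   # step 4
    order.extend(listed)           # step 5
    if "sys" not in listed:
        order.append("sys")        # step 6
    # Drop repeats while keeping the first occurrence (sketch-only choice).
    return list(dict.fromkeys(order))


print(format_schema_path(["sch1", "sch2"]))
print(search_order('"sch1","sch2"', current_schema="sch2"))
print(search_order('"sys"', current_schema="sys"))  # the default path
```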

Merovingian

Deprecate ‘profilerstart’ and ‘profilerstop’ commands. Since stethoscope is a separate project (https://pypi.org/project/monetdb-pystethoscope/) the installation directory is not standard anymore. ‘profilerstart’ and ‘profilerstop’ commands assume that the stethoscope executable is in the same directory as ‘mserver5’. This is no longer necessarily true since stethoscope can now be installed in a python virtual environment. The commands still work if stethoscope is installed using the official MonetDB installers, or if a symbolic link is created in the directory where ‘mserver5’ is located.
The exittimeout value can now be set to a negative value (e.g. -1) to indicate that when stopping the dbfarm (using monetdbd stop dbfarm), any mserver5 processes are to be sent a termination signal and then waited for until they terminate. In addition, if exittimeout is greater than zero, the mserver5 processes are sent a SIGKILL signal after the specified timeout and the managing monetdbd is sent a SIGKILL signal after another five seconds (if it didn’t terminate already). The old situation was that the managing monetdbd process was sent a SIGKILL after 30 seconds, and the mserver5 processes that hadn’t terminated yet would be allowed to continue their termination sequence.

Bug Fixes

2030: Temporary table is semi-persistent when transaction fails
7031: I cannot start MonetDB, because the installation path has Chinese.
7055: Table count returning function used inside other function gives wrong results.
7075: Inconsistent Results using CTEs in Large Queries
7079: WITH table AS… UPDATE ignores the WHERE conditions on table
7081: Attempt to allocate too much space in UPDATE query
7093: ‘current_schema’ not in sys.keywords
7096: DEBUG SQL statement broken
7115: Jul2021: ParseException while upgrading Oct2020 database
7116: Jul2021: Cannot create filter functions
7125: MonetDB Round Function issues in the latest release
7126: The “lower” and “upper” functions doesn’t work for Cyrillic alphabet
7127: Bug report: “write error on stream” that results in mclient crash
7128: Bug report: strange error message “Subquery result missing”
7129: Bug report: TypeException:user.main[19]:‘batcalc.between’ undefined
7130: Bug report: TypeException:user.main[396]:‘algebra.join’ undefined
7131: Bug report: TypeException:user.main[273]:‘bat.append’ undefined
7133: WITH ( SELECT x ) DELETE FROM … deletes wrong tuples
7136: MERGE statement is deleting rows if the column is set as NOT NULL even though it should not
7137: Segmentation fault while loading data
7138: Monetdb Python UDF crashes because of null aggr_group_arr
7141: COUNT(DISTINCT col) does not calculate correctly distinct values
7142: Aggregates returning tables should not be allowed
7144: Type up-casting (INT to BIGINT) doesn’t always happen automatically
7146: Query produces this error: !ERROR: Could not find %102.%102
7147: Internal error occurs and is not shown on the screen
7148: Select distinct is not working correctly
7151: Insertion is too slow
7153: System UDFs lose their indentation - Python functions broken
7158: Python aggregate UDF returns garbage when run on empty table
7161: fix priority