This page explains the MonetDB Client/Server Protocol, also called MAPI (MonetDB API). In particular, this page highlights the differences between Protocol v9 and Protocol v10.
All communication with MonetDB happens through a block stream. When the server or client wants to send a chunk of data of n bytes, it is divided into blocks. Each block is prefixed with (1) the amount of data in the block, and (2) a continuation bit, which indicates whether or not the block is the last block in the sequence. The full message can be received by reading blocks until the continuation bit is no longer set, and concatenating the data in the blocks together.
MAPI 9 uses a block stream with a 2-byte indicator prefix, and a maximum block size of 8192 bytes. The low-order bit is the continuation bit; and the blocksize can be determined by right-shifting the initial 2-byte prefix by one.
MAPI 10 uses a block stream with a 8-byte indicator prefix for all communication except the initial authentication steps. The maximum block size is variable, and can be configured by the client/server. Aside from the increased potential size, the block stream can be compressed using either Snappy or LZ4. A compressed block stream works exactly like a normal block stream, except every single block (excluding the indicator bytes) are compressed before transferring and decompressed before reading.
The client-server protocol starts with the authentication phase. The initial communication always uses the MAPI 9 blocked stream (with 2-byte prefix) for backwards compatibility reasons. When the client connects, the server sends a message in the form of
- challenge is a random challenge used for authentication
- servertype is just mserver
- protover indicates the protocol version used by the server (always 9, even for MAPI 10)
- algos is a comma separated list of hash algorithms the server supports. In MAPI 10, this list also contains PROT10 (to indicate support for protocol 10) and optionally COMPRESSION_SNAPPY or COMPRESSION_LZ4 (to indicate support for those compression methods)
- endian indicates endianness of the server (BIG or LIT)
- hash' is the hash algorithm used by the server (always SHA512)
After receiving this challenge, the client sends a response to the server. This response has the form:
- endian indicates endianness of the client (BIG or LIT)
- username is the login username
- pwhash is a hash of the password
- language is SQL unless you're crazy
- database is a name of the database the client is connecting to
- PROT10 is a constant that indicates you want to use protocol 10
- compression is the compression you want to use (COMPRESSION_NONE, COMPRESSION_SNAPPY or COMPRESSION_LZ4)
- blocksize is the requested blocksize for the new block stream in bytes (typically 1MB, i.e. 1000000)
If the response of the client includes PROT10, both the client and the server now switch to the updated block stream with an 8-byte prefix. In addition, if the clients' response includes a compression method, that compression method will be used (unless the server does not support it). All further communication is performed using the new block stream.
Further communication proceeds in the same fashion for MAPI 9 and MAPI 10, in a line-based format. The only difference is the result set format. In MAPI 9, a result set is transferred using the header &1\n followed by a newline-separated list of the individual rows.
In the new protocol, two new message types are introduced.
Result Set Header
The result set header starts with *\n, followed by a blob of binary data containing metadata about the result set.
The first 32-bytes contain a bunch of integer values. These are displayed in the format [DESCRIPTION:BYTES] below.
- RESULTSETID: the result set id, used for subsequent Xexport commands
- QUERYID: query id of the query that this result set originated from
- ROWCOUNT: total amount of rows in the result set
- COLUMNCOUNT: total amount of columns in the result set
- TIMEZONE: timezone offset in seconds, used for timetz and timestamptz types
After that, for each of the columns in COLUMNCOUNT the information about that column is transferred. This information contains strings, labeled as [DESCRIPTION:STRING]. Strings are transferred using a NULL terminator.
- TABLENAME: name of table that column originates from
- COLUMNNAME: name of the column
- TYPENAME: SQL type of the column
- INTERNALSIZE: the internal size of the column (i.e. 4 for 4-byte standard integers), this is -1 for variable length types such as strings or blobs
- PRECISION: precision of the column
- SCALE: scale of the column, used for decimals
- NULL_LENGTH: length of the following NULL-value in bytes. If this is 0 the column has no NULL values. For fixed-width types, this value is either 0 or INTERNALSIZE.
- NULL_VALUE: The value that represents NULL in the subsequent data.
Result Set Chunk
After the header is transferred, data is transferred. This follows the same rules as MAPI 9 w.r.t. Xexport and Xreply_size except the way in which the result set is encoded is changed. Like the result set header, result set data starts with a single newline indicating the message type followed by a binary blob. This is +\n (for the first chunk) or -\n (for any subsequent chunks).
The binary data has the following format:
- ROWCOUNT: the amount of rows in the chunk
- COLUMNDATA: the data of the rows in binary columnar format
The result set header information contains everything you need to read COLUMNDATA. The data is send in the same order as the columns in the header (the first column in the header appears first in each chunk), and each column contains [ROWCOUNT] entries in the chunk. In the binary data, NULL values are encoded using the NULL_VALUE that was transferred in the header for that column.
The format of the binary data depends on the SQL type (transferred in TYPENAME).
- BOOLEAN: bool
- TINYINT: char
- SMALLINT: short
- INT: int
- BIGINT: lng
- HUGEINT: hge
- REAL: float
- DOUBLE: double
- OID: int or lng (depending on INTERNALSIZE)
- DECIMAL: char, short, int, lng or hge (depending on INTERNALSIZE). The floating point value can be obtained by dividing by 10^SCALE
- DATE: lng (unix timestamp)
- TIMESTAMP: lng (unix timestamp)
- TIME: int (ms since 00:00:00)
- VARCHAR: null-terminated strings
- BLOB: [SIZE:8][DATA:SIZE]
- Anything else: converted to strings
Any column with INTERNALSIZE < 0 will be prefixed by the width of that column as 8-byte integer (i.e. [SIZE:8][DATA:SIZE]). This way you can move to the next column by either computing the width (INTERNALSIZE * ROWCOUNT) or by reading the size of the column from the chunk.
NOTE All the start-positions of the column data is 8-byte aligned (so elements can be read directly from the buffer on systems that require this alignment). This means that, for example, if a column contains one four-byte integer, the next column will start at 8 bytes after that column, not 4 bytes.