The SQL Language


The de facto language for database applications is SQL. It has evolved through several phases of standardization to the version currently known as SQL:2008. The SQL standard provides an ideal language framework, in terms of standardization committee viewpoints. It is, however, hardly met by any of the existing (commercial) implementations. This is largely due to software legacy, backward compatibility requirements from their client base, and the introduction of new features. See for instance the Wikipedia on-line article on SQL standards and an overview of SQL features of mainstream systems.

In 2002 the first version of the SQL front end for MonetDB emerged. This late development made it possible to start immediately from the SQL'99 definition. As soon as the SQL'03 specifications became available, their content was taken as the frame of reference. The SQL development strategy is driven by the immediate needs of the user base, so that less-frequently used features end up low on the development stack. Features beyond the standard are marked as non-standard SQL in their description.

The architecture is based on a compiler that translates SQL statements into MAL (MonetDB Assembly Language) statements. In this process, common optimization heuristics specific to the relational algebra are applied. There are bindings for SQL with e.g. JDBC, PHP, Ruby on Rails, Python and C to integrate seamlessly into existing development environments.

The remainder of this manual presents a synopsis of the language features. It provides a quick introduction to the scope of the current implementation. Proficiency in elementary use of SQL is assumed. If you are new to this field, pick up one of the introductory books and study it carefully, e.g. J. Melton and A.R. Simon, SQL:1999 Understanding Relational Language Components, ISBN 1558604561.

The language features recognized are presented in simplified BNF notation. All reserved words are depicted in capitals. Language (non-)terminals are depicted in lower case italics. Optional constructs are enclosed by square brackets ('[', ']') with alternative choices separated by a bar ('|'); at most one can be chosen. A set of options is enclosed by curly braces ('{', '}') and exactly one must be chosen. Repetition of language structures is depicted by list structures, e.g. A ',' ... denotes a sequence of at least one A separated by commas.

Identifiers are optionally qualified with a semantic tag, e.g. authid means a name for an authorised user.

SQL Syntax


Under construction

Lexical Structure


Comments

Comments can be added to query scripts for documentation purposes. MonetDB/SQL supports two forms of comments. Any text beginning with '--' and extending to the end of the line is ignored. Furthermore, C-style comments ( /* this is ignored */ ) can be injected into a SQL query where a language token is expected. A language token can be a keyword, an identifier, a literal constant or a special character. Tokens should be separated by white space (tab, space, newline) unless their juxtaposition does not create an ambiguous interpretation. Comments are considered equivalent to white space.

Identifiers and Keywords

SQL comes with a large collection of keywords, i.e. names reserved by the committee to designate language constructs. Keywords in MonetDB/SQL are case-insensitive and adhere to the general convention for identifier denotation in programming languages.

Users can overrule the interpretation of an identifier as a keyword by enclosing it in double quotes, e.g. select denotes a keyword, while "select" denotes a user-defined name. This scheme also permits the inclusion of white space in names. However, it is generally advisable to limit the number of such escaped keywords.

Names are used to designate database objects. In that role, they are case-insensitive. The terminal identifier classes distinguished are denoted with the suffix _name.

Literal Constants

Numeric constants follow the convention of most programming languages. A numeric constant that contains neither a decimal point nor an exponent is considered of type integer. The underlying machine determines whether it will be interpreted as a 32- or 64-bit value, although on modern platforms it will most likely be the latter.

String constants in SQL are enclosed in single quotes, like: 'string data'. If your data contains single quotes, each single quote needs to be doubled, so 'a single '' quote in a string' represents the string: a single ' quote in a string. We support Unicode character string literals (from the Apr2019 release 11.33.3 onwards) using: U&'...' including UESCAPE. For these string literals, you can have U&'...' '...' '...' UESCAPE '...' where the escape must be a single character and the other '...' strings are also Unicode character string literals. For now, these latter strings also undergo C-style backslash interpretation.

We also support PostgreSQL-like E'...' strings (from the Apr2019 release 11.33.3 onwards). These strings can contain C-style backslash escapes.
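For example (the values are arbitrary), the following statements illustrate quoting and backslash escape handling:

SELECT 'a single '' quote in a string';
SELECT E'first line\nsecond line';   -- the backslash escape is interpreted in E'...' strings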

Often strings can be cast to other types using the CAST( expr AS type) expression, provided the typed coercion routine is available. For example:

CAST ( '129' AS integer)
CAST ( '0.3' AS double)

illustrate the conversion of a numeric string into an integer and of a decimal string into a double precision floating point number.

Temporal Constants

The DATE, TIME and TIMESTAMP data types come with a straightforward string literal conversion structure, e.g.:

SELECT DATE '2014-02-03', TIME '15:45:56', TIMESTAMP '2014-02-03 15:45:56';

produces a single properly typed row result. It is equivalent to:

SELECT CAST('2014-02-03' AS DATE), CAST ('15:45:56' AS TIME), CAST ('2014-02-03 15:45:56' AS TIMESTAMP);

Special Characters

String literals (currently both the '...' and E'...' styles) may contain the C-style escape characters: '\n' for newline, '\t' for tab, '\r' for carriage return, and '\\' for backslash. The convention '\ddd', where each d is a digit, stands for the character with that octal value.

Operator Precedences

Most operators in SQL have the same precedence and are left-associative. Parentheses can be used to disambiguate the precedence order. The operator definitions cannot be overloaded, nor can you define new operators.

. left table/column name separator
- right unary minus
* / % left multiplication, division, modulo
+ - left addition, subtraction
IS, IS TRUE, IS FALSE, IS UNKNOWN    
IS NULL, IS NOT NULL   test for (not) null
IN   set membership
BETWEEN   range containment
OVERLAPS   time interval overlap
LIKE, ILIKE   string pattern matching
< >   less than, greater than
= right equality, assignment
NOT right logical negation
AND left logical conjunction
OR left logical disjunction
^ left logical exclusive disjunction

Value Expressions


The language is built around value- and table- expressions. Value expressions encompass denotation of literal constants, type casts, column references, operator and function invocation, and scalar returning subqueries.

The column references take the form [correlation_name '.'] column_name. The correlation_name is either a table name or an alias introduced in a from clause. The correlation name may be omitted if the column name uniquely identifies a column within the scope of the current query.

The table references take the form [schema_name '.'] table_or_view_name [AS alias_name]. Table expressions produce a relational table. It is internally referenced through a correlation name, which supports attribute selection using the '.' denotation.
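As an illustration (the table and column names are hypothetical), a correlation name introduced in the FROM clause can be used to qualify column references:

SELECT e.name, d.name
FROM employees AS e, departments AS d
WHERE e.dept_id = d.id;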

Data Types


SQL prescribes a large collection of built-in types, most of which are directly supported. The MonetDB database kernel can also be extended with user defined types to better support application domains.

Some domain specific data types (e.g. GeoSpatial) are not part of the MonetDB core, but are provided by extensions. For more information check the Spatial data types article.

Built-in SQL Types

CHAR [ '(' length ')' ]  |
CHARACTER [ '(' length ')' ]
UTF-8 character string with length upperbound limit.
CHAR or CHARACTER without the "( length )" specification are treated as CHAR(1).
Note: currently no spaces are padded at the end
   
VARCHAR '(' length ')' |
CHARACTER VARYING '(' length ')'
UTF-8 character string with length upperbound limit
   
CHARACTER LARGE OBJECT | CLOB | TEXT | STRING UTF-8 character string with unbounded length
   
CHARACTER LARGE OBJECT '(' length ')' |
CLOB '(' length ')' |
TEXT '(' length ')' |
STRING '(' length ')'
UTF-8 character string with length upperbound limit
   
BLOB | BINARY LARGE OBJECT bytes string with unbounded length
   
BOOLEAN | BOOL logic values: true or false
   
TINYINT 8 bit signed integer between -127 and 127
SMALLINT 16 bit signed integer between -32767 and 32767
INT | INTEGER 32 bit signed integer between -2147483647 and 2147483647
BIGINT 64 bit signed integer between -9223372036854775807 and 9223372036854775807
   
HUGEINT 128 bit signed integer between -2^127 +1 and +2^127 -1 (±170141183460469231731687303715884105727)
Note: HUGEINT is only available on platforms with a C-compiler that supports the __int128 or __int128_t data type (e.g., recent gcc, clang, & icc on Linux or MacOS X) and from Jul2015 release onwards
   
DECIMAL '(' Prec ',' Scale ')' |
DEC '(' Prec ',' Scale ')' |
NUMERIC '(' Prec ',' Scale ')'
Exact decimal number with precision Prec and scale Scale. Prec must be between 1 and 18 (or 38 when HUGEINT is also supported). Scale must be between 0 and Prec
   
DECIMAL '(' Prec ')' |
DEC '(' Prec ')' |
NUMERIC '(' Prec ')'
Exact decimal number with precision Prec and scale 0. Prec must be between 1 and 18 (or 38 when HUGEINT is also supported).
   
DECIMAL | DEC | NUMERIC Exact decimal number with precision 18 and scale 3.
   
REAL 32 bit floating point approximate number
FLOAT | DOUBLE | DOUBLE PRECISION 64 bit floating point approximate number
FLOAT '(' Prec ')' floating point approximate number with binary precision Prec. Prec must be between 1 and 53. FLOAT(24) is same as REAL, FLOAT(53) is same as DOUBLE

 

All scalar types include a NULL value, which is internally represented as a valid domain value. For numerical types, this is the smallest value in the type's domain (i.e., the one omitted from the ranges given above). Arithmetic expressions that overflow may return NULL instead.

The integer types align with the storage of 1, 2, 4, 8 and 16 bytes.

The numeric and decimal types are represented as fixed length integers, whose decimal point is produced during result rendering.

The types REAL, FLOAT and DOUBLE map to the underlying implementation system. No special attention is given to the value NaN.
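As a small, hypothetical example, the following table definition combines several of the built-in types described above:

CREATE TABLE measurements (
   id      BIGINT,
   sensor  VARCHAR(32),
   reading DECIMAL(10,3),
   ratio   DOUBLE,
   valid   BOOLEAN
);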

JSON Data Type


JSON has become the de facto lightweight data interchange format for many web applications. It has a simple hierarchical structure and supports a limited set of value types. JSON is a natural representation of data for the C family of programming languages.

JSON is supported in MonetDB as a subtype over type VARCHAR, which ensures that only valid JSON strings are added to the database.

Example: CREATE TABLE json_example (c1 JSON, c2 JSON(512) NOT NULL);

MonetDB supports most of the JSON path expressions defined in [ref]. They can be used to decompose the values into regular tables, which then act as an index. A limited set of operators is predefined in the SQL catalogue.

json.filter(J, Pathexpr) Extracts the component from J that satisfies the Pathexpr
json.filter(J, Number) Extracts an indexed component from J
json.text(J, [Sep]) Glues together the values, separated by the Sep character (default: space)
json.number(J) Turns a number, singleton array value, or singleton object tag into a double
json."integer"(J) Turns a number, singleton array value, or singleton object element into an integer
json.isvalid(StringExpr) Checks the string for JSON compliance. Returns boolean.
json.isobject(StringExpr) Checks the string for JSON object compliance. Returns boolean.
json.isarray(StringExpr) Checks the string for JSON array compliance. Returns boolean.
json.length(J) Returns the number of top-level components of J.
json.keyarray(J) Returns a list of key tags for the top-level components of J.
json.valuearray(J) Returns a list of values for the top-level components of J.

JSON path expressions always refer to a single JSON structure. The root of this structure is identified by the identifier '$', which is implicitly assumed in most expressions. Components of the JSON structure are addressed through the dot notation, representing child steps and array element access. The wildcard can be used for child names and undetermined array indices.

JSON path Description Example
"$" The root object json.filter(v, '$') = { "store": ... }
"." childname The child step operator json.filter(v, '$.store.bicycle') = {"color": "red", "price": 19.95}
".." childname Recursive child step json.filter(v, '$..price') = [8.95,12.99,8.99,22.99,19.95]
"*" Child name wildcard json.filter(v, '$.store.bicycle.*') = {"color": "red", "price": 19.95}
"[" nr "]" Array element access json.filter(v, '$.store.book.[1]') = the second book
"[" * "]" Any array element access  
E1 "," E2 Union path expressions json.filter(v, '$.store.bicycle,$..price')

An example JSON object used for these expressions is shown below [ref]; more examples can be found in the test suite.

{ "store": {
    "book": [
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}
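A minimal sketch of storing and querying such a document (table and column names are hypothetical; the exact rendering of the results may differ):

CREATE TABLE books (doc JSON);
INSERT INTO books VALUES ('{ "store": { "bicycle": { "color": "red", "price": 19.95 } } }');
SELECT json.filter(doc, '$.store.bicycle.color') FROM books;   -- extracts the color component
SELECT json.isvalid('{ "key": 42 }'), json.length('[1,2,3]');  -- true and 3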

Network Address Type


The SQL type inet describes IPv4 network addresses. The inet module contains a collection of functions that operate on IPv4 addresses. The most relevant functions are the 'containment' functions that deal with subnet masks.

  Description
  sys.broadcast(p inet) returns inet
  sys.host(p inet) returns clob
  sys.masklen(p inet) returns int
  sys.setmasklen(p inet, mask int) returns inet
  sys.netmask(p inet) returns inet
  sys.hostmask(p inet) returns inet
  sys.network(p inet) returns inet
  sys.text(p inet) returns clob
  sys.abbrev(p inet) returns clob
  sys."left_shift"(i1 inet, i2 inet) returns boolean
  sys."right_shift"(i1 inet, i2 inet) returns boolean
  sys."left_shift_assign"(i1 inet, i2 inet) returns boolean
  sys."right_shift_assign"(i1 inet, i2 inet) returns boolean

The functionality of this module is greatly inspired by the PostgreSQL inet atom. It should be extended to also support IPv6.
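A minimal sketch of how these functions might be used (the addresses are arbitrary examples):

CREATE TABLE hosts (addr INET);
INSERT INTO hosts VALUES ('192.168.1.5/24');
SELECT sys.host(addr), sys.masklen(addr), sys.netmask(addr) FROM hosts;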

Serial Data Types


As of 2003 the SQL standard supports serial types (sequences). They are of particular use in auto-generating key values. A serial type is defined as a primary database object over any of the built-in integer data types. The NEXT VALUE FOR operation generates the next value and can be used wherever a value expression is allowed. The sequence name should be unique within the current schema. A sequence can only be dropped when references to it (e.g. in the DEFAULT specification of a column) have previously been removed.

serial_data_types:
     SERIAL
   | BIGSERIAL
   | int_datatype AUTO_INCREMENT
   | int_datatype GENERATED ALWAYS AS IDENTITY [ '(' seq_params ')' ]

sequence_commands:
     CREATE SEQUENCE [ schema_name '.'] seq_name [ seq_params ]
   | DROP SEQUENCE [ schema_name '.'] seq_name
   | ALTER SEQUENCE [ schema_name '.'] seq_name [ alter_seq_params ]

seq_params:
    [ AS int_datatype ]
    [ START WITH bigint ]
    [ INCREMENT BY bigint ]
    [ MINVALUE bigint | NO MINVALUE ]
    [ MAXVALUE bigint | NO MAXVALUE ]
    [ CACHE bigint ]
    [ [ NO ] CYCLE ]

alter_seq_params:
    [ AS int_datatype ]
    [ RESTART [WITH bigint ] ]
    [ INCREMENT BY bigint ]
    [ MINVALUE bigint | NO MINVALUE ]
    [ MAXVALUE bigint | NO MAXVALUE ]
    [ CACHE bigint ]
    [ [ NO ] CYCLE ]

int_datatype:
    BIGINT | INTEGER | INT | SMALLINT | TINYINT

Example. The example below introduces the column count, which is incremented with each row being added. It is conceptually identical to the value expression max(count)+1 in each insert. The column info is a limited range with wrap around. The serial type as found in PostgreSQL and the auto_increment flag as found in MySQL are both mapped onto a sequence type in MonetDB/SQL.

CREATE TABLE test_serial (
   d DATE,
   id SERIAL,  -- this will implicitly create a PKey. Use BIGSERIAL if you want the id to be of type bigint instead of int.
   count INT AUTO_INCREMENT,
   info INT GENERATED ALWAYS AS IDENTITY (
        START WITH 100 INCREMENT BY 2
        NO MINVALUE MAXVALUE 1000
        CACHE 2 CYCLE
   )
);

Much like other primary database objects, the sequence type can be altered at any time as illustrated below.

sql>CREATE SEQUENCE "my_test_seq" as integer START WITH 2;
sql>CREATE TABLE test (t int DEFAULT NEXT VALUE FOR "my_test_seq", v char);
sql>INSERT INTO test(v) VALUES ('a');
Rows affected 1
sql>INSERT INTO test VALUES (10, 'b');
Rows affected 1
sql>ALTER SEQUENCE "my_test_seq" RESTART WITH (SELECT MAX(t) + 1 FROM test);
sql>INSERT INTO test(v) VALUES ('c');
Rows affected 1
sql>SELECT * FROM test;
+----+---+
| t  | v |
+====+===+
| 2  | a |
| 10 | b |
| 11 | c |
+----+---+

The functions sys.get_value_for('<schema name>', '<sequence name>') and sys.next_value_for('<schema name>', '<sequence name>') can be used to query the current value of a sequence. The difference is that next_value_for() also advances the current value of the sequence to the next value. The SQL statement SELECT NEXT VALUE FOR <schema name>.<sequence name> is equivalent to sys.next_value_for('<schema name>', '<sequence name>'). The following queries demonstrate how these functions and this statement work:

sql>CREATE SEQUENCE "myseq" AS INTEGER;
operation successful
sql>SELECT get_value_for('sys', 'myseq');
+------+
| L2   |
+======+
|    1 |
+------+
1 tuple
sql>SELECT next_value_for('sys', 'myseq');
+------+
| L2   |
+======+
|    1 |
+------+
1 tuple
sql>SELECT NEXT VALUE FOR myseq;
+------+
| L2   |
+======+
|    2 |
+------+
1 tuple
sql>SELECT get_value_for('sys', 'myseq');
+------+
| L2   |
+======+
|    2 |
+------+
1 tuple

Temporal Types


MonetDB/SQL supports a rich set of time-related operations within the kernel. The starting points are the SQL-92 and ODBC time-related data types. There are six basic types and operations on them:

DATE a date in the Gregorian calendar YYYY-MM-DD, e.g. 1999-12-31
TIME  [ '(' posint ')' ] time of day (HH:MI:SS.sss) with precision (default 0)
TIME  [ '(' posint ')' ] WITH TIME ZONE time of day (HH:MI:SS.sss) with precision (default 0) and time zone information
TIMESTAMP [ '(' posint ')' ] date concatenated with unique time, precision
TIMESTAMP [ '(' posint ')' ] WITH TIME ZONE date concatenated with unique time, precision and time zone information
INTERVAL interval_qualifier a temporal interval such as: INTERVAL SECOND or INTERVAL MONTH

interval_qualifier :
    { YEAR | MONTH | DAY | HOUR | MINUTE | SECOND [time_precision] }
   |  start_field  TO  end_field

start_field :
    { YEAR | MONTH | DAY | HOUR | MINUTE }

end_field:
    { YEAR | MONTH | DAY | HOUR | MINUTE | SECOND [time_precision] }

time_precision :
     '(' posint ')'

A timestamp is a combination of date and time, indicating an exact point in time (GMT). GMT is the time at the Greenwich meridian without a daylight saving time (DST) regime. Absence of DST means that hours are consecutive (no jumps), which makes it easy to perform time difference calculations.

The local time is often different from GMT (even at Greenwich in summer, as the UK also has DST). Therefore, whenever a timestamp is composed from a local date and time, a timezone should be specified in order to translate the local time to GMT (and vice versa if a timestamp is to be decomposed into a local date and time). To adjust the local timezone you can issue a command such as SET TIME ZONE INTERVAL '1' HOUR TO MINUTE.

We provide predefined timezone objects for a number of timezones (see below). Also, there is one timezone called the local timezone, which can be set to one global value in a running MonetDB server, that is used if the timezone parameter is omitted from a command that needs it (if not set, the default value of the local timezone is plain GMT).

The value ranges and lexical denotations are defined as follows:

Min and max years. The maximum and minimum dates and timestamps that can be stored are in the years 5,867,411 and -5,867,411, respectively. Interestingly, the year 0 is not a valid year. The year before 1 is called -1.

Dates. A date falls in a valid year, and has a month and day that are valid in that year. The first day in the year is January 1, the last December 31. Months with 31 days are January, March, May, July, August, October, and December, while April, June, September and November have 30 days. February has 28 days, except in a leap year, when it has 29. A leap year is a year that is an exact multiple of 4. Years that are a multiple of 100 but not of 400 are an exception; they are not leap years.

Time. The smallest time is 00:00:00.000 and the largest 23:59:59.999 (hours range over [0,23], minutes and seconds over [0,59], and milliseconds over [0,999]). Time identifies a valid time-of-day, not an amount of time (for denoting amounts of time, or time differences, we use concepts like "number of days" or "number of seconds" denoted by some value of a standard integer type).

Timestamp. A valid timestamp is formed by a combination of a valid date and a valid time of day. When creating a timestamp from a date and time of day, a timezone should be specified (if the timezone is omitted, the local timezone is assumed). If a timezone is specified, it is used to convert the date and time in that timezone to GMT, which is the internal timestamp representation. One problem here is that in the first hour after DST has ended (some Sunday night in autumn, generally), the time is set back one hour, so the same times occur twice. Hence two translations to a timestamp are possible for such date and time combinations. In those cases, we act as if it was the first occurrence (still within DST).

For difference calculations between dates (in numbers of days) we use a signed integer. Hence, the valid range for difference calculations is between -2147483647 and 2147483647 days (which corresponds to roughly -5,867,411 and 5,867,411 years).

For differences between timestamps (in numbers of milliseconds) we use a 64-bit bigint. These are large integers of at most 19 digits, which therefore impose a limit of about 106,000,000,000 years on the maximum time difference used in computations.

datetime_func:
     EXTRACT '(' datetime_field FROM scalar_exp ')'
   | CURRENT_DATE [ '(' ')' ]
   | CURRENT_TIME [ '(' ')' ]
   | CURRENT_TIMESTAMP [ '(' ')' ]
   | LOCALTIME [ '(' ')' ]
   | LOCALTIMESTAMP [ '(' ')' ]

datetime_field:
     YEAR | MONTH | DAY | HOUR | MINUTE | SECOND | CENTURY | DECADE | QUARTER | WEEK | DOW | DOY
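For example (results depend on the current date and session time zone):

SELECT CURRENT_DATE, CURRENT_TIMESTAMP;
SELECT EXTRACT(YEAR FROM DATE '2014-02-03'),
       EXTRACT(DOW FROM DATE '2014-02-03');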

Gregorian dates.
The basics of the Gregorian calendar stem from the time of Julius Caesar, when the concept of a solar year consisting of 365.25 days (365 days plus one extra day every 4 years) was introduced. However, this Julian Calendar made the year 11 minutes too long, which accumulated over the ages and caused a shift in the seasons. In medieval times this was noticed, and in 1582 Pope Gregory XIII issued a decree that skipped 11 days. This measure was not adopted in the whole of Europe immediately, however. For this reason, there were many regions in Europe that upheld different dates.

It was only on September 14, 1752 that some consensus was reached and more countries joined the Gregorian Calendar, which also was last modified at that time. The modifications were twofold: first, 12 more days were skipped. Second, it was determined that the year starts on January 1 (in England, for instance, it had been starting on March 25). Other parts of the world have adopted the Gregorian Calendar even later.

MonetDB implements the Gregorian Calendar in all its regularity. This means that values before the year 1752 probably do not correspond with the dates that people really used in times before that (what they did use, however, was very vague anyway, as explained above). In solar terms, however, this calendar is reasonably accurate (see the "correction seconds" note below).

Timezones
The basic timezone regime was established on November 1, 1884 at the International Meridian Conference held in Greenwich (UK). Before that, a different time was held in almost every city. The conference established 24 time zones defined by regular longitude intervals that all differed by one hour. It was not long before national and political interests started to erode this nicely regular system. Timezones now often follow country borders, and some regions (like the Guianas in Latin America) have times that differ from GMT by a 15-minute grain rather than an hour or even half-hour grain.

An extra complication is the introduction of daylight saving time (DST), which causes a time jump in spring, when the clock skips one hour, and in autumn, when the clock is set back one hour (so in a one hour span, the same times occur twice). The DST regime is a purely political decision made on a country-by-country basis. Countries in the same timezone can have different DST regimes. Even worse, some countries have DST in some years, and not in other years.

To avoid confusion, this temporal type module stores absolute points of time in GMT only (GMT does not have a DST regime). When storing local times in the database, or retrieving local times from absolute timestamps, a correct timezone object should be used for the conversion.

Applications that do not make correct use of timezones will produce irregular results in e.g. time difference calculations.

Correction seconds
Every so often, a correction second is added on New Year's night. Because this rule would seriously complicate the temporal type module (the duration of a day, which is now the fixed number of 24*60*60*1000 milliseconds, would become parametrized by the date), it is not implemented. Hence these seconds are lost, so time difference calculations in milliseconds (rather than in days) have a small error if the time difference spans many hundreds of years.

We cannot handle changes in the timezone rules well (e.g., DST has only existed for several decades, and some countries make frequent changes to their DST policy). To accommodate this we would have to make timezone_local a function with a year parameter. The tool should maintain and access a timezone database; lookup of the correct timezone would then be dynamic in this structure. The timezone_setlocal would just set the string name of the timezone.

URL Data Type


The URL data type is a subdomain of the string types and represents valid Uniform Resource Locators.

Example: CREATE TABLE URL_example (c1 URL, c2 URL(512) NOT NULL);

A collection of functions is provided to create, validate and extract portions of a URL.

  Description
  sys.getAnchor(theUrl url) returns string
  sys.getBasename(theUrl url) returns string
  sys.getContent(theUrl url) returns string
  sys.getContext(theUrl url) returns string
  sys.getDomain(theUrl url) returns string
  sys.getExtension(theUrl url) returns string
  sys.getFile(theUrl url) returns string
  sys.getHost(theUrl url) returns string
  sys.getPort(theUrl url) returns string
  sys.getProtocol(theUrl url) returns string
  sys.getQuery(theUrl url) returns string
  sys.getUser(theUrl url) returns string
  sys.getRobotURL(theUrl url) returns string
  sys.isaURL(theUrl string) returns boolean
  sys.newurl(protocol string, hostname string, "port" int, file string) returns url
  sys.newurl(protocol string, hostname string, file string) returns url
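A minimal sketch (the URL is an arbitrary example; a cast is used here to turn the string literal into a url value, assuming the usual coercion routine is available):

SELECT sys.getProtocol(CAST('https://www.monetdb.org/index.html' AS URL)),
       sys.getHost(CAST('https://www.monetdb.org/index.html' AS URL));
SELECT sys.isaURL('https://www.monetdb.org');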

UUID Data Type


The data type UUID stores Universally Unique Identifiers (UUIDs) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. It can be used as a globally unique 128-bit identifier. UUIDs are generated by an algorithm that ensures an extremely low probability that two calls to the same create function will ever produce the same value. They are often used in distributed (web-based) systems for this reason.

A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits. An example of a UUID in this standard form is:

select uuid();
+--------------------------------------+ 
| uuid                                 | 
+======================================+
| 65950c76-a2f6-4543-660a-b849cf5f2453 |
+--------------------------------------+

The system function sys.uuid() generates and returns a new uuid.

The system function sys.isauuid(string) checks whether the string value satisfies the grammatical UUID structure and returns a boolean value.

You can use cast() or convert() to convert a valid uuid string to a uuid type.

select cast('26d7a80b-7538-4682-a49a-9d0f9676b765' as uuid);
select convert('83886744-d558-4e41-a361-a40b2765455b', uuid);

User-Defined Types


The MonetDB kernel supports creation of user-defined types, e.g. geometric types. The relationship between the SQL and MAL worlds is expressed using an external name.

CREATE TYPE [schema_name '.'] type_name EXTERNAL NAME type_identification

DROP TYPE [schema_name '.'] type_name [RESTRICT | CASCADE]

The implementation of new atomic types is best postponed until there is no other, performance-wise acceptable, solution. Adding an atomic type to the kernel is beneficial only if it is complemented with bulk operations and when type-specific optimizers can exploit its semantics.

Data Definition


MonetDB/SQL supports the full range of object definitions.

Table Basics


The CREATE TABLE statement conforms to the full SQL standard. Tables are assigned to the current schema unless the schema name is explicitly given as part of the table name. Table names should be unique amongst those mentioned within the same schema and distinct from view names.

table_def:
     CREATE TABLE [ IF NOT EXISTS ] table_name  table_content_source [ STORAGE ident string ]
   | CREATE [ LOCAL | GLOBAL ] TEMPORARY TABLE [ IF NOT EXISTS ] table_name  table_content_source
                     [ ON COMMIT { DELETE ROWS  |  PRESERVE ROWS  |  DROP } ]
   | CREATE TABLE [ IF NOT EXISTS ] table_name FROM LOADER function_ref
   | CREATE MERGE TABLE [ IF NOT EXISTS ] table_name  table_content_source
   | CREATE REPLICA TABLE [ IF NOT EXISTS ] table_name  table_content_source
   | CREATE REMOTE TABLE [ IF NOT EXISTS ] table_name  table_content_source  ON
                     'mapi:monetdb://host:port/dbname' [ WITH [ USER 'username' ] [ [ ENCRYPTED ] PASSWORD 'password'] ]
   | CREATE STREAM TABLE [ IF NOT EXISTS ] table_name  table_content_source

table_content_source:
   '(' table_element_list ')'
  |  [ '(' column_name_list ')' ] AS select_query   [ WITH DATA | WITH NO DATA ]

table_element_list:
   table_element
  |  table_element_list , table_element

The derived (temporary) tables are either filled upon creation or automatically upon use within queries.

Temporary tables are stored automatically under the schema 'tmp'. Temporary local tables are limited to the client session. The qualifiers denote the actions taken during transaction commit over a temporary table. If the ON COMMIT clause is omitted then all tuples are dropped while retaining the structure. In most cases you would use ON COMMIT PRESERVE ROWS.

For using Loader functions we support the MonetDB specific CREATE table FROM LOADER syntax.

For merging partitioned table data we support the MonetDB specific CREATE MERGE table syntax.

For replicating table data we support the MonetDB specific CREATE REPLICA table syntax.

For distributed query processing we support the MonetDB specific CREATE REMOTE table syntax.

For data stream processing we support the MonetDB specific CREATE STREAM table syntax.
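As an illustration (all names are hypothetical), the following statements create a regular table, a temporary table that survives commits, and a copy filled from a query:

CREATE TABLE customers (id INT, name VARCHAR(100));
CREATE LOCAL TEMPORARY TABLE session_customers (id INT, name VARCHAR(100))
       ON COMMIT PRESERVE ROWS;
CREATE TABLE customers_copy AS SELECT * FROM customers WITH DATA;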

Table Elements

table_element:
     column_name   data_type   [ column_option ... ]
   | column_name { SERIAL | BIGSERIAL }
   | column_name   WITH OPTIONS '(' column_option [ ',' ... ] ')'
   | LIKE table_name
   | [ CONSTRAINT constraint_name ] table_constraint_type
column_option:
     DEFAULT default_value_expr
   | [ CONSTRAINT constraint_name ] column_constraint_type
   | GENERATED ALWAYS AS IDENTITY [ '(' serial_parameters ')' ]
   | AUTO_INCREMENT

Identity columns take their values from a sequence generator. The MySQL (auto_increment) and PostgreSQL (serial data type) syntax are also supported. The SERIAL type is an identity column which is mapped to an INTEGER NOT NULL PRIMARY KEY. Similarly the BIGSERIAL type is an identity column which is mapped to a BIGINT NOT NULL PRIMARY KEY.

Column and table constraints are both supported. Besides the NOT NULL constraint, also UNIQUE, PRIMARY KEY and FOREIGN KEY constraints are supported. A table can have at most one primary key definition. Constraints are currently checked directly on insert, update and delete, i.e. immediately; this limitation stems from the missing triggers. The null matching on foreign keys is limited to the SIMPLE match type (null values satisfy the constraint). The FULL and PARTIAL match types are not supported. The referential action is currently limited to RESTRICT, i.e. an update fails if other columns have references to it.

column_constraint_type:
     NOT NULL
   | NULL
   | UNIQUE
   | PRIMARY KEY
   | REFERENCES table_name [ '(' column_name [ ',' ... ] ')' ]   [ match_options ]   [ ref_actions ]

match_options:
     MATCH { FULL | PARTIAL | SIMPLE }

ref_actions:
     { ON UPDATE | ON DELETE } { NO ACTION | CASCADE | RESTRICT | SET NULL | SET DEFAULT }

table_constraint_type:
     UNIQUE '(' column_name [ ',' ... ] ')'
   | PRIMARY KEY '(' column_name [ ',' ... ] ')'
   | FOREIGN KEY '(' column_name [ ',' ... ] ')' REFERENCES table_name [ '(' column_name [ ',' ... ] ')' ]   [ match_options ]   [ ref_actions ]

Note: The column CHECK constraint definitions are no longer accepted by the parser as of Nov2019 (11.35.3) release. They used to be accepted (for ease of migration) but were not enforced nor recorded in a data dictionary table.

A table constraint definition is not tied to a particular column, but rather to a column group. A column constraint is a notational convenience when the constraint only affects one column.

A constraint over a referenced value can be either a FULL, PARTIAL or SIMPLE (default) match. A full match requires all column values to be not null unless all are null. The simple match is more relaxed; any of the keys may be null. The partial match is considered noise for the time being.
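A sketch of column and table constraints in practice (all names are hypothetical):

CREATE TABLE departments (
   id   INT PRIMARY KEY,
   name VARCHAR(100) NOT NULL UNIQUE
);
CREATE TABLE employees (
   id      INT PRIMARY KEY,
   dept_id INT,
   CONSTRAINT fk_emp_dept FOREIGN KEY (dept_id)
      REFERENCES departments (id) ON DELETE RESTRICT
);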

The serial columns are a slight extension over the serial types defined earlier. In particular, a RESTART option can be designated for when the sequence is exhausted. This may involve a subquery over the database.

serial_parameters:
     [ START WITH nonzero-bigint ]
   | [ RESTART | RESTART WITH subquery | RESTART WITH nonzero-bigint ]
   | [ INCREMENT BY nonzero-bigint ]
   | [ MINVALUE nonzero-bigint | NOMINVALUE ]
   | [ MAXVALUE nonzero-bigint | NOMAXVALUE ]
   | [ CACHE nonzero-bigint ]
   | [ CYCLE | NOCYCLE ]

Index Definitions


The index statements in the SQL standard are recognized, but their implementation differs from that of competing products. MonetDB/SQL interprets these statements as advice and often freely ignores it, relying on its own decisions to create and maintain indexes for fast access. Also, the UNIQUE qualifier is not honored or enforced. To add a uniqueness constraint for a table or column use UNIQUE in the CREATE TABLE or ALTER TABLE statement.

index_def:
    CREATE [ UNIQUE ] INDEX ident ON [ schema name . ] table name '(' ident ','... ')'

 

MonetDB supports two special kinds of secondary indices: IMPRINTS and ORDERED INDEX, introduced in release Dec2016 (v11.25.3). These index types are experimental and have some limitations:

  1. Only 1 column can be indexed per index.
  2. Only columns of a numeric data type (tinyint, smallint, integer, bigint, hugeint, float, real, double, decimal) can be indexed.
  3. The index is not maintained automatically and will become inactive when inserts, deletes or updates are done on the column data.

Since creating these indices can be expensive, they are only used when explicitly created. They are useful for large static (or read-only) tables with numeric columns.

index_def:
    CREATE IMPRINTS INDEX ident ON [ schema name . ] table name '(' numeric column name ')'

Creates a new imprints index on one numerical column of a specific table. This index stores metadata (min, max, nulls) on segments of column data. It can speed up queries which have a column selection condition (such as: AGE IS NULL OR AGE >= 18). It is a very compact index.

See also: Documentation/Manuals/MonetDB/Kernel/Modules/Imprints

index_def:
    CREATE ORDERED INDEX ident ON [ schema name . ] table name '(' numeric column name ')'

Creates a new ordered index on one numerical column of a specific table. This index is a special single-column index where the numeric values are stored in ascending order. It can speed up queries which have a column selection condition (such as: AGE BETWEEN 18 AND 30) or need sorting, such as when used in a GROUP BY clause.
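For example (using a hypothetical table persons with numeric columns age and salary):

CREATE ORDERED INDEX persons_age_oidx ON persons (age);
CREATE IMPRINTS INDEX persons_salary_iidx ON persons (salary);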

Alter Statements


The ALTER statement can be used to change TABLE properties; it requires authorization to do so. Addition of a column follows the same syntax and functionality as the CREATE TABLE statement. If the table is used in a foreign key relationship, the action to be taken can be further specified as RESTRICT, which forbids the change if there are external references to the column values, or CASCADE, which nullifies all references to the values being removed.

alter_statement:

         ALTER TABLE [IF EXISTS] qname ADD [COLUMN] column_name { data_type [ column_option ... ] | SERIAL | BIGSERIAL }
      |  ALTER TABLE [IF EXISTS] qname ALTER [COLUMN] column_name SET [NOT] NULL
      |  ALTER TABLE [IF EXISTS] qname ALTER [COLUMN] column_name SET DEFAULT value
      |  ALTER TABLE [IF EXISTS] qname ALTER [COLUMN] column_name DROP DEFAULT
      |  ALTER TABLE [IF EXISTS] qname ALTER [COLUMN] column_name SET STORAGE {string | NULL}
      |  ALTER TABLE [IF EXISTS] qname DROP [COLUMN] column_name [RESTRICT | CASCADE]
      |  ALTER TABLE [IF EXISTS] qname ADD [ CONSTRAINT ident ]  table_constraint_type
      |  ALTER TABLE [IF EXISTS] qname DROP CONSTRAINT ident [RESTRICT | CASCADE]
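A few illustrative statements (table and column names are hypothetical):

ALTER TABLE persons ADD COLUMN email VARCHAR(256);
ALTER TABLE persons ALTER COLUMN email SET NOT NULL;
ALTER TABLE persons ADD CONSTRAINT persons_email_unique UNIQUE (email);
ALTER TABLE persons DROP COLUMN email CASCADE;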

As of the Apr2019 release (11.33.3) you can also change the name of a column, the name of the table itself, or the schema the table belongs to, provided there are no objects which depend on the table or column name.

alter_rename_statement:

         ALTER TABLE [IF EXISTS] qname RENAME [COLUMN] column_name TO new_column_name
      |  ALTER TABLE [IF EXISTS] qname RENAME TO new_table_name
      |  ALTER TABLE [IF EXISTS] qname SET SCHEMA schema_name

An individual table can be protected using the READ ONLY and INSERT ONLY modes. All attempts to update are then flagged as an SQL error. The reverse operation is ALTER TABLE qname SET READ WRITE, which makes the table accessible for all update operations.

alter_access_statement:

         ALTER TABLE [IF EXISTS] qname SET READ ONLY
      |  ALTER TABLE [IF EXISTS] qname SET INSERT ONLY
      |  ALTER TABLE [IF EXISTS] qname SET READ WRITE

The ALTER statement has been extended for merge tables with ADD TABLE and DROP TABLE options to attach and detach a partition table to/from a MERGE TABLE and with SET TABLE p AS PARTITION to change the partitioning specification of the partition table.

alter_statement_for_merge_tables:

        ALTER TABLE [IF EXISTS] qname ADD TABLE qname [ AS PARTITION partition_spec ]
      | ALTER TABLE [IF EXISTS] qname SET TABLE qname AS PARTITION partition_spec
      | ALTER TABLE [IF EXISTS] qname DROP TABLE qname [ RESTRICT | CASCADE ]

View Definitions


Regular SQL view definitions are supported.

view_def:
    CREATE [ OR REPLACE ] VIEW [ schema_name '.' ] view_name
           [ '(' column_name [ ',' column_name ] [ ',' ...] ')' ]
           AS select_query_without_limit
           [ WITH CHECK OPTION ]

A view can be seen as a stored SELECT query with a unique name. It can be used in queries at all the places where you can normally use a table name. Views are useful to reduce user query complexity as they can predefine joins, computations, derivations, aggregations, selections, conditions, ordering, etc in the view, so the user doesn't have to define them in the queries again and again. They are also very useful to standardise and simplify reporting.
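A small, hypothetical example:

CREATE VIEW adult_persons (name, age) AS
   SELECT name, age FROM persons WHERE age >= 18;
SELECT * FROM adult_persons;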

Note: The "WITH CHECK OPTION" is accepted for SQL compliance but has no effect.

Note: Views do not contain or store data, so do not require disk space.

Limitations: Recursive views and reference-able views are not (yet) supported.
Updatable views are not supported, so it's not possible to insert, update, merge, delete or truncate data from a view.

Tip: to find out which user views are defined in the database run query:
  SELECT * FROM sys.tables WHERE type IN (SELECT table_type_id FROM sys.table_types WHERE table_type_name = 'VIEW')

Schema Definitions


A SCHEMA is a container for tables, views, indices, triggers, functions and procedures.  Schema access and modification is strictly controlled using the user role and authorisation scheme.

Every SCHEMA belongs to a database. One reason for organizing the logically separated data in schemas rather than databases is that in this way there will still be just one MonetDB process running.

schema:
   CREATE SCHEMA schema_name_clause
      [ DEFAULT CHARACTER SET ident ]
      [ PATH schema_name ','... ]
      [ schema_element ... ]

The DEFAULT CHARACTER SET and PATH options are here for compatibility reasons with the SQL standard, however they are not yet implemented. The default character set is UTF-8.

schema_name_clause:
   ident | [ ident ] AUTHORIZATION ident

The AUTHORIZATION option allows specifying the name of the user or the role that will own the schema. If omitted, the user who has executed the query will be the owner. The owner of the schema is allowed to create, alter and drop tables. With the AUTHORIZATION option, an explicit name for the schema is optional. If omitted, the schema automatically gets the name of the authorised user/role as its name.

Notes on schema creation:

  1. The ownership of a schema can be assigned to only one user/role, and it can not be modified after its creation. Therefore, to share the ownership of a schema, one must assign the ownership of a schema to a role at the creation of the schema. Subsequently, the role can be granted to multiple users to own the schema. 
  2. Only the 'monetdb' user and the 'sysadmin' role can create a new schema. Therefore, to allow other users to create schemas, the 'monetdb' user should assign the 'sysadmin' role to the intended users.

schema_element:
   grant | revoke | create_statement | drop_statement | alter_statement

schema:
  | DROP SCHEMA qname drop_action

drop_action:
   RESTRICT | CASCADE

The drop_action option is supported for compatibility with the SQL standard, however it is not implemented yet. Currently it runs with the CASCADE option, meaning that once a schema is dropped every object inside the schema is dropped as well, such as tables and functions. However, objects that depend on the schema, such as users having it as their default schema, will not be dropped automatically and will prevent the schema from being dropped. One can either ALTER the user to give it a different default schema, or simply drop the user if it is no longer needed.

schema:
  | SET SCHEMA ident

When opening the database, by default the "sys" schema is set. Another automatically created schema is "tmp", used for temporary local tables. It is not possible for a user to access the "tmp" schema of another user.

One can create a new schema using the CREATE command and change to it using the SET command. When creating a table without specifying the schema, the table will be created in the schema that is currently in use. In order to create a table in a different schema, use the schema name as a prefix.
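A brief sketch (the schema, user/role and table names are hypothetical; the AUTHORIZATION target must be an existing user or role):

CREATE SCHEMA sales AUTHORIZATION sales_admin;
SET SCHEMA sales;
CREATE TABLE orders (id INT, amount DECIMAL(10,2));   -- created in schema "sales"
CREATE TABLE reports.summary (total DECIMAL(18,2));   -- explicit schema prefix; "reports" must already exist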

Privileges


Under construction

Grant and Revoke


Roles


Every user takes a role with respect to the privileges on a database schema. A user can be granted multiple roles and a role can be granted to multiple users.

When a new user is created, a role with the same name is automatically created, which is also the default role the user takes. However, only the role can be granted to other users/roles.

Privileges cannot be inherited via another role. For example, suppose Alice can only SELECT, Bob can only INSERT, and the role Alice is granted to Bob. If the role Bob is in turn granted to Charlie, then Charlie can only INSERT, but not SELECT.

 

role_statement:
    CREATE ROLE role_name [ WITH ADMIN grantor ]
  | DROP ROLE role_name [ WITH ADMIN grantor ]
  | SET ROLE role_name
  | GRANT role_name TO user_name
 

grantor:
    CURRENT_USER | CURRENT_ROLE

A user can only assume a role after he/she has logged in. The user assumes a role by using the SET ROLE command. Then the user can use all the rights granted to that role. Two roles cannot be assumed at once.
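A minimal sketch (user and role names are hypothetical):

CREATE ROLE data_admin;
GRANT data_admin TO alice;
-- later, in a session of user alice:
SET ROLE data_admin;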

Users


All interactions with the database server are attributed to a particular user known to the system. Therefore, a user login name should be created upfront, much like you would have to on any computer system. Your user name determines the schemas you have permission to access and your role in managing them. Much like your computer's file system, it is essential to restrict access to your valuable database content via granting and revoking privileges and/or roles.

To create, modify/alter or drop a user in the database you can use the following syntax:

user_stmt:
    CREATE USER user_name WITH [ ENCRYPTED | UNENCRYPTED ] PASSWORD 'user_password' NAME 'full name' SCHEMA schema_name
   | ALTER USER user_name RENAME TO new_user_name
   | ALTER USER SET [ENCRYPTED | UNENCRYPTED] PASSWORD 'new_password' USING OLD PASSWORD 'user_password'
   | ALTER USER user_name WITH [ENCRYPTED | UNENCRYPTED] PASSWORD 'user_password'
   | ALTER USER user_name [ WITH [ENCRYPTED | UNENCRYPTED] PASSWORD 'user_password' ] SET SCHEMA schema_name
   | DROP USER user_name

The user name, password and default schema can be changed using the different alter user statements.

Note that for a user (including the administrator user: monetdb) to change its own password, the ALTER USER SET form has to be used where the current password is given. When the ENCRYPTED clause is given for PASSWORD, a hashed form of the password has to be supplied. The type of this hash depends on how the server was configured, but defaults to SHA512.
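A brief sketch of these statements (names and passwords are arbitrary examples):

CREATE USER "alice" WITH PASSWORD 'alice_secret' NAME 'Alice' SCHEMA "sys";
ALTER USER "alice" RENAME TO "alice_smith";
-- executed by the user herself to change her own password:
ALTER USER SET PASSWORD 'new_secret' USING OLD PASSWORD 'alice_secret';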

You may view the list of users in the database via query:

    SELECT * FROM sys.users;

The user name, password and database name are needed to authenticate yourself against tools such as mclient and all APIs, to gain access to the particular database. Once connected the current schema is the one as specified for the user.

Every user plays a certain role. So, when a user is created, a role with the same name is automatically created as well. The difference is that the user has all corresponding permissions, e.g., the user has all permissions on the tables it creates. However, the role with the same name remains empty until permissions have been granted to it explicitly.

Drop Statement


The objects created can be removed provided the authorization permissions are set. Note: if you do not specify the full signature of a function, the DROP query will only execute successfully if there is just one function with this name; otherwise the query is aborted. DROP ALL is used to drop all the functions with the name specified in the query.

drop_statement:
     DROP TABLE qname [ RESTRICT | CASCADE ]
   | DROP VIEW qname [ RESTRICT | CASCADE ]
   | DROP [ FILTER ] FUNCTION qname [ '(' [ data_type ','... ] ')' ] [ RESTRICT | CASCADE ]
   | DROP PROCEDURE qname [ '(' [ data_type ','... ] ')' ] [ RESTRICT | CASCADE ]
   | DROP AGGREGATE qname [ '(' [ data_type ','... ] ')' ] [ RESTRICT | CASCADE ]
   | DROP LOADER qname [ '(' [ data_type ','... ] ')' ]
   | DROP ALL [ FILTER ] FUNCTION qname [ RESTRICT | CASCADE ]
   | DROP ALL PROCEDURE qname [ RESTRICT | CASCADE ]
   | DROP ALL AGGREGATE qname [ RESTRICT | CASCADE ]
   | DROP INDEX qname
   | DROP TRIGGER qname
   | DROP SEQUENCE qname
   | DROP ROLE qname
   | DROP USER ident
   | DROP TYPE qname
   | DROP SCHEMA name [ RESTRICT | CASCADE ]
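For example (the names are hypothetical; the full signature disambiguates overloaded functions):

DROP TABLE obsolete_data CASCADE;
DROP FUNCTION mylib.myfunc ( INT, VARCHAR(10) );
DROP ALL FUNCTION mylib.myfunc;   -- drops every function with this name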

 

Data Manipulation


Under construction

Bulk I/O


The COPY INTO command enables fast insertion of multiple tuples from a text file or from standard input. Each tuple in the input file is terminated by a record separator (default '\n') and fields are separated by the field separator (default '|'). The field values should adhere to the syntax for value literals. Alternative separators should adhere to the SQL lexical syntax for string values. A different NULL value representation can be passed using the NULL AS null_string option. Furthermore, the fields can optionally be enclosed with a user-defined quote character. The text file should use UTF-8 encoding if specified as file_name, and the same encoding as mclient is using if read using FROM STDIN.

It is strongly advised to announce the maximum number of records to be inserted. It avoids guessing by the server and subsequent re-allocation of table space which may involve potentially expensive copying. A portion of the input file can be skipped using the offset feature.  The offset value specifies the record at which loading should commence, the first record having offset 1.

As of the Aug2018-SP2 release, if using FROM STDIN and the number of records is not specified, reading the input stops at an empty line. This means that if a one column table is being read, there may be confusion as to whether an empty line is an empty (string) value or a NULL (if NULL AS '' is specified). The end of input takes precedence here.

As of the Apr2019 release (11.33.3), the option ON CLIENT or ON SERVER can be used. This allows the client to read/write files from/to the client instead of doing it in the server. This has the advantage that COPY INTO is then no longer restricted to only the "super user" monetdb, nor only to absolute file names. The syntax to have the server communicate with the client for file content is COPY INTO table FROM file ON CLIENT ...; and COPY query INTO file ON CLIENT ...;. This also works for COPY BINARY INTO. There is also the possibility to specify that files are to be read/written by the server by using ON SERVER. This is also the default when ON CLIENT or ON SERVER is not specified. In that case the file must be accessible by the server. Therefore, it must reside on or be accessible to the database server machine and it must be identified with an absolute path name.

The STDIN file designator reads data streaming from the client application. An empty record determines the end of sequence.

COPY   {  [ int_val [ OFFSET int_val ] RECORDS ]  |  [ int_val RECORDS ] OFFSET int_val  }
      INTO table_name
      FROM ['('] file_name ',' ... [')']
      [ ON { CLIENT | SERVER } ]
      [ [ USING ] DELIMITERS field_separator [',' record_separator [',' string_quote ] ] ]
      [ NULL AS null_string ] [ LOCKED ] [ BEST EFFORT ] [ NO CONSTRAINT ]
      [ FWF '(' pos [ ',' pos [ ... ] ] ')' ]

COPY   {  [ int_val [ OFFSET int_val ] RECORDS ]  |  [ int_val RECORDS ] OFFSET int_val  }
      INTO table_name
      FROM STDIN
      [ [ USING ] DELIMITERS field_separator [',' record_separator [',' string_quote ] ] ]
      [ NULL AS null_string ] [ LOCKED ] [ BEST EFFORT ] [ NO CONSTRAINT ]

The input syntax should comply with the following grammar: [ [ [quote] [[escape]char]* [quote] ] field_separator ]* record_separator. Quote characters in quoted fields may be escaped with a backslash. Field and record separators can be embedded in quoted fields.
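For example, to load at most 1000 records from a client-side CSV file, skipping a header line (file and table names are hypothetical):

COPY 1000 OFFSET 2 RECORDS INTO sales
     FROM '/tmp/sales.csv' ON CLIENT
     USING DELIMITERS ',', '\n', '"'
     NULL AS '';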

LOCKED mode

In many bulk loading situations, the original file can be saved as a backup or recreated for disaster handling. This relieves the database system from having to prepare for recovery as well and saves significant storage space. The LOCKED qualifier can be used in this situation (and in single user mode!) to skip the logging operation normally performed.

WARNING: It is advised to add integrity constraints to the table after the file has been loaded. The ALTER statements perform bulk integrity checking and often perform these checks more efficiently.

For more see the CSV bulk load recipe.

Copy into File

The COPY INTO command with a file name argument allows for fast dumping of a result set into an ASCII file. The file must be accessible by the server and a full path name may be required. The file STDOUT can be used to direct the result to the primary output channel.

The delimiters and NULL AS arguments provide control over the layout required.  

COPY data_query
      INTO file_name
      [ ON { CLIENT | SERVER } ]
      [ [ USING ] DELIMITERS field_separator [',' record_separator [',' string_quote ] ] ]
      [ NULL AS null_string ]

COPY data_query
      INTO STDOUT
      [ [ USING ] DELIMITERS field_separator [',' record_separator [',' string_quote ] ] ]
      [ NULL AS null_string ]

For both the input and output versions of the COPY INTO commands one can specify a file name ending with '.gz' or '.bz2' or '.xz' or '.lz4' to use the appropriate compression library (if available).
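For example, to export a query result to a compressed client-side CSV file (names are hypothetical):

COPY SELECT id, amount FROM sales WHERE amount > 100
     INTO '/tmp/big_sales.csv.gz' ON CLIENT
     USING DELIMITERS ',', '\n', '"'
     NULL AS '';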

For more see the Exporting bulk data recipe.

Copying binary files

Migration of tables between MonetDB/SQL instances can be sped up using the COPY BINARY INTO format.

For more see the Binary bulk data load recipe.

Deleting Data


[ WITH cte_alias AS SELECT_query [, cte_alias AS SELECT_query , ...] ]
DELETE FROM [ schema_name '.' ] table_name  [ [AS] table_alias ]
[ WHERE search_condition ]

You must have DELETE privilege for the table. The creator/owner of the table will have DELETE privilege automatically.
The "WITH cte_alias AS SELECT_query" option as well as the table alias are supported from release Apr2019 (11.33.3) onwards.
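For example (table and column names are hypothetical):

DELETE FROM sales WHERE sold_date < DATE '2010-01-01';

WITH old_orders AS (SELECT id FROM orders WHERE created < DATE '2010-01-01')
DELETE FROM order_lines WHERE order_id IN (SELECT id FROM old_orders);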

MonetDB/SQL does not support data deletions on views.

 

To quickly delete all rows in a table use TRUNCATE TABLE.

TRUNCATE [ TABLE ] [ schema_name '.' ] table_name
[ CONTINUE IDENTITY | RESTART IDENTITY ]
[ RESTRICT | CASCADE ]

You must have TRUNCATE privilege for the table. The creator/owner of the table will have TRUNCATE privilege automatically.

A 'CONTINUE IDENTITY' or 'RESTART IDENTITY' clause can be passed to control whether an identity sequence, if present in the table, is restarted. The default is to CONTINUE IDENTITY sequence numbering.
The 'CASCADE' option instructs to also truncate referencing table(s) if they have foreign key references to this table. The default behavior is 'RESTRICT'.

Note: it is possible to use TRUNCATE statements in a transaction and thus to roll back the effects of a truncate.
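
For instance, a truncate can be wrapped in a transaction and undone; the schema and table name below are hypothetical:

START TRANSACTION;
TRUNCATE TABLE dwh.staging_orders RESTART IDENTITY;
-- the table is now empty; undo the truncate
ROLLBACK;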

MonetDB/SQL does not support truncations of data on views.

Inserting Data

Inserting Data giulia Mon, 03/02/2020 - 16:48

A table can be populated using an insert statement. It takes a table name and a value expression list. The expression result types should align with the columns in the table definition. Otherwise the column-value association should be explicitly defined using a column name list. Multiple rows can be inserted in a single statement.

The result of a query can be bulk inserted into a table, provided both source and destination are type compatible. Insertion of a table into itself effectively doubles its content, provided none of the table constraints is violated.

[ WITH cte_alias AS SELECT_query [, cte_alias AS SELECT_query , ...] ]
INSERT INTO [ schema_name '.' ] table_name [ '(' column_name [ ',' ... ] ')' ]
{ VALUES '(' value_expression [ ',' ... ] ')' [ ',' ... ] | select_query }

value_expression : { scalar_exp | DEFAULT | NULL }

You must have INSERT privilege for the table. The creator/owner of the table will have INSERT privilege automatically.
The "WITH cte_alias AS SELECT_query" option is supported from release Apr2019 (11.33.3) onwards.

MonetDB/SQL does not support data insertions on views.
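
For example (a sketch, assuming a table t(i int, c varchar(8)) exists):

-- insert two rows; DEFAULT uses the column default (NULL when none is defined)
INSERT INTO t(i, c) VALUES (5, 'five'), (6, DEFAULT);

-- bulk insert the result of a query; inserting a table into itself doubles its content
INSERT INTO t SELECT i + 10, c FROM t;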

Updating Data

Updating Data giulia Mon, 03/02/2020 - 16:49

The update statement syntax follows the SQL standard, but its semantics for bulk updates on keys may differ slightly from what is expected in other systems. In particular, the update implementation ensures that you can freely update any column without the danger of run-away values. MonetDB/SQL does not support updates through views.

[ WITH cte_alias AS SELECT_query [, cte_alias AS SELECT_query , ...] ]
UPDATE [ schema_name '.' ] table_name  [ [AS] table_alias ]
SET assignment [ ',' assignment ... ]
[ WHERE search_condition ]

assignment : column_name '=' { scalar_exp  | search_condition | NULL | DEFAULT }

You must have UPDATE privilege for the table or column(s). The creator/owner of the table will have UPDATE privilege automatically.
The "WITH cte_alias AS SELECT_query" option as well table alias is supported from release Apr2019 (11.33.3) onwards.

MonetDB/SQL does not support data updates on views.
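
For example (a sketch, assuming a table t(i int, c varchar(8)) exists):

-- plain update with a search condition
UPDATE t SET c = upper(c) WHERE i < 3;

-- update using a common table expression and a table alias
WITH mx AS (SELECT max(i) AS mi FROM t)
UPDATE t AS u
   SET i = i + 100
 WHERE u.i = (SELECT mi FROM mx);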

Queries

Queries giulia Mon, 02/24/2020 - 09:47

Under construction

Predicates

Predicates giulia Mon, 03/02/2020 - 17:00

Predicates are used in search conditions to compare and match values: they cover comparisons, range and pattern matching tests, NULL tests, set membership and existence tests. The following syntax description covers all the predicate operations supported by MonetDB.

predicate:
       comparison_predicate
     | between_predicate
     | like_predicate
     | test_for_null
     | in_predicate
     | all_or_any_predicate
     | existence_test
     | '(' predicate ')'

comparison_predicate:
       pred_expr COMPARISON pred_expr
     | pred_expr = pred_expr

between_predicate:
     pred_expr [NOT] BETWEEN [SYMMETRIC | ASYMMETRIC] pred_expr AND pred_expr

like_predicate:
     pred_expr [NOT] { LIKE | ILIKE } atom_expr

test_for_null:
     column_name IS [NOT] NULL

in_predicate:
     pred_expr [NOT] IN '(' element_name ',' ... ')'

all_or_any_predicate :
     pred_expr COMPARISON { ANY | ALL | SOME } subquery

existence_test:
     [NOT] EXISTS subquery

pred_expr:
       scalar_expr
     | predicate

atom_expr:
       atom [ ESCAPE string ]
     | '(' atom ')' [ ESCAPE string ]
     | '(' atom_expr ')'
     | '?'

string_funcs:
      scalar_expr '||' scalar_expr
    | SUBSTRING '(' scalar_expr FROM scalar_expr [ FOR scalar_expr ] ')'
    | SUBSTRING '(' scalar_expr ',' scalar_expr [ ',' scalar_expr ] ')'
    | SUBSTRING '(' scalar_expr int_value scalar_expr ')'
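
A usage sketch combining several predicate forms in a WHERE clause; the table t and its columns are hypothetical:

SELECT i, c
  FROM t
 WHERE i BETWEEN 1 AND 3
   AND c LIKE '%o%'
   AND c IS NOT NULL
   AND i IN (1, 2, 4)
   AND EXISTS (SELECT 1 FROM t AS t2 WHERE t2.i = t.i + 1);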
   

Select Queries

Select Queries giulia Mon, 03/02/2020 - 16:57

The SQL framework for table expressions is based on the select-from-where construct.

select_query:
    [ with_clause ]
    SELECT [ ALL | DISTINCT ]   ( '*' | column_expr [',' ...] )
    [ INTO column_expr [',' ...] ]
    [ FROM table_reference [',' ... ] ]
    [ WINDOW window_definition_list ]
    [ WHERE search_condition ]
    [ GROUP BY column_expr [',' ...] ]
    [ HAVING search_condition ]
    [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] [ CORRESPONDING [ BY '(' column_ref_commalist ')' ] ] select_query ]
    [ ORDER BY ordering_spec [',' ...] ]
    [ LIMIT posint ]
    [ OFFSET posint ]
    [ SAMPLE posint [ SEED integer ] ]
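
A usage sketch of the clauses above; the table t and its columns are hypothetical:

SELECT c, count(*) AS cnt
  FROM t
 WHERE i > 0
 GROUP BY c
HAVING count(*) >= 1
 ORDER BY cnt DESC, c
 LIMIT 10;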

Table Expressions

Table Expressions giulia Mon, 03/02/2020 - 16:56

Within a query, a table can be referenced in different ways: by its name, by a select query, by a VALUES list, or by a join result. The syntax to refer to a table is:

table_reference:
      simple_table
    | joined_table [ [AS] alias [ '(' column_name [',' ...] ')' ] ]
    | '(' select_query ')' [ [AS] alias [ '(' column_name [',' ...] ')' ] ]
    | '(' VALUES '(' tuple ')' [, '(' tuple ')' [, '('...')' ] ] ')' [ [AS] alias [ '(' column_name [',' ...] ')' ] ]

joined_table:
      '(' joined_table ')'
    | table_reference CROSS JOIN table_reference
    | table_reference NATURAL [ join_type ] JOIN table_reference
    | table_reference [ join_type ] JOIN table_reference join_spec
    | table_reference UNIONJOIN table_reference join_spec

join_type:
      INNER
    | { LEFT | RIGHT | FULL } [ OUTER ]

join_spec:
      ON search_condition
    | USING '(' column_name [',' ...] ')'

When no join_type is specified, INNER is assumed.
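
For example (a sketch; the tables orders and customers are hypothetical):

-- outer join between two base tables with aliases
SELECT o.id, c.name
  FROM orders AS o
  LEFT OUTER JOIN customers AS c ON o.customer_id = c.id;

-- a VALUES list used as an in-line table reference
SELECT * FROM (VALUES (1, 'one'), (2, 'two')) AS v(nr, txt);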

WITH Queries

WITH Queries giulia Mon, 03/02/2020 - 16:58

The WITH clause prefix (aka Common Table Expressions (CTE)) provides the mechanism to introduce temporary in-line view definitions:

with_clause:
    WITH cte_element [',' cte_element [',' ...] ]

cte_element:
    query_alias [ '(' column_name [',' ...] ')' ] AS '(' select_query ')'
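
For example (a sketch over a hypothetical table t(i int, c varchar(8))):

WITH small(i, c) AS (SELECT i, c FROM t WHERE i <= 2),
     large(i, c) AS (SELECT i, c FROM t WHERE i > 2)
SELECT small.c AS small_c, large.c AS large_c
  FROM small, large
 WHERE large.i = small.i + 2;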

Functions and Operators

Functions and Operators giulia Mon, 02/24/2020 - 09:52

MonetDB provides the following groups of built-in SQL functions and operators.

Aggregate Functions

Aggregate Functions giulia Tue, 03/03/2020 - 14:39

Aggregate Functions operate on a column or column expression of a table or group (as created via GROUP BY clause).

For the Examples below we use a simple table t(i int, c varchar(8)) populated with:

INSERT into t(i,c) VALUES
(1, 'one'),
(2, 'two'),
(3, 'tree'),
(4, 'four');

to create the shown Results.

Standard Aggregate Functions

Function Return type Description Example Result
avg(numeric_expr) double compute the average of the non-NULL integer/float/decimal/month_interval values in the column or group avg(i) 2.5
count(*) bigint count the number of rows in the column or group including those who may have NULL values count(*) 4
count(col_expr) bigint count the number of non-NULL values in the column or group count(i) 4
count(distinct col_expr) bigint count the number of distinct non-NULL values in the column or group count(distinct i%2) 2
max(col_expr) type of col_expr return the maximum non-NULL value in the column or group max(c) two
min(col_expr) type of col_expr return the minimum non-NULL value in the column or group min(c) four
prod(numeric_expr) bigint or hugeint or double compute the product of the non-NULL numeric values in the column or group prod(i) 24
prod(distinct numeric_expr) bigint or hugeint or double compute the product of the distinct non-NULL numeric values in the column or group prod(distinct 1 + i%2) 2
sum(integer_expr) bigint or hugeint compute the sum of all non-NULL integer values in the column or group sum(i) 10
sum(distinct integer_expr) bigint or hugeint compute the sum of all distinct non-NULL integer values in the column or group sum(distinct i/2) 3
sum(decimal_expr) decimal compute the sum of all non-NULL decimal values in the column or group sum(cast(i as decimal)) 10.000
sum(float_expr) real or double compute the sum of all non-NULL floating point values in the column or group sum(cast(i as real)) 10.0
sum(month_interval_expr) month_interval compute the sum of all non-NULL interval month values in the column or group sum(cast(i as interval month)) 10
sum(sec_interval_expr) sec_interval compute the sum of all non-NULL interval second values in the column or group sum(cast(i as interval second)) 10.000

 

Aggregate Functions returning a concatenated CLOB string

Note: You must include the sys. or json. prefix for these aggregate functions (see examples) in order for them to work properly.

Function Return type Description Example Result
sys.group_concat(col_expr) clob concatenate non-NULL values in a column or group into one large string with values separated by a comma (the default separator) sys.group_concat(c) one,two,tree,four
sys.group_concat(col_expr, separator_str) clob concatenate non-NULL values in a column or group into one large string with values separated by separator_str sys.group_concat(i, ' | ') 1 | 2 | 3 | 4
json.tojsonarray(clob) clob convert string values in the column or group into a json array string json.tojsonarray(c) [ "one", "two", "tree", "four" ]
json.tojsonarray(double) clob convert numeric values in the column or group into a json array string json.tojsonarray(i) [ "1", "2", "3", "4" ]

 

Aggregate Functions for Statistics

Note: You must include the sys. prefix for the aggregate functions below (see examples) in order for them to work properly.

Statistic Function Return type Description Example Result
sys.corr(numeric_a, numeric_b) double return the correlation coefficient of the non-NULL values in the numeric (integer/decimal/real/double) column or group numeric_a versus numeric_b sys.corr(i, i+2) 0.7500000000000001
sys.median(col_expr) type of col_expr return the statistic median of the non-NULL values in the numeric (integer/decimal/real/double) or temporal (date/time/timestamp/interval) column or group sys.median(i) 2
sys.median_avg(numeric_expr) double return the statistic median average of the non-NULL values in the numeric (integer/decimal/real/double) column or group sys.median_avg(i) 2.5
sys.quantile(col_expr, percentile) type of col_expr return the statistic quantile of the non-NULL values in the numeric (integer/decimal/real/double) or temporal (date/time/timestamp/interval) column or group. The percentile argument must be between 0.0 and 1.0 sys.quantile(i, 0.7) 3
sys.quantile_avg(numeric_expr, percentile) double return the statistic quantile average of the non-NULL values in the numeric (integer/decimal/real/double) column or group. The percentile argument must be between 0.0 and 1.0 sys.quantile_avg(i, 0.6) 2.8
sys.stddev_pop(numeric_expr) double return the population standard deviation of numeric_expr of the non-NULL values in the numeric (integer/decimal/real/double/interval) column or group. Same as the square root of var_pop(). sys.stddev_pop(i) 1.118033988749895
sys.stddev_samp(numeric_expr) double return the sample standard deviation of numeric_expr of the non-NULL values in the numeric (integer/decimal/real/double/interval) column or group. Same as the square root of var_samp(). sys.stddev_samp(i) 1.2909944487358056
sys.var_pop(numeric_expr) double return the population standard variance of numeric_expr of the non-NULL values in the numeric (integer/decimal/real/double/interval) column or group. Same as the square of stddev_pop(). sys.var_pop(i) 1.25
sys.var_samp(numeric_expr) double return the sample standard variance of numeric_expr of the non-NULL values in the numeric (integer/decimal/real/double/interval) column or group. Same as the square of stddev_samp(). sys.var_samp(i) 1.6666666666666667

 

For more info see: Statistic Functions

Aggregation Into Single String

Aggregation Into Single String giulia Mon, 03/02/2020 - 17:32

In the Aug2018 release of MonetDB, we added a new aggregation function "sys.group_concat", which aggregates an input column into a single string as output. We provide two versions of this aggregate: sys.group_concat(string) and sys.group_concat(string, string). In both versions, the first parameter corresponds to the input string column to be concatenated. In the former the default delimiter is the ',' character, in the latter the second parameter indicates the separator to be used. If either a group has a NULL value, or the delimiter is NULL, the output will be NULL.

sql>create table demo (a int, b clob);
operation successful
sql>insert into demo values (1, 'chair'), (1, 'desk'), (2, 'room'), (1, 'decoration'), (2, 'window'), (2, 'sofa');
6 affected rows
sql>select '[' || sys.group_concat(a) || ']' from demo;
+---------------+
| L3            |
+===============+
| [1,1,2,1,2,2] |
+---------------+
1 tuple
sql>select a, sys.group_concat(b) from demo group by a;
+------+-----------------------+
| a    | L4                    |
+======+=======================+
| 1    | chair,desk,decoration |
| 2    | room,window,sofa      |
+------+-----------------------+
2 tuples
sql>select a, sys.group_concat(b, '|') from demo group by a;
+------+-----------------------+
| a    | L5                    |
+======+=======================+
|    1 | chair|desk|decoration |
|    2 | room|window|sofa      |
+------+-----------------------+
2 tuples
sql>insert into demo values (3, 'car'), (3, NULL);
2 affected rows
sql>select '[' || sys.group_concat(b, '-') || ']' from demo group by a;
+-------------------------+
| L6                      |
+=========================+
| [chair-desk-decoration] |
| [room-window-sofa]      |
| null                    |
+-------------------------+
3 tuples

Binary String Functions

Binary String Functions giulia Tue, 05/19/2020 - 18:18

These apply to MonetDB SQL binary string type: blob.
To use binary string data in scripts you can use the hex cast notation: x'0012FF' to represent 3 bytes: 0, 18 (16+2) and 255. The blob cast notation: blob'0012FF' is also allowed.

Binary String Functions

Function Return type Description Example Result
length(blob) int number of bytes in binary string blob length(x'0012FF') 3
octet_length(blob) int number of bytes in binary string blob octet_length(x'0012FF') 3

 

In MonetDB binary string data is stored as hex character strings, using two hex characters per byte. Therefore some of the character string functions can also be applied to binary string data, for example

SELECT x'1a2b3c4d5e';
SELECT blob '1a2b3c4d5e';

CREATE TABLE test_blob(b blob NOT NULL, b52 blob(52));
INSERT INTO test_blob(b,b52) VALUES (x'123456', blob'78901234');
INSERT INTO test_blob(b,b52) VALUES ('123456abcdef', '78901234abcdef'); -- implicit conversion from char to blob is done

SELECT length(b) as byte_length, octet_length(b) as octet_length, b from test_blob where b = x'123456';
-- returns: 3 3 123456
SELECT length(b) as byte_length, char_length(b) as char_length, b from test_blob where b = x'123456';
-- returns: 3 6 123456

-- example of how to extract part of a binary string
SELECT b52, substring(b52, 3, 4) as part_as_Clob from test_blob;
SELECT b52, cast(substring(b52, 3, 4) as blob) as part_as_Blob from test_blob;

-- example of how to concatenate two binary strings
SELECT b||b52 as b_concat_result_Clob from test_blob;
SELECT convert(b||b52 , blob) as b_concat_result_Blob from test_blob;

-- comparisons are allowed, except for LIKE and ILIKE pattern matches on blobs
SELECT * from test_blob where b < b52;
SELECT * from test_blob where b like '1234%'; -- NOT allowed, generates an ERROR
SELECT * from test_blob where substring(b,0,9) like '1234%'; -- This is allowed as substring() returns a Clob

Comparison Functions and Operators

Comparison Functions and Operators giulia Tue, 03/03/2020 - 11:46

Comparison operators, predicates and functions are available for all built-in data types.

Comparison Operators

All comparison operators are binary operators that return values of type boolean. The usual comparison operators are available:

Operator Description Example Result
< less than 2 < 5 true
> greater than 2 > 5 false
<= less than or equal to 2 <= 5 true
>= greater than or equal to 2 >= 5 false
= equal 2 = 5 false
<> not equal 2 <> 5 true

Comparison Predicates

The usual comparison predicates are available:

Predicate Description Example Result
val BETWEEN lb AND ub is val between lb and ub. equivalent to: val >= lb AND val <= ub 'db' between 'abc' and 'db' true
val NOT BETWEEN lb AND ub is val not between lb and ub. equivalent to: val < lb OR val > ub 'db' not between 'abc' and 'db' false
val BETWEEN SYMMETRIC lb AND ub is val between lb and ub after sorting lb and ub 'db' between symmetric 'abc' and 'db' true
val NOT BETWEEN SYMMETRIC lb AND ub is val not between lb and ub after sorting lb and ub 'db' not between symmetric 'abc' and 'db' false
val IS NULL is val NULL 'db' is null false
val IS NOT NULL is val not NULL 'db' is not null true

Comparison Functions

Function Return type Description Example Result
"<"(arg1, arg2) boolean is arg1 less than arg2 "<"('aa', 'ab') true
">"(arg1, arg2) boolean is arg1 greater than arg2 ">"('aa', 'ab') false
"<="(arg1, arg2) boolean is arg1 less than or equal to arg2 "<="('aa', 'ab') true
">="(arg1, arg2) boolean is arg1 greater than or equal to arg2 ">="('aa', 'ab') false
"="(arg1, arg2) boolean is arg1 equal to arg2 "="('aa', 'ab') false
"<>"(arg1, arg2) boolean is arg1 not equal to arg2 "<>"('aa', 'ab') true
"between"(arg_1 any, arg_2 any, arg_3 any, boolean, boolean, boolean, boolean, boolean) boolean is arg1 between arg2 and arg3 "between"('ab', 'aa', 'ac', false, false, false, false, false) true
ifthenelse(boolean arg1, arg2, arg3) same as arg2 or arg3 when arg1 is true then return arg2 else arg3 ifthenelse(('a' = 'b'), 1, 2) 2
isnull(arg1) boolean is arg1 null isnull('aa') false

Cube and Rollup

Cube and Rollup giulia Mon, 03/16/2020 - 18:10

Under construction

Date/Time Functions and Operators

Date/Time Functions and Operators giulia Tue, 03/03/2020 - 12:12

These apply to MonetDB SQL types: date, time, time with time zone, timestamp, and timestamp with time zone.
Some functions apply to MonetDB SQL type: second interval and month interval.

Date/Time Operators

Operator Description Example Result
  + add seconds to a date or time or timestamp date '2020-09-28' + (7 * 24 * 60 * 60) date '2020-10-05'
  - subtract seconds from a date or time date '2020-09-28' - (7 * 24 * 60 * 60) date '2020-09-21'
  - subtract months from a date or timestamp date '2020-09-28' - month_interval '7' date '2020-02-28'
  - subtract two dates date '2020-09-28' - date '2020-02-28' interval '7 months'
  - subtract two times time '14:35:45' - time '02:12:24' interval '12 hours 23 minutes 31 seconds'

 

Date/Time Functions

NOTE: Below table is UNDER CONSTRUCTION

Function Return type Description Example Result
century(date) int extract century from a date or timestamp or timestamptz century(date '2020-03-22') 21
curdate() date get current date curdate() date '2020-03-22'
current_date() date get current date current_date() date '2020-03-22'
current_time() timetz get current time with time zone current_time() timetz '13:16:57.734639+01:00'
current_timestamp() timestamptz get current timestamp (date and time and fractions) with time zone current_timestamp() timestamptz '2020-03-22 13:16:57.734639+01:00'
curtime() timetz get current time with time zone current_time() timetz '13:16:57.734639+01:00'
sys.date_to_str(date, format_str) clob convert date value to a string using format_str sys.date_to_str(date '2020-03-22', '%D %d.%m.%Y'); 03/22/20 22.03.2020
date_trunc(clob, timestamp) timestamp ??? date_trunc(s, ts)  
"day"(date) int extract month day nr (1..31) of a date or timestamp or timestamptz. same as: dayofmonth(date) "day"(date '2020-03-22') 22
"day"(sec_interval) bigint compute nr of days and truncate it to an integer "day"(3.89 * (24 * 60 * 60)) 3
dayofmonth(date) int extract month day nr (1..31) of a date or timestamp or timestamptz. same as: "day"(date) dayofmonth(date '2020-03-22') 22
dayofweek(date) int extract week day nr (1..7) of a date or timestamp or timestamptz dayofweek(date '2020-03-22') 7
dayofyear(date) int extract year day nr (1..366) of a date or timestamp or timestamptz dayofyear(date '2020-03-22') 82
decade(date) int extract decade nr of a date or timestamp or timestamptz decade(date '2027-03-22') 202
sys.epoch(bigint nr_secs) timestamp add nr of seconds to timestamp '1970-01-01 00:00:00.0' sys.epoch(1234567890) 2009-02-13 23:31:30.0
sys.epoch(timestamp) int convert timestamp (or timestamptz) to nr of seconds since '1970-01-01 00:00:00.0' sys.epoch(timestamp '2009-02-13 23:31:30.0') 1234567890
extract(field from dt_value) int extract a specific field from a date or time or timestamp.
supported keywords for field are: CENTURY, DECADE, YEAR, QUARTER, MONTH, WEEK, DAY, DOW, DOY, HOUR, MINUTE and SECOND.
extract(doy from date '2027-03-22') 82
"hour"(time) int extract hour nr (0..23) of a time or timetz or timestamp or timestamptz or sec_interval "hour"(timetz '15:35:02.002345+01:00') 15
local_timezone() sec_interval get local time zone as represented in seconds local_timezone() 3600
localtime() time get current local time without time zone localtime() time '13:16:57.734639'
localtimestamp() timestamp get current local timestamp (date and time and fractions) without time zone localtimestamp() timestamp '2020-03-22 13:16:57.734639'
"minute"(time) int extract minutes (0..59) of a time or timetz or timestamp or timestamptz or sec_interval "minute"(timetz '15:35:02.002345+01:00') 35
"month"(date) int extract month nr (1..12) of a date or timestamp or timestamptz or month_interval "month"(date '2020-07-22') 7
now() timestamptz get current timestamp (date and time and fractions) with time zone now() timestamptz '2020-03-22 13:16:57.734639+01:00'
quarter(date) int extract quarter nr (1..4) of a date or timestamp or timestamptz quarter(date '2020-07-22') 3
"second"(sec_interval) int extract seconds (0 .. 59) of a sec_interval "second"(24.603) 24
"second"(time) decimal(9,6) extract seconds (0.000 .. 59.999999) of a time or timetz or timestamp or timestamptz "second"(timetz '15:35:02.002345+01:00') 2.002345
sql_add(date, month_interval) same as arg 1 add an integer month interval to a date or timestamp or timestamptz sql_add(a, b)  
sql_add(date, sec_interval) same as arg 1 add a decimal seconds interval to a date or time or timetz or timestamp or timestamptz sql_add(a, b)  
sql_sub(date, date) int subtract a date from a date returning nr of seconds between the two dates sql_sub(a, b)  
sql_sub(date, month_interval) same as arg 1 subtract an integer month interval from a date or timestamp or timestamptz sql_sub(a, b)  
sql_sub(date, sec_interval) same as arg 1 subtract a decimal seconds interval from a date or time or timetz or timestamp or timestamptz sql_sub(a, b)  
sql_sub(time, time) sec_interval subtract a time from a time returning nr of seconds interval between the two times sql_sub(a, b)  
sql_sub(timestamp, timestamp) sec_interval subtract a timestamp from a timestamp returning nr of seconds interval between the two timestamps sql_sub(a, b)  
sql_sub(timestamptz, timestamptz) sec_interval subtract a timestamptz from a timestamptz returning nr of seconds interval between the two timestamptzs sql_sub(a, b)  
sql_sub(timetz, timetz) sec_interval subtract a timetz from a timetz returning nr of seconds interval between the two timetzs sql_sub(a, b)  
str_to_date(s, format_str) date parse string s using format_str and construct a date value str_to_date(a)  
str_to_time(s, format_str) time parse string s using format_str and construct a time value str_to_time(a)  
str_to_timestamp(s, format_str) timestamp parse string s using format_str and construct a timestamp value str_to_timestamp(a)  
time_to_str(time, format_str) clob convert time value to a string using format_str time_to_str(a)  
timestamp_to_str(timestamp, format_str) clob convert timestamp value to a string using format_str timestamp_to_str(a)  
week(date) int extract week nr (1 .. 53) of a date or timestamp or timestamptz. same as: weekofyear(date) week(date '2020-03-22') 12
weekofyear(date) int extract week nr (1 .. 53) of a date or timestamp or timestamptz. same as: week(date) weekofyear(date '2020-03-22') 12
"year"(date) int extract year nr of a date or timestamp or timestamptz or month_interval "year"(date '2020-03-22') 2020

 

Date/Time Pseudo Columns

Pseudo Column Description Example Result
CURRENT_DATE get current date select current_date date '2020-02-02'
CURRENT_TIME get current time select current_time timetz '15:35:02.073522+01:00'
CURRENT_TIMESTAMP get current timestamp select current_timestamp timestamptz '2020-02-02 15:35:02.002345+01:00'
CURRENT_TIMEZONE get current time zone as seconds interval select current_timezone 3600.000
NOW same as: CURRENT_TIMESTAMP select now timestamptz '2020-02-02 15:35:02.002345+01:00'
LOCALTIME get current local time without time zone select localtime time '15:35:02.073522'
LOCALTIMESTAMP get current local timestamp without time zone select localtimestamp timestamp '2020-02-02 15:35:02.073522'

See also: Date/Time Pseudo Columns

JSON Functions

JSON Functions giulia Tue, 05/19/2020 - 18:21

These apply to MonetDB SQL type: JSON (JavaScript Object Notation).

Note: All JSON functions are located in the json schema. You must include the json. prefix for all these functions in order for them to work properly.

Function Return type Description Example Result
json.filter(js, indx) json extracts an indexed component from js. The index starts at 0 for the first element. json.filter(json '["a", 4]', 1) [4]
json.filter(js, pathexpr) json extracts the component from js that satisfies the pathexpr string. The supported JSON path expression syntax is documented at JSONPath json.filter(json '{"price":9}', '$..price') [9]
json."integer"(js) bigint or int or smallint or tinyint turn a number or singleton array value or singleton object element of js into an integer. Returns null if it fails. json."integer"(json '{"n":4}') 4
json.isarray(str) boolean checks the string str for JSON array compliance json.isarray('[1,7]') true
json.isarray(js) boolean checks the JSON value js for JSON array compliance json.isarray(json '[1,7]') true
json.isobject(str) boolean checks the string str for JSON object compliance json.isobject('{"n":4}') true
json.isobject(js) boolean checks the JSON value js for JSON object compliance json.isobject(json '{"n":4}') true
json.isvalid(str) boolean checks the string str for JSON syntax compliance json.isvalid('{"a":[1]}') true
json.isvalid(js) boolean checks the JSON value js for JSON validity json.isvalid(json '{"a":[1,2]}') true
json.keyarray(js) json returns an arraylist of key tags for the top-level components of js. Note js must represent a JSON object else an error (Object expected) is reported json.keyarray(json '{"id":2, "nm":"X"}') ["id","nm"]
json.length(js) int returns the number of top-level components of js json.length(json '{"a":[1]}') 1
json.number(js) double turn a number or singleton array value or singleton object tag of js into a double. Returns null if it fails. json.number(json '{"n":4}') 4.0
json.text(js) clob glue together the values in js separated by space character json.text(json '[1,2,3,4]') 1 2 3 4
json.text(js, SepChar) clob glue together the values in js separated by SepChar character json.text(json '[1,2,3,4]', 'x') 1x2x3x4
json.valuearray(js) json returns an arraylist of value tags for the top-level components of js. Note js must represent a JSON object else an error (Object expected) is reported json.valuearray(json '{"a":1, "b":2}') [1,2]
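
A usage sketch; the in-line table docs and its column js are illustrative only:

-- validity check, number of top-level components, key list, and values glued with ';'
SELECT json.isvalid(js), json.length(js), json.keyarray(js), json.text(js, ';')
  FROM (SELECT json '{"id": 2, "nm": "X"}') AS docs(js);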

 

Aggregate Functions returning a concatenated CLOB string

Function Return type Description Example Result
json.tojsonarray(double) clob convert numeric values in the column/group into a json array string json.tojsonarray(i) [ "1", "2", "3", "4" ]
json.tojsonarray(clob) clob convert string values in the column/group into a json array string json.tojsonarray(c) [ "one", "two", "tree", "four" ]

Logical Functions And Operators

Logical Functions And Operators giulia Mon, 03/02/2020 - 17:37

Logical Operators

The usual logical operators are available: AND, OR, NOT

SQL uses a three-valued logic system with true, false, and null, which represents “unknown”. Observe the following truth tables:

 a  b a AND b a OR b
true true true true
true false false true
true null null true
false false false false
false null false null
null null null null

 a NOT a a IS NULL a IS NOT NULL
true false false true
false true false true
null null true false

The operators AND and OR are commutative, that is, you can switch the left and right operand without affecting the result.

Logical Functions

Function Return type Description Example Result
"all"(a boolean, b boolean, c boolean) boolean ??? "all"(true, false, true) null
"and"(a boolean, b boolean) boolean a AND b "and"(true, false) false
"any"(a boolean, b boolean, c boolean) boolean ??? "any"(true, false, true) true
"not"(a boolean) boolean NOT a "not"(true) false
"or"(a boolean, b boolean) boolean a OR b "or"(true, false) true
"xor"(a boolean, b boolean) boolean a OR b, but NOT, a AND b "xor"(true, true) false

Mathematical Functions and Operators

Mathematical Functions and Operators giulia Tue, 03/03/2020 - 11:51

These apply to MonetDB numeric SQL types: tinyint, smallint, int, bigint, hugeint, decimal, double, float and real.
Some of these functions and operators work also on SQL types: interval month and interval second.

Mathematical Operators

Operator Description Example Result
+ addition 2 + 3 5
- subtraction 2 - 3 -1
* multiplication 2 * 3 6
/ division (integer division truncates the result) 5 / 2 2
/ division (on decimals or floating point) 5.0 / 2 2.5
% modulo (remainder) 5 % 4 1
& bitwise AND 91 & 15 11
| bitwise OR 32 | 3 35
^ bitwise XOR 17 ^ 5 20
~ bitwise NOT ~1 -2
<< bitwise shift left 1 << 4 16
>> bitwise shift right 8 >> 2 2

Mathematical Functions

Function Return type Description Example Result
abs(x) same as input absolute value of a number abs(-17.4) 17.4
acos(x) double arc cosine of angle x in radians acos(0.54) 1.0003592173949747
asin(x) double arc sine of angle x in radians asin(1.0) 1.5707963267948966
atan(x) double arc tangent of angle x in radians atan(1.0) 0.7853981633974483
atan(x, y) double arc tangent of angle x/y in radians atan(1.0, 2.0) 0.4636476090008061
bit_and(x, y) int or decimal bitwise AND bit_and(91, 15) 11
bit_not(x) int or decimal bitwise NOT bit_not(1) -2
bit_or(x, y) int or decimal bitwise OR bit_or(32, 3) 35
bit_xor(x, y) int or decimal bitwise XOR bit_xor(17, 5) 20
ceil(x) double nearest integer greater than or equal to argument (same as ceiling(x)) ceil(-42.8) -42
ceiling(x) double nearest integer greater than or equal to argument (same as ceil(x)) ceiling(-95.3) -95
cos(x) double cosine of angle x in radians cos(12.2) 0.9336336440746373
cosh(x) double hyperbolic cosine of angle x in radians cosh(3.0) 10.067661995777765
cot(x) double cotangent of angle x in radians cot(16.0) 3.326323195635449
sys.degrees(rad double) double convert radians angle value to degrees: r*180/pi() sys.degrees(0.5) 28.64788975654116
exp(x) double exponential exp(1) 2.718281828459045
floor(x) real or double nearest integer less than or equal to argument floor(-42.8) -43
get_value_for(schema_nm, seq_nm) bigint get current value of a sequence get_value_for('dwh2', 'seq_9201') 1234
greatest(x, y) same as input greatest (highest) value of x and y (same as sql_max(x,y)) greatest(1.2, 5) 5
least(x, y) same as input least (lowest) value of x and y (same as sql_min(x,y)) least(1.2, 5) 1.2
left_shift(x, i) same as first arg bitwise shift left i bits left_shift(1, 4) 16
ln(x) double natural logarithm ln(2.0) 0.6931471805599453
log(x) double natural logarithm log(2.0) 0.6931471805599453
log(x, b) double logarithm to base b log(64.0, 2) 6.0
log10(x) double base 10 logarithm log10(100.0) 2.0
log2(x) double base 2 logarithm log2(64.0) 6.0
mod(x, y) same as input modulo (remainder) mod(5.0, 2.1) 0.8
next_value_for(schema_nm, seq_nm) bigint get current value of a sequence and increment it next_value_for('dwh2', 'seq_9201') 1234
pi() double π constant pi() 3.141592653589793
power(x, y) double x raised to the power of y power(2, 5) 32
sys.radians(degr double) double convert degrees angle value to radians: d*pi()/180 sys.radians(45) 0.7853981633974483
rand() int random integer value between 0 and 2147483648 rand() 917632440
rand(seed) int random integer value with seed set rand(-5) 1345532277
right_shift(x, i) same as first arg bitwise shift right i bits right_shift(16, 2) 4
round(x, d) same as first arg round x to d decimal places round(42.4382, 2) 42.44
scale_down(x, y) same as first arg ??? divide x by y and round to integer scale_down(100, 3) 33
scale_up(x, y) same as first arg ??? multiply x by y scale_up(13, 3) 39
sign(x) tinyint sign of the argument: -1 when x is negative, 0 when x is zero, 1 when x is positive sign(-8.4) -1
sin(x) double sine of angle x in radians sin(1.4) 0.9854497299884601
sinh(x) double hyperbolic sine of angle x in radians sinh(1.4) 1.9043015014515339
sql_add(x, y) same as input x plus y sql_add(2, 3.4) 5.4
sql_div(x, y) same as input x divided by y sql_div(3.4, 2) 1.7
sql_max(x, y) same as input greatest (highest) value of x and y (same as greatest(x,y)) sql_max(1.2, sql_max(3.3, 5)) 5
sql_min(x, y) same as input least (lowest) value of x and y (same as least(x,y)) sql_min(1.2, sql_min(3.3, 5)) 1.2
sql_mul(x, y) same as input x multiplied by y sql_mul(1.2, 5) 6.0
sql_neg(x) same as input x times -1 sql_neg(-2.5) 2.5
sql_sub(x, y) same as input subtract y from x sql_sub(5, 7) -2
sqrt(x) double square root sqrt(2.0) 1.4142135623730951
sys.alpha(pdec double, pradius double) double compute alpha 'expansion' of theta for a given declination (used by SkyServer) sys.alpha(5.0, 1.2) 1.2045844792437546
sys.degrees(rad double) double convert radians angle value to degrees: r*180/pi() sys.degrees(0.5) 28.64788975654116
sys.fuse(smallint, smallint) integer combine the bits of two integers into a larger integer type sys.fuse(2, 6) 518
sys.ms_round(x double, prc int, trunc int) double round to prc decimal places and ?? truncate to trunc decimal places sys.ms_round(1.2359, 2, 0) 1.24
sys.ms_trunc(num double, prc int) double truncate to prc decimal places sys.ms_trunc(1.2359, 2) 1.23
sys.radians(degr double) double convert degrees angle value to radians: d*pi()/180 sys.radians(45) 0.7853981633974483
tan(x) double tangent of angle x in radians tan(1.4) 5.797883715482887
tanh(x) double hyperbolic tangent of angle x in radians tanh(1.4) 0.8853516482022625

Statistics Operators

Statistics Operators giulia Mon, 03/02/2020 - 17:29

The SQL implementation provides the well-known standard SQL aggregate functions COUNT(*|...), COUNT(DISTINCT ...), SUM(...), AVG(...), MIN(...) and MAX(...) over scalar types/expressions and groupings. In addition, a few important statistical aggregate functions: MEDIAN, QUANTILE, STDDEV, VAR and correlation CORR are available.

extra_statistical_aggregate_functions:
      sys.median '(' scalar_expr ')'
    | sys.median_avg '(' scalar_expr ')'
    | sys.quantile '(' scalar_expr ',' scalar_expr ')'
    | sys.quantile_avg '(' scalar_expr ',' scalar_expr ')'
    | sys.stddev_samp '(' scalar_expr ')'
    | sys.stddev_pop '(' scalar_expr ')'
    | sys.var_samp '(' scalar_expr ')'
    | sys.var_pop '(' scalar_expr ')'
    | sys.corr '(' scalar_expr ',' scalar_expr ')'

For sys.quantile the percentile argument is a float value between 0.0 and 1.0. sys.median(<expr>) is equivalent to sys.quantile(<expr>, 0.5).
sys.stddev_samp computes the cumulative sample standard deviation and returns the square root of the sample variance. sys.stddev_pop computes the population standard deviation and returns the square root of the population variance. Both functions take as an argument any numeric datatype.
Likewise, sys.var_samp and sys.var_pop functions return the sample variance (/n-1) of a set of numbers and the biased variance (/n) of a set of numbers, respectively.

Note: The aggregate functions sys.median_avg and sys.quantile_avg were added in the Nov2019 (11.35.3) release. They return the interpolated value if the median/quantile doesn't fall exactly on a particular row. These functions always return a value of type DOUBLE and only work for numeric types (various width integers, decimal and floating point).

Usage example:

   create schema aggr_tst;
   set schema aggr_tst;
   create table tc (c real);
   insert into tc values (1), (2), (3), (4), (5), (9);
   select * from tc;
   select count(*) countstar, COUNT(c) count, COUNT(DISTINCT c) countdistinct
        , SUM(c) sum, AVG(c) average, PROD(c) product, MIN(c) minimum, MAX(c) maximum
        , sys.MEDIAN(c) median
        , sys.MEDIAN_AVG(c) median_avg
        , sys.QUANTILE(c, 0.5) quantile
        , sys.QUANTILE_AVG(c, 0.5) quantile_avg
        , sys.STDDEV_SAMP(c) stddev_samp
        , sys.STDDEV_POP(c) stddev_pop
        , sys.VAR_SAMP(c) var_samp
        , sys.VAR_POP(c) var_pop
        , sys.CORR(c, c+1) corr
        , sys.group_concat(c) group_concat
        , sys.group_concat(c, '|') group_concat_mysep
    from tc;
   drop table tc;
   set schema sys;
   drop schema aggr_tst;

 

Tip: To view all the available aggregate functions in your MonetDB server use query:

SELECT * FROM sys.functions where type = 3;

 

String Functions and Operators

String Functions and Operators giulia Tue, 03/03/2020 - 11:54

These apply to MonetDB SQL types: char, varchar, clob and subtypes: json and url. In MonetDB all character strings are UTF-8 encoded.

Character String Operators

Operator Description Example Result
|| concatenate two strings 'Monet' || 'DB' MonetDB
|| concatenation with one non-string input 'Value: ' || 42 Value: 42

Character String Functions

NOTE: Below table is UNDER CONSTRUCTION

Function Return type Description Example Result
ascii(s) int Unicode code point of the first character of the string s ascii('€') 8364
char_length(s) int number of UTF-8 characters in UTF-8 string s char_length('2€') 2
character_length(s) int number of UTF-8 characters in UTF-8 string s character_length('2€') 2
charindex(substr, s) int start position of substr in string s or 0 when not found charindex('gh', 'highigh') 3
charindex(substr, s, i) int start position of substr in string s starting at offset i or 0 when not found charindex('gh', 'highigh', 4) 6
concat(s1, s2) same as input concatenate strings s1 and s2. returns null when either s1 or s2 is null concat('Front ', 242) Front 242
difference(s1, s2) int converts two strings s1 and s2 to their soundex codes and reports the number of matching code positions. Since soundex codes have four characters, the result ranges from zero to four, with zero being no match and four being an exact match. difference('MonetDB', 'DB3') 4
editdistance(s1, s2) int compute the 'distance' between two strings returning an integer between 0 (equal) and n (largest distance) where n is the length of largest string editdistance('MonetDB', 'DB3') 6
editdistance2(s1, s2) int compute the 'distance2' between two strings returning an integer between 0 (equal) and n (largest distance) where n is the length of largest string editdistance2('MonetDB', 'DB3') 6
"index"(s, boolean) int ??? "index"('abcdef', false) 0
insert(s1, int pos, int nr, s2) same as input arg 4 insert string s2 into string s1 after position pos replacing nr characters in string s1. when pos is negative it counts from the end of the string. nr must be >= 0. insert('abcdef', 2, 1, 'XY') abXYdef
lcase(s) same as input convert string s to lower case lcase('MonetDB') monetdb
left(s, int n) same as input arg 1 return first n characters in the string left('MonetDB', 3) Mon
length(s) int count number of UTF-8 characters in UTF-8 string s length('2€') 2
levenshtein(s1, s2) int compute the 'levenshtein distance' between two strings returning an integer between 0 (equal) and n (largest distance) where n is the length of largest string levenshtein('MonetDB', 'DB3') 6
levenshtein(s1, s2, int ??, int ??, int ??) int compute the 'levenshtein distance' between two strings returning an integer between 0 (equal) and n + ?? (largest distance) where n is the length of largest string levenshtein('MonetDB', 'DB3', 4, 2, 1) 8
locate(s1, s2) int locate string s1 in s2, when found return start position else 0 locate('DB', 'MonetDB') 6
locate(s1, s2, int pos) int locate string s1 in s2 starting from position pos, when found return start position else 0 locate('DB', 'DBMonetDB', 3) 8
lower(s) same as input convert string s to lower case lower('MonetDB') monetdb
lpad(s, int len) same as input arg 1 fill up the string to length len by prepending spaces. If the string is already longer than length then it is truncated (on the right). lpad('abc', 6)    abc
lpad(s1, int len, s2) same as input arg 1 fill up the string s1 to length len by prepending the characters s2. If the string s1 is already longer than length then it is truncated (on the right). lpad('abc', 6, 'DB') DBDabc
ltrim(s) varchar remove leading whitespace characters from string s ltrim(' \t\fab \tdf ') ab df
ltrim(s1, s2) varchar remove leading s2 strings from string s1 ltrim('aabcdaa', 'a') bcdaa
sys.md5(s) clob(32) calculates the MD5 hash of string, returning the result in hexadecimal sys.md5('abc') 900150983cd24fb0d6963f7d28e17f72
sys.ms_stuff(varchar(32) s1, int pos, int nr, varchar(32) s2) varchar(32) insert string s2 into string s1 at position pos replacing nr characters in string s1. If pos <= 0 or > length of s1 then an empty string is returned. nr may be negative. sys.ms_stuff('abcdef', 2, 1, 'XY') aXYcdef
not_ilike(s1, s2) boolean   not_ilike(a)  
not_ilike(s1, s2, s3) boolean   not_ilike(a)  
not_like(s1, s2) boolean   not_like(a)  
not_like(s1, s2, s3) boolean   not_like(a)  
octet_length(s) int number of bytes in UTF-8 string s octet_length('2€') 4
patindex(s1, s2) int   patindex(a)  
qgramnormalize(s) same as input q-gram analysis on a character sequence of string s qgramnormalize(a)  
repeat(s, nr) same as input arg 1 repeat string s the specified nr of times repeat('kiss,', 3) kiss,kiss,kiss,
replace(s1, s2, s3) same as input arg 1   replace(a)  
sys.reverse(s) clob return reversed string sys.reverse('MonetDB') BDtenoM
right(s, int n) same as input arg 1 return last n characters in the string right('MonetDB', 4) etDB
rpad(s, int) same as input arg 1 fill up the string to length len by appending spaces. If the string is already longer than length then it is truncated (on the right). rpad('abc', 6) abc   
rpad(s1, int len, s2) same as input arg 1 fill up the string s1 to length len by appending the characters s2. If the string s1 is already longer than length then it is truncated (on the right). rpad('abc', 6, 'DB') abcDBD
rtrim(s) varchar remove trailing whitespace characters from string s rtrim(' \tab df \t\n') ab df
rtrim(s1, s2) varchar remove trailing s2 strings from string s1 rtrim('aabcdaa', 'a') aabcd
similarity(s1, s2) double computes the similarity between string s1 and s2 and returns a number between 0 (no similarity) and 1 (equal). similarity('MonetDB', 'DB3') 0.4
soundex(s) varchar(4) computes the Soundex code from string s using a phonetic algorithm soundex('MonetDB') M533
splitpart(s1, s2, int) same as input arg 1   splitpart(a)  
strings(s) varchar ??? strings('MonetDB') null
substr(s, startpos) same as input arg 1 extract substring starting from position startpos substr('abCde', 3) Cde
substr(s, startpos, count) clob(count) extract substring starting from position startpos for count characters substr('abCde', 2, 3) bCd
substring(s, startpos) same as input arg 1 extract substring starting from position startpos substring('abcde', 3) cde
substring(s, startpos, count) clob(count) extract substring starting from position startpos for count characters substring('abcde', 2, 2) bc
trim(s) varchar remove leading and trailing whitespace characters from string s trim(' \t\fab df \t\n') ab df
trim(s1, s2) varchar remove leading and trailing s2 strings from string s1 trim('aabcdaa', 'a') bcd
truncate(s, int) same as input arg 1   truncate(a)  
ucase(s) same as input convert string s to upper case ucase('MonetDB') MONETDB
upper(s) same as input convert string s to upper case upper('MonetDB') MONETDB

URL Functions

URL Functions giulia Tue, 05/19/2020 - 18:31

These functions apply to MonetDB SQL type: URL (Uniform Resource Locator).

A URL string has a specific syntax, see URL syntax for valid URL string patterns.

 

For the Examples below we use following url_example string: 'https://me@www.monetdb.org:458/Doc/Abc.html?lang=nl&sort=asc#example' to create the shown Results.

Warning: You must include the sys. prefix for all these URL functions (see examples) in order for them to work properly from schemas other than sys.

Function Return type Description Example Result
sys.getanchor(url) clob extract the anchor part (after the #) from the url, may return null if no anchor exists sys.getanchor(url_example) example
sys.getbasename(url) clob extract the base name part (after the / excluding the extension) from the path of the url, may return null sys.getbasename(url_example) Abc
sys.getcontext(url) clob extract the path context part of the url sys.getcontext(url_example) /Doc/Abc.html
sys.getdomain(url) clob extract the top-level domain part of the host part of the url, may return null sys.getdomain(url_example) org
sys.getextension(url) clob extract the file extension part of the url, may return null sys.getextension(url_example) html
sys.getfile(url) clob extract the last file part of the path of the url, may return null sys.getfile(url_example) Abc.html
sys.gethost(url) clob extract the host part of the url, may return null sys.gethost(url_example) www.monetdb.org
sys.getport(url) clob extract the port number part of the url, may return null sys.getport(url_example) 458
sys.getprotocol(url) clob extract the protocol or scheme part of the url sys.getprotocol(url_example) https
sys.getquery(url) clob extract the query part (after the ?) of the url, may return null sys.getquery(url_example) lang=nl&sort=asc
sys.getroboturl(url) clob extract the location of the robot control file of the url sys.getroboturl(url_example) https://me@www.monetdb.org:458/robots.txt
sys.getuser(url) clob extract the user part of the url, may return null sys.getuser(url_example) me
sys.isaurl(string) boolean checks whether a character string complies to the url syntax format sys.isaurl(url_example) true
sys.newurl(scheme, authority, path) url create a new url from three strings: scheme, authority or host and path sys.newurl('https', 'usr@www.a.com:123', 'docs/index.html') https://usr@www.a.com:123/docs/index.html
sys.newurl(scheme, host, int port, path) url create a new url from three strings and a port number as third argument sys.newurl('https', 'www.a.com', 5567, 'docs/index.html') https://www.a.com:5567/docs/index.html
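
A usage sketch, casting a character string to the url type before applying the functions:

SELECT sys.getprotocol(u) AS protocol,
       sys.gethost(u) AS host,
       sys.getquery(u) AS query
  FROM (SELECT cast('https://me@www.monetdb.org:458/Doc/Abc.html?lang=nl&sort=asc#example' AS url)) AS x(u);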

Window Functions

Window Functions giulia Mon, 03/02/2020 - 17:27

SQL provides a method to aggregate over a series of related tuples. They are called window functions and always come with an OVER() clause, which determines how tuples are split up for the window function. The PARTITION BY clause within OVER divides the rows into groups that share the same values of the PARTITION BY expression(s). For each row, the window function is computed over all rows participating in the group. The order within a partition can be used as well.

window_function_spec:
    window_function OVER   { window_name | '(' window_specification ')' }

window_function:
      RANK '(' ')'
    | DENSE_RANK '(' ')'
    | PERCENT_RANK '(' ')'
    | ROW_NUMBER '(' ')'
    | CUME_DIST '(' ')'
    | FIRST_VALUE '(' query_expression ')'
    | LAST_VALUE '(' query_expression ')'
    | NTH_VALUE '(' query_expression ',' query_expression ')'
    | NTILE '(' query_expression ')'
    | LEAD '(' query_expression [ ',' query_expression [ ',' query_expression ] ] ')'
    | LAG '(' query_expression [ ',' query_expression [ ',' query_expression ] ] ')'
    | aggregate_function

aggregate_function:
      COUNT '(' '*' ')'
    | COUNT '(' query_expression ')'
    | MAX '(' query_expression ')'
    | MIN '(' query_expression ')'
    | SUM '(' query_expression ')'
    | PROD '(' query_expression ')'
    | AVG '(' query_expression ')'

window_specification:
    [ window_name ]   [ PARTITION BY column_ref   [ ',' ... ] ]   [ ORDER BY sort_spec ]   [ window_frame ]

window_frame:
    { ROWS | RANGE | GROUPS }   { window_frame_start | BETWEEN window_bound AND window_bound }
        [ EXCLUDING { CURRENT ROW | GROUP | TIES | NO OTHERS }   ]

window_frame_start:
     UNBOUNDED PRECEDING
   | value PRECEDING
   | CURRENT ROW

window_bound:
    window_frame_start
   | UNBOUNDED FOLLOWING
   | value FOLLOWING

Supported Window Functions:
    RANK() : BIGINT - Returns the rank number within a partition, starting at 1.
    DENSE_RANK() : BIGINT - Returns the rank of the current row without gaps, it counts peer groups.
    PERCENT_RANK() : DOUBLE - Calculates the relative rank of the current row: (rank() - 1) / (rows in partition - 1).
    ROW_NUMBER() : BIGINT - Returns the position of the tuple currently in the result set, starting at 1.
    CUME_DIST() : DOUBLE - Calculates the cumulative distribution: number of rows preceding or peer with current row / rows in partition.
    FIRST_VALUE(input A) : A - Returns input value at first row of the window frame.
    LAST_VALUE(input A) : A - Returns input value at last row of the window frame.
    NTH_VALUE(input A, nth BIGINT) : A - Returns input value at “nth” row of the window frame. If there is no “nth” row in the window frame, then NULL is returned.
    NTILE(nbuckets BIGINT) : BIGINT - Enumerates rows from 1 in each partition, dividing it in the most equal way possible.
    LAG(input A [, offset BIGINT [, default_value A ] ]) : A - Returns input value at row “offset” before the current row in the partition. If the offset row does not exist, then the “default_value” is output. If omitted, “offset” defaults to 1 and “default_value” to NULL.
    LEAD(input A [, offset BIGINT [, default_value A ] ]) : A - Returns input value at row “offset” after the current row in the partition. If the offset row does not exist, then the “default_value” is output. If omitted, “offset” defaults to 1 and “default_value” to NULL.
    MIN(input A) : A
    MAX(input A) : A
    COUNT(*) : BIGINT
    COUNT(input A) : BIGINT
    SUM(input A) : A
    PROD(input A) : A
    AVG(input A) : DOUBLE.

The supported frames are:
    ROWS - Frames are calculated on physical offsets of input rows.
    RANGE - Result frames are calculated on value differences from input rows (used with a custom PRECEDING or FOLLOWING bound requires an ORDER BY clause).
    GROUPS - Groups of equal row values are used to calculate result frames (requires an ORDER BY clause).
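
For instance, a running total over a sliding ROWS frame combined with LAG; the table measurements(sensor, ts, val) used here is hypothetical:

-- sum over the current row and the two preceding rows, plus the previous value per sensor
SELECT sensor, ts, val,
       sum(val) OVER (PARTITION BY sensor ORDER BY ts
                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS running_sum,
       lag(val, 1, 0) OVER (PARTITION BY sensor ORDER BY ts) AS previous_val
  FROM measurements;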

See also en.wikibooks.org/wiki/Structured_Query_Language/Window_functions

 

Examples:

create table ranktest (id int, k varchar(3));
insert into ranktest values (1, 'a'),(2, 'b'),(3, 'c'),(4, 'd');
insert into ranktest values (1, 'a'),(2, 'b'),(3, 'c'),(4, 'd');

select ROW_NUMBER() over () as foo from ranktest;
select ROW_NUMBER() over (PARTITION BY id) as foo, id from ranktest;
select ROW_NUMBER() over (PARTITION BY id ORDER BY id) as foo, id from ranktest;
select ROW_NUMBER() over (ORDER BY id) as foo, id from ranktest;

select RANK() over () as foo from ranktest;
select RANK() over (PARTITION BY id) as foo, id from ranktest;
select RANK() over (PARTITION BY id ORDER BY id) as foo, id from ranktest;
select RANK() over (ORDER BY id) as foo, id from ranktest;

select RANK() over () as foo, id, k from ranktest;
select RANK() over (PARTITION BY id) as foo, id, k from ranktest;
select RANK() over (PARTITION BY id ORDER BY id, k) as foo, id, k from ranktest;
select RANK() over (ORDER BY id, k) as foo, id, k from ranktest;

select DENSE_RANK() over () as foo, id, k from ranktest order by k;
select DENSE_RANK() over (PARTITION BY id) as foo, id, k from ranktest order by k;
select DENSE_RANK() over (PARTITION BY id ORDER BY id, k) as foo, id, k from ranktest order by k;
select DENSE_RANK() over (ORDER BY id, k) as foo, id, k from ranktest order by k;
drop table ranktest;

For more examples see extended_sql_window_functions

Type Conversion

Type Conversion giulia Mon, 02/24/2020 - 09:53

Under construction

Indexes

Indexes giulia Mon, 02/24/2020 - 09:56

Under construction

Indexes are a common mechanism in the DBMS landscape to improve performance of queries. In the following section we give a high-level overview of the indexes that are used in MonetDB. The technical details on the subject are discussed in the MonetDB Internals section.

Primary Key

Foreign Key

HASH Index

Imprints

Ordered Index

Dictionary

 

Concurrency Control

Concurrency Control mk Mon, 02/24/2020 - 09:57

MonetDB/SQL supports a multi-statement transaction scheme marked by START TRANSACTION and closed with either COMMIT or ROLLBACK. The session variable AUTOCOMMIT can be set to true (default) if each SQL statement should be considered an independent transaction.

In the AUTOCOMMIT mode, you can use START TRANSACTION and COMMIT/ROLLBACK to indicate transactions containing multiple SQL statements.  In this case, AUTOCOMMIT is automatically disabled by a START TRANSACTION, and reenabled by a COMMIT or ROLLBACK.

If AUTOCOMMIT mode is OFF, the START TRANSACTION is implicit, and you should only use COMMIT/ROLLBACK.
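
For example, a multi-statement transaction over a hypothetical accounts table:

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- or ROLLBACK; to undo both updates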

WARNING. The transaction management scheme is based on optimistic concurrency control. It provides each transaction with a consistent view on the database, but updates are collected in an addendum processed on transaction commit. If at commit time it can be assured that the data in the tables affected by the update has not changed in the meantime, the results are merged. Otherwise the transaction is aborted.
This optimistic concurrency scheme is particularly useful for query-dominant environments. It negatively affects long-running transactions whose underlying tables are concurrently updated. The same holds for applications that try to perform concurrent updates from multiple threads in a single application. Such updates should be serialized within the application to avoid unexpected transaction aborts.

WARNING. Optimistic concurrency control may be confusing for those who have built online-transaction applications, because the granularity of the concurrency control scheme will lead to higher than expected transaction failure rates. There is no locking scheme to avoid this. Applications may have to resort to serial execution.

WARNING. The tuples being deleted are only marked as such. They do not reduce the table size. You will even experience after many updates that your queries will run slower, because each query first has to establish a consistent private view on the database by inspection of the delete/update lists. It calls for a vacuum cleaning algorithm in the background, which is not yet available.

Performance Optimization

Performance Optimization giulia Mon, 02/24/2020 - 09:58

Under construction

Optimizer Pipelines

Optimizer Pipelines giulia Tue, 03/03/2020 - 14:29

The code produced by MonetDB/SQL is massaged by an optimizer pipeline.  The pipeline to be used is identified by the SQL global variable optimizer, which can be modified using a SQL assignment, e.g. the minimal optimizer pipeline is set using:

set optimizer='minimal_pipe';
sql>select optimizer;
+--------------+
| single value |
+==============+
| minimal_pipe |
+--------------+

Each pipeline consists of a sequence of MAL function calls that inspect and transform the plan for better execution. The preferred optimizer can be changed by either providing its name or the complete MAL function sequence.  For example the minimal pipe can be set also using:

sql> set optimizer='optimizer.inline();optimizer.remap();optimizer.deadcode();optimizer.multiplex();optimizer.garbageCollector();';

The final result of the optimizer steps becomes visible using the SQL EXPLAIN statement modifier. Alternatively, the SQL DEBUG statement modifier in combination with the 'o' command provides access to the intermediate optimizer results.

Several optimizer pipelines are pre-defined in the MonetDB sources. They represent ongoing development activities and experimental code. Their scope of applicability and/or stability has not reached a satisfactory level to include them in the default pipeline. The predefined optimizer pipelines can be inspected as follows:

sql>select * from optimizers();

A user-defined optimizer pipeline is checked against the dependency information maintained in the optimizer library to ensure there are no conflicts and at least the pre-requisite optimizers are used.

The default SQL optimization pipe line contains the following steps:

inline Inline functions identified as such.
remap Locate hardwired multiplex operations.
costModel Inspects the SQL catalog for size information.
coercions Performs static type coercions.
evaluate Evaluate constant expressions.
emptybind Removes empty bindings
pushselect Push selections through e.g. joins.
aliases Remove alias assignments.
mitosis Horizontal fragmentation.
mergetable Expand horizontal fragmented plans.
deadcode Remove all code not leading to used results.
aliases Remove alias assignments.
constants Evaluate constant expressions.
commonterms Remove duplicate expressions.
projectionpath Searches multiple joins and glues them together for better runtime optimization.
deadcode Remove all code not leading to used results.
reorder Reorder the execution order.
matpack Recombine partitions into a single BAT.
dataflow Prepare code for multi-core execution.
querylog Keep track of SQL query definitions and invocations.
multiplex Expand all remaining multiplex operations to iterators.
generator Implement for-loop generator as table producing function.
profiler Highlight the candidate lists in the code.
candidates Highlight the candidate lists in the code.
garbageCollector Injects calls to the garbage collector to free up space.

 

Performance Tips

Performance Tips giulia Tue, 03/03/2020 - 14:30

Under construction