Lexical Structure

Comments

Comments can be added to query scripts for documentation purposes. MonetDB/SQL supports two forms of comments.

Any text beginning with '--' and up to the end of line is ignored. Furthermore, C-style comments ( /* this is ignored */ ) can be injected in a SQL query where a language token is expected.

A language token can be a keyword, identifier, a literal constant or a special character. Tokens should be separated by 'white space' (tab, space, newlines) unless there juxtaposition does not create an ambiguous interpretation. Comments are considered equivalent to white spaces.

Identifiers and Keywords

SQL comes with a large collection of keywords, i.e. names reserved by the SQL committee to designate language constructs. Keywords in MonetDB/SQL are case sensitive only when quoted. Otherwise, the normal behavior of MonetDB is to map unquoted names to lower case.

E.g. select TEST from table1 will be transformed to select test from table1. If you want to refer to column TEST (uppercase) you should double quote it select "TEST" from table1.

Users can overrule the interpretation of an identifier as a keyword by encapsulation with double quotes, e.g. select denotes a keyword, while "select" denotes a user defined name. This scheme also permits inclusion of case sensitive names and names with special characters or white space in the names. This may be useful for reporting purposes where you want user friendly column header information. However, it is general advisable to limit the number of the escaped keywords.

MonetDB/SQL has following restictions on identifier names:

  • they may not start with the % character,
  • they may not be equal to TID% and
  • they may not be longer than 1024 characters.

Identifier names are used to designate database objects. In that role, they are by default case in-sensitive unless encapsulated by double quotes. The terminal identifier classes distinguished are denoted with the suffix _name.

Literal Constants

Numeric constants follow the convention of most programming languages. A numeric constant that contains neither a decimal point or exponential is considered of type integer. The underlying machine determines whether it will be interpreted as a 32- or 64-bit value, although most likely it will be that latter case these days.

String constants in SQL are embraced with single quotes like: 'string data'. If your data contains single quote's, each single quote need to be doubled, so 'a single '' quote in a string' represents string: a single ' quote in a string.

We support Unicode character string literals using: U&'...' including UESCAPE. For the string literals, you can have U&'... '...' '...' UESCAPE '...' where the escape must be as single character and the other '...' strings are also Unicode character string literals. For now, these latter strings also undergo C-style backslash interpretation.

We support PostgreSQL-like E'...' strings (from Apr2019). The strings can contain C-style backslash escapes.

We support raw strings (from Jun2020) by using R'...' or r'...'. This means that C-style backslash escapes will remain uninterpreted within those strings. For instance SELECT r'\t' returns a char string of length 2 where SELECT e'\t'; returns a char string of length 1.

Binary data can be represented in hexadecimal string notation using 2 hexadecimal characters per byte, for example x'10FF' to represent 2 byte values: 16 and 255. This notation can be used with the blob (binary large object) data type.)

Often strings can be cast to other types using the CAST( expr AS type ) expression, provided the typed coercion routine is available. The following example illustrates conversion of a digits string value into an integer, a decimal and a double precision floating point number.

SELECT CAST( '129' AS integer );
SELECT CAST( '129.42' AS decimal );
SELECT CAST( '0.3' AS double );

Temporal Constants

The DATE, TIME and TIMESTAMP data types come with a straightforward string literal conversion structure, e.g.:

SELECT DATE '2014-02-03', TIME '15:45:56', TIMESTAMP '2014-02-03 15:45:56';

produces a single properly typed row result. It is equivalent to:

SELECT CAST('2014-02-03' AS DATE), CAST ('15:45:56' AS TIME), CAST ('2014-02-03 15:45:56' AS TIMESTAMP);

Special Characters

String literals (currently both the '...' and E'...' styles) may contain the C-style escape sequences for special characters such as: \n for new line, \t for horizontal tab, \r for carriage return, \f for form feed, \' for single quotation mark, and \\ for the backslash character.
The convention \ddd where d is a digit stands for a character code number denoted in octal.
The convention \xhh where h is a hexadecimal digit stands for a character code number denoted in hex.
The convention \uhhhh where h is a hexadecimal digit stands for a unicode code point denoted in hex.

Operator Precedences

Most operators in SQL have the same precedence and are left-associative. Parenthesis can be used to disambiguate the precedence order. The operator definitions can not be overloaded, nor can you define new operators.

.lefttable/column name separator
-rightunary minus
* / %leftmultiplication, division, modulo
+ -leftunary addition, subtraction
IS, IS TRUE, IS FALSE, IS UNKNOWN
IS NULL, IS NOT NULLtest for (not) null
INset membership
BETWEENrange containment
LIKE, ILIKEstring pattern matching
< >less than, greater than
=rightequality, assignment
NOTrightlogical negation
ANDleftlogical conjunction
ORleftlogical disjunction
^leftlogical exclusive disjunction

Value Expressions

The language is built around value- and table- expressions. Value expressions encompass denotation of literal constants, type casts, column references, operator and function invocation, and scalar returning subqueries.

The column references take the form [correlation_name '.'] column_name. The correlation_name is either a table name or an alias introduced in a from clause. The correlation name may be omitted if the column name uniquely identifies a column within the scope of current query.

The table references take the form [schema_name '.'] table_or_view_name [AS alias_name]. Table expressions produce a relational table. It is internally referenced through a correlation name, which supports attribute selection using the '.' denotation.