Data manipulation

Data manipulation mk Sat, 03/27/2010 - 22:42

Table updates


Data insertions

A table can be populated using an insert statement. It takes a table name and a value expression list. The expression result types should align with the attributes in the table definition. Otherwise the attribute-value association should be explicitly defined using a column name list. Multiple rows can be inserted in a single statement.

The result of a query can be inserted into a table, provided source and destination are type compatible. Inserting a table into itself effectively doubles its content, provided none of the table constraints is violated. MonetDB/SQL does not support inserts through view definitions.

INSERT INTO table_name [ '(' column_name ',' ... ')' ] [ VALUES '('value_expression','...')' ',' ... | select_query ]
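For instance, the variants above can be sketched as follows, using a hypothetical person table:

CREATE TABLE person (id INT, name VARCHAR(20));
INSERT INTO person VALUES (1, 'Alice');
INSERT INTO person (name, id) VALUES ('Bob', 2);
INSERT INTO person VALUES (3, 'Carol'), (4, 'Dan');
INSERT INTO person SELECT * FROM person;   -- insert a query result; doubles the content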

Data updates

The update statement syntax follows the SQL standard, but its semantics for bulk updates on keys may differ slightly from what is expected in other systems. In particular, the update implementation ensures that you can freely update any column without the danger of run-away values. MonetDB/SQL does not support updates through views.

UPDATE table_name SET assignment ',' ... [WHERE search_condition ]

assignment : column_name '='  ( scalar_exp  | search_condition | NULL )
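A brief sketch of both assignment forms, over a hypothetical person table:

CREATE TABLE person (id INT, name VARCHAR(20));
INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
UPDATE person SET name = 'Alice' WHERE id = 1;   -- scalar expression assignment
UPDATE person SET name = NULL WHERE id = 2;      -- NULL assignment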


Data deletions

DELETE FROM table_name [WHERE search_condition ]

MonetDB/SQL does not support deletions through views. Deleting ALL records also garbage collects the disk space of the table. It is similar to a TRUNCATE operation in other systems.
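Both forms in use, over a hypothetical person table:

CREATE TABLE person (id INT, name VARCHAR(20));
DELETE FROM person WHERE id > 100;   -- remove only the matching records
DELETE FROM person;                  -- remove ALL records and reclaim the disk space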

Table expressions


Within a query a table can be referenced in different ways: by its name, by a select query, or by a join result. The syntax to refer to a table is:

table_reference: simple_table
    | '(' select_query ')' [AS] table_name [ '(' column_name ',' ... ')' ]
    | '(' select_query ')'
    | joined_table | '(' joined_table ')' [AS] table_name [ '(' column_name ',' ... ')' ]

joined_table: '(' joined_table ')'
    | table_reference { CROSS | NATURAL } JOIN table_reference
    | table_reference { JOIN | UNIONJOIN } table_reference join_spec
    | table_reference join_type JOIN table_reference join_spec
    | table_reference NATURAL join_type JOIN table_reference

join_spec : ON search_condition | USING '(' column_name ',' ... ')'

join_type: INNER | { LEFT | RIGHT | FULL } [OUTER]
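To illustrate, two hypothetical tables joined in several of the ways listed above:

CREATE TABLE emp (id INT, dept_id INT, name VARCHAR(20));
CREATE TABLE dept (dept_id INT, dept_name VARCHAR(20));

SELECT e.name, d.dept_name FROM emp e JOIN dept d ON e.dept_id = d.dept_id;
SELECT e.name, d.dept_name FROM emp e JOIN dept d USING (dept_id);
SELECT e.name, d.dept_name FROM emp e LEFT OUTER JOIN dept d ON e.dept_id = d.dept_id;
SELECT e.name, d.dept_name FROM emp e NATURAL JOIN dept d;
SELECT * FROM emp CROSS JOIN dept;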

The SQL framework for table expressions is based on the select-from-where construct.

select_query:
    SELECT [ ALL | DISTINCT ] ( '*' | column_name ',' ... ) [ INTO column_name ',' ... ]
    [ FROM table_name ',' ... ]
    [ WHERE search_condition ]
    [ GROUP BY column_name ',' ... ]
    [ HAVING search_condition ]
    [ ORDER BY ordering_spec ',' ... ]
    [ LIMIT posint ] [ OFFSET posint ]
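A query exercising most of these clauses, over a hypothetical sales table:

CREATE TABLE sales (region VARCHAR(10), amount DECIMAL(10,2));

SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 0
GROUP BY region
HAVING SUM(amount) > 1000
ORDER BY total DESC
LIMIT 10 OFFSET 0;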

The with clause provides the mechanism to introduce in-line view definitions:

with_clause :  WITH with_element ',' ... select_query

with_element: table_name '(' column_name ',' ... ')' AS '(' select_query ')'
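An in-line view in action, again over a hypothetical sales table:

CREATE TABLE sales (region VARCHAR(10), amount DECIMAL(10,2));

WITH big_regions(region, total) AS (
    SELECT region, SUM(amount) FROM sales GROUP BY region
)
SELECT region FROM big_regions WHERE total > 1000;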

Pattern matching operations are applied to predicates. Predicates can be compared, their differences inspected, and it can be tested whether one predicate is a sub-predicate of another. The following syntax description covers all the predicate operations supported by MonetDB.

predicate:
    comparison_predicate
    | between_predicate
    | like_predicate
    | test_for_null
    | in_predicate
    | all_or_any_predicate
    | existence_test
    | '(' predicate ')'

comparison_predicate:
    pred_expr COMPARISON pred_expr
    | pred_expr '=' pred_expr

between_predicate:
    pred_expr [NOT] BETWEEN [SYMMETRIC | ASYMMETRIC] pred_expr AND pred_expr

like_predicate:
    pred_expr [NOT] LIKE atom_expr

test_for_null:
    column_name IS [NOT] NULL

in_predicate:
    pred_expr [NOT] IN '(' element_name ',' ... ')'

all_or_any_predicate:
    pred_expr COMPARISON { ANY | ALL | SOME } subquery

existence_test:
    [NOT] EXISTS subquery

pred_expr:
    scalar_expr
    | predicate

atom_expr:
    ['('] atom [')']
    | ['('] atom [')'] ESCAPE string
    | '?'
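Several of these predicate forms in a WHERE clause, over a hypothetical table t:

CREATE TABLE t (v INT, s VARCHAR(10));

SELECT * FROM t WHERE v BETWEEN 1 AND 10;
SELECT * FROM t WHERE s LIKE 'a%';
SELECT * FROM t WHERE s IS NOT NULL;
SELECT * FROM t WHERE v IN (1, 2, 3);
SELECT * FROM t WHERE v > ALL (SELECT v FROM t WHERE s IS NULL);
SELECT * FROM t a WHERE EXISTS (SELECT 1 FROM t b WHERE b.v = a.v + 1);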

string_funcs:
      SUBSTRING '(' scalar_expr FROM scalar_expr FOR scalar_expr ')'
    | SUBSTRING '(' scalar_expr ',' scalar_expr ',' scalar_expr ')'
    | SUBSTRING '(' scalar_expr FROM scalar_expr ')'
    | SUBSTRING '(' scalar_expr ',' scalar_expr ')'
    | scalar_expr '||' scalar_expr
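The SUBSTRING variants and string concatenation, by example:

SELECT SUBSTRING('MonetDB' FROM 1 FOR 5);   -- 'Monet'
SELECT SUBSTRING('MonetDB', 6);             -- 'DB'
SELECT 'Monet' || 'DB';                     -- 'MonetDB'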


Window functions


SQL provides a method to aggregate over a series of related tuples. These window functions always come with an OVER() clause, which determines how tuples are split up over the window functions. The PARTITION BY clause within OVER divides the rows into groups that share the same values of the PARTITION BY expression(s). For each row, the window function is computed over all rows participating in the group. The order within a partition can be used as well.

ROW_NUMBER() returns the position of the tuple in the result set. RANK() produces the row number within a partition, starting at 1. DENSE_RANK() produces the rank of the current row without gaps; it counts peer groups.

window_function:

{RANK | DENSE_RANK | PERCENT_RANK | CUME_DIST} OVER window_name |
{RANK | DENSE_RANK | PERCENT_RANK | CUME_DIST} OVER '('
    [window_name] [PARTITION BY column_ref ... ]
    [ORDER BY sort_spec]
    {ROWS | RANGE} window_bound_start
    [BETWEEN {window_bound_end | window_bound_start}
    AND {window_bound_end | window_bound_start} ]
    [EXCLUDING {CURRENT ROW | GROUP | TIES | NO OTHERS}] ')'

window_bound_start:

    UNBOUNDED PRECEDING | value PRECEDING | CURRENT ROW

window_bound_end:

     UNBOUNDED FOLLOWING | value FOLLOWING

The snippet below (taken from the test suite) illustrates the functionality provided.


create table ranktest ( id int, k string);
select ROW_NUMBER() over () as foo from ranktest;
select ROW_NUMBER() over (PARTITION BY id) as foo, id from ranktest;
select ROW_NUMBER() over (PARTITION BY id ORDER BY id) as foo, id from ranktest;
select ROW_NUMBER() over (ORDER BY id) as foo, id from ranktest;

select RANK() over () as foo from ranktest;
select RANK() over (PARTITION BY id) as foo, id from ranktest;
select RANK() over (PARTITION BY id ORDER BY id) as foo, id from ranktest;
select RANK() over (ORDER BY id) as foo, id from ranktest;

select RANK() over () as foo, id, k from ranktest;
select RANK() over (PARTITION BY id) as foo, id, k from ranktest;
select RANK() over (PARTITION BY id ORDER BY id, k) as foo, id, k from ranktest;
select RANK() over (ORDER BY id, k) as foo, id, k from ranktest;

select DENSE_RANK() over () as foo, id, k from ranktest order by k;
select DENSE_RANK() over (PARTITION BY id) as foo, id, k from ranktest order by k;
select DENSE_RANK() over (PARTITION BY id ORDER BY id, k) as foo, id, k from ranktest order by k;
select DENSE_RANK() over (ORDER BY id, k) as foo, id, k from ranktest order by k;
drop table ranktest;

Statistics


The SQL implementation provides the well-known standard statistical aggregates COUNT, SUM, AVG, MIN and MAX over scalar types and groupings. In addition, a few important statistical aggregate functions are available: MEDIAN, QUANTILE, and the correlation function CORR. The percentile argument of QUANTILE is a float value between 0.0 and 1.0; MEDIAN(<expr>) is equivalent to QUANTILE(<expr>, 0.5).

STDDEV_SAMP computes the cumulative sample standard deviation and returns the square root of the sample variance. STDDEV_POP computes the population standard deviation and returns the square root of the population variance. Both functions accept any numeric datatype as argument.
Likewise, the VAR_SAMP and VAR_POP functions return the sample variance (/n-1) of a set of numbers and the biased variance (/n) of a set of numbers, respectively.

stat_funcs:
      QUANTILE  '(' scalar_expr ',' scalar_expr ')'
    | STDDEV_SAMP '(' scalar_expr ')'
    | STDDEV_POP '(' scalar_expr ')'
    | VAR_SAMP '(' scalar_expr ')'
    | VAR_POP '(' scalar_expr ')'
    | MEDIAN '(' scalar_expr ')'
    | CORR '(' scalar_expr ',' scalar_expr ')'
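A small sketch exercising these aggregates, over a hypothetical measurement table:

CREATE TABLE m (x DOUBLE, y DOUBLE);
INSERT INTO m VALUES (1, 2), (2, 4), (3, 6);

SELECT MEDIAN(x), QUANTILE(x, 0.5),   -- these two are equivalent
       STDDEV_SAMP(x), VAR_POP(x), CORR(x, y)
FROM m;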

Bulk input/output


The COPY INTO command enables fast insertion of multiple tuples from a text file or standard input. Each tuple in the input file is terminated by a record separator (default '\n') and fields are separated by the field separator (default '|'). The field values should adhere to the syntax for value literals. Alternative separators should adhere to the SQL lexical syntax for string values. A different NULL value representation can be passed using the NULL AS null_string option. Furthermore, the fields are optionally enclosed with a user-defined quote character. The text file should use UTF-8 encoding if specified as file_name, and the same encoding as mclient is using if read using FROM STDIN.

It is strongly advised to announce the maximum number of records to be inserted. This avoids guessing by the server and subsequent re-allocation of table space, which may involve potentially expensive copying. A portion of the input file can be skipped using the offset feature. The offset value specifies the record at which loading should commence, the first record having offset 1.

The files to read from must be accessible to the server. Therefore, they must reside on or be accessible to the database server machine and be identified with an absolute path name. The STDIN file designator reads data streaming from the client application. An empty record determines the end of the sequence.

COPY [ int_val [ OFFSET int_val ] RECORDS ] INTO table_name
      FROM ( file_name ',' ... | STDIN )
      [ [USING] DELIMITERS field_separator [',' record_separator [ ',' string_quote ]]] [ NULL AS null_string ] [LOCKED]

COPY [ int_val RECORDS ] OFFSET int_val INTO table_name
      FROM ( file_name ',' ... | STDIN )
      [ [USING] DELIMITERS field_separator [',' record_separator [ ',' string_quote ]]] [ NULL AS null_string ] [LOCKED]
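A sketch of a bulk load, assuming a hypothetical file path and a person table; the record count and delimiters are illustrative:

CREATE TABLE person (id INT, name VARCHAR(20));

COPY 1000000 RECORDS INTO person
FROM '/tmp/person.csv'
USING DELIMITERS ',', '\n', '"'
NULL AS '';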

The input syntax should comply with the following grammar: [ [ [quote] [[escape]char]* [quote] ] field_separator ]* record_separator. Quote characters in quoted fields may be escaped with a backslash. Field and record separators can be embedded in quoted fields.

LOCKED mode

In many bulk-loading situations, the original file can be saved as a backup or recreated for disaster handling. This relieves the database system from having to prepare for recovery as well and saves significant storage space. The LOCKED qualifier can be used in this situation (and in single-user mode!) to skip the logging operation normally performed.

WARNING It is advised to add integrity constraints to the table after the file has been loaded. The ALTER statements perform bulk integrity checking, which is often more efficient than checking each record upon insertion.

For more see the CSV bulk load recipe.

Copy into File

The COPY INTO command with a file name argument allows for fast dumping of a result set into an ASCII file. The file must be accessible by the server and a full path name may be required. The file STDOUT can be used to direct the result to the primary output channel.

The delimiters and NULL AS arguments provide control over the layout required.  

COPY subquery INTO file_name [ [USING] DELIMITERS field_separator [',' record_separator [ ',' string_quote ]]] [ NULL AS null_string ]
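For example, dumping a hypothetical person table to a file (the path is illustrative):

CREATE TABLE person (id INT, name VARCHAR(20));

COPY SELECT * FROM person
INTO '/tmp/person.csv'
USING DELIMITERS ',', '\n', '"'
NULL AS '';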

For both the input and output version of the COPY INTO command one can specify a file name ending with 'gz' or 'bz2'  to use the appropriate compression library (if available).

For more see the Exporting bulk data recipe.

Copying binary files

Migration of tables between MonetDB/SQL instances can be sped up using the binary COPY INTO/FROM format.

For more see the Binary bulk data load recipe.