
SQL is the prime language of interaction with MonetDB. However, it is continuously being extended with small and major code fragments that extend the functionality described in the SQL 2003 standard. It includes completely wrapped libraries, e.g. geos, or address non-core SQL.
BEWARE, as mentioned in the Mar2011 release notes, work on the Pathfinder compiler, which forms the basis of MonetDB XQuery, continues at The University of Tübingen [1].
[2]
MonetDB/SQL comes with an interface to the Simple Feature Specification of OpenGIS which opens the route to develop GIS applications
The MonetDB/SQL/GIS module supports all objects and functions specified in the OGC "Simple Features for SQL" specification. Spatial objects can, however, for the time being only expressed in the Well-Known Text (WKT) format. WKT includes information about the type of the object and the object's coordinates.
The GIS functionality is packaged as a separate MonetDB module called geom. To benefit from the geometry functionality you first have to download and install GEOS [3]. It is a well-known and sound library to built upon. The next step is to (re-)build MonetDB with the --enable-geom configure argument. This will build the necessary extension modules and activate them upon the first start of the server. Note that databases created before you configured with support for geom will lack geom functions in SQL. You best start on a new database.
The spatial extension of MonetDB requires the user to simply use geom data types from SQL.
Example The script below creates and populates a 'forest' table and a 'buildings' table followed by a spatial query in this fictive landscape.
CREATE TABLE forests(id INT,name TEXT,shape MULTIPOLYGON);
CREATE TABLE buildings(id INT,name TEXT,location POINT,outline POLYGON);
INSERT INTO forests VALUES(109, 'Green Forest',
'MULTIPOLYGON( ((28 26,28 0,84 0,84 42,28 26), (52 18,66 23,73 9,48 6,52 18)), ((59 18,67 18,67 13,59 13,59 18)))');
INSERT INTO buildings VALUES(113, '123 Main Street',
'POINT( 52 30 )',
'POLYGON( ( 50 31, 54 31, 54 29, 50 29, 50 31) )');
INSERT INTO buildings VALUES(114, '215 Main Street',
'POINT( 64 33 )',
'POLYGON( ( 66 34, 62 34, 62 32, 66 32, 66 34) )');
SELECT forests.name,buildings.name
FROM forests,buildings
WHERE forests.name = 'Green Forest' and
Overlaps(forests.shape, buildings.outline) = true;
There are no special accelerators to speed up access to Spatial Objects yet. However, we can use the Minimum Bounding Rectangle (mbr) datatype to accelerate operations considerably. This requires a small query rewrite. In the example above the performance of the query can be improved in the following manner:
ALTER TABLE forests ADD bbox mbr;
UPDATE forests SET bbox = mbr(shape);
ALTER TABLE buildings ADD bbox mbr;
UPDATE buildings SET bbox = mbr(outline);
SELECT forests.name,buildings.name
FROM forests,buildings
WHERE forests.name = 'Green Forest' AND
mbroverlaps(forests.bbox,buildings.bbox) = TRUE AND
Overlaps(forests.shape, buildings.outline) = TRUE;
In this way the mbr operation acts as a filter. Upon request, and availabilty of resources, we will develop MAL optimizers to automate this process.
The MonetDB distribution includes the program 'shp2monetdb' to convert a shapefile into SQL insert statements. This is a port from the shp2pgsql program that is shipped with PostGIS.
The functionality of the functionality has been tested with OpenJUMP [4], an open-source Geographic Information System.
This is the first implementation of OpenGIS functionality in MonetDB. Many issues require our attention, but priority will be derived from concrete external requests and availability of manpower. The shortlist of open issues is:
MonetDB supports the OpenGIS [5] types: Point, Curve, LineString, Surface, Polygon, MultiPoint, MultiCurve, MultiLineString, MultiSurface, MultiPolygon, Geometry and GeomCollection.
One non-OpenGIS type for fast access is used. This type 'mbr' is used for storing a 2D box. Functions to create these boxes are specified in following sections.
The following functions return the minimum bounded rectangle (or boolean) of a given geometry:
| CREATE FUNCTION mbr (p Point) RETURNS mbr |
| CREATE FUNCTION mbr (c Curve) RETURNS mbr |
| CREATE FUNCTION mbr (l LineString) RETURNS mbr |
| CREATE FUNCTION mbr (s Surface) RETURNS mbr |
| CREATE FUNCTION mbr (p Polygon) RETURNS mbr |
| CREATE FUNCTION mbr (m multipoint) RETURNS mbr |
| CREATE FUNCTION mbr (m multicurve) RETURNS mbr |
| CREATE FUNCTION mbr (m multilinestring) RETURNS mbr |
| CREATE FUNCTION mbr (m multisurface) RETURNS mbr |
| CREATE FUNCTION mbr (m multipolygon) RETURNS mbr |
| CREATE FUNCTION mbr (g Geometry) RETURNS mbr |
| CREATE FUNCTION mbr (g GeomCollection) RETURNS mbr |
| CREATE FUNCTION mbroverlaps(a mbr, b mbr) RETURNS BOOLEAN |
Convert a Well-Known-Text string to a spatial object. The SRID parameter is a reference to the Spatial Reference System in which the coordinates are expressed.
| CREATE FUNCTION GeomFromText(wkt string, srid SMALLINT) RETURNS Geometry |
| CREATE FUNCTION PointFromText(wkt string, srid SMALLINT) RETURNS Point |
| CREATE FUNCTION LineFromText(wkt string, srid SMALLINT) RETURNS LineString |
| CREATE FUNCTION PolyFromText(wkt string, srid SMALLINT) RETURNS Polygon |
| CREATE FUNCTION MPointFromText(wkt string, srid SMALLINT) RETURNS MultiPoint |
| CREATE FUNCTION MLineFromText(wkt string, srid SMALLINT) RETURNS MultiLineString |
| CREATE FUNCTION MPolyFromText(wkt string, srid SMALLINT) RETURNS MultiPolygon |
| CREATE FUNCTION GeomCollectionFromText(wkt string, srid SMALLINT) RETURNS MultiPolygon |
| -- alias |
| CREATE FUNCTION PolygonFromText(wkt string, srid SMALLINT) RETURNS Polygon |
| CREATE FUNCTION AsText(p Point) RETURNS STRING |
| CREATE FUNCTION AsText(c Curve) RETURNS STRING |
| CREATE FUNCTION AsText(l LineString) RETURNS STRING |
| CREATE FUNCTION AsText(s Surface) RETURNS STRING |
| CREATE FUNCTION AsText(p Polygon) RETURNS STRING |
| CREATE FUNCTION AsText(p MultiPoint) RETURNS STRING |
| CREATE FUNCTION AsText(c MultiCurve) RETURNS STRING |
| CREATE FUNCTION AsText(l MultiLineString) RETURNS STRING |
| CREATE FUNCTION AsText(s MultiSurface) RETURNS STRING |
| CREATE FUNCTION AsText(p MultiPolygon) RETURNS STRING |
| CREATE FUNCTION AsText(g Geometry) RETURNS STRING |
The following functions perform analysis operations on geometries:
| CREATE FUNCTION Area(g Geometry) RETURNS FLOAT |
| CREATE FUNCTION Length(g Geometry) RETURNS FLOAT |
| CREATE FUNCTION Distance(a Geometry, b Geometry) RETURNS FLOAT |
| CREATE FUNCTION Buffer(a Geometry, distance FLOAT) RETURNS Geometry |
| CREATE FUNCTION ConvexHull(a Geometry) RETURNS Geometry |
| CREATE FUNCTION Intersection(a Geometry, b Geometry) RETURNS Geometry |
| CREATE FUNCTION "Union"(a Geometry, b Geometry) RETURNS Geometry |
| CREATE FUNCTION Difference(a Geometry, b Geometry) RETURNS Geometry |
| CREATE FUNCTION SymDifference(a Geometry, b Geometry) RETURNS Geometry |
| CREATE FUNCTION Dimension(g Geometry) RETURNS integer |
| CREATE FUNCTION GeometryTypeId(g Geometry) RETURNS integer |
| CREATE FUNCTION SRID(g Geometry) RETURNS integer |
| CREATE FUNCTION Envelope(g Geometry) RETURNS Geometry |
| CREATE FUNCTION IsEmpty(g Geometry) RETURNS BOOLEAN |
| CREATE FUNCTION IsSimple(g Geometry) RETURNS BOOLEAN |
| CREATE FUNCTION Boundary(g Geometry) RETURNS Geometry |
| CREATE FUNCTION Equals(a Geometry, b Geometry) RETURNS BOOLEAN |
| CREATE FUNCTION Disjoint(a Geometry, b Geometry) RETURNS BOOLEAN |
| CREATE FUNCTION "Intersect"(a Geometry, b Geometry) RETURNS BOOLEAN |
| CREATE FUNCTION Touches(a Geometry, b Geometry) RETURNS BOOLEAN |
| CREATE FUNCTION Crosses(a Geometry, b Geometry) RETURNS BOOLEAN |
| CREATE FUNCTION Within(a Geometry, b Geometry) RETURNS BOOLEAN |
| CREATE FUNCTION Contains(a Geometry, b Geometry) RETURNS BOOLEAN |
| CREATE FUNCTION Overlaps(a Geometry, b Geometry) RETURNS BOOLEAN |
| CREATE FUNCTION Relate(a Geometry, b Geometry, pattern STRING) RETURNS BOOLEAN |
| SQL functions on Point |
| CREATE FUNCTION X(g Geometry) RETURNS double |
| CREATE FUNCTION Y(g Geometry) RETURNS double |
| CREATE FUNCTION Point(x double,y double) RETURNS Point |
| SQL functions on Curve |
| CREATE FUNCTION IsRing(l LineString) RETURNS BOOLEAN |
| CREATE FUNCTION StartPoint(l LineString) RETURNS Point -- not yet implemented |
| CREATE FUNCTION EndPoint(l LineString) RETURNS Point -- not yet implemented |
| SQL functions on LineString |
| CREATE FUNCTION NumPoints(l LineString) RETURNS integer -- not yet implemented |
| CREATE FUNCTION PointN(l LineString,i integer) RETURNS Point -- not yet implemented |
| SQL functions on Surface |
| CREATE FUNCTION PointOnSurface(s Surface) RETURNS Point -- not yet implemented |
| CREATE FUNCTION Centroid(s Surface) RETURNS Point -- not yet implemented |
| SQL functions on Polygon |
| CREATE FUNCTION ExteriorRing(s Surface) RETURNS LineString -- not yet implemented |
| CREATE FUNCTION NumInteriorRing(s Surface) RETURNS integer -- not yet implemented |
| CREATE FUNCTION InteriorRingN(s Surface,n integer) RETURNS LineString -- not yet implemented |
| SQL functions on GeomCollection |
| -- Unimplemented Documentation |
| CREATE FUNCTION NumGeometries(GeomCollection c) RETURNS integer -- not yet implemented |
| CREATE FUNCTION GeometryN(GeomCollection c,n integer) RETURNS Geometry -- not yet implemented |
The stream processing facilities of MonetDB are best illustrated using a minimalist example, where a sensor sends events to the database, which are picked up by a continuous query and sent out over a stream towards an actuator. To run the example, you should have a MonetDB binary with DataCell functionality enabled. This will ensure that the required libraries are loaded and the SQL catalog is informed about the stream specific functions/operators. It will also create the DataCell schema, which is used to collect compiled continuous queries. The final step in the startup is to enable the DataCell optimizer pipeline.
sql> set optimizer = 'datacell_pipe';
sql> create table datacell.bsktin (id integer, tag timestamp, payload integer);
sql> create table datacell.bsktout (like datacell.bsktin);
sql> call datacell.receptor('datacell.bsktin', 'localhost', 50500);
sql> call datacell.emitter('datacell.bsktout', 'localhost', 50600);
sql> call datacell.query('datacell.pass', 'insert into datacell.bsktout select * from datacell.bsktin;');sql> call datacell.resume();
After these simple steps, it suffices to hook up a sensor to sent events to the DataCell and to hook up an actuator to listen for response events. The result of this experiment will be a large number of randomly generated events passing through the stream engine in a bulk-fashion.
$ nc -l -u localhost 50600 &$ sensor --host=localhost --port=50500 --events=1000 --columns=3 &
The Linux netcat (nc) can be used as a strawmen's actuator to monitor the output of the DataCell. The distribution comes with a sensor and actuatoror simulator. The DataCell source code contains a fire detection scenario to exercise the DataCell and forms as a basis for cloning your own application.
The example reconsidered
The DataCell operates on relational tables. The first action is to identify all such tables and redefine them as baskets by attaching them to receptors, emittors, or intermittent baskets.
sql> call datacell.receptor('datacell.bsktin', 'localhost', 50500);
sql> call datacell.emitter('datacell.bsktout', 'localhost', 50600);
A receptor thread is attached to the 'bsktin' basket on a TCP stream on port 50500 by default over which we receive tuples in CSV format. The number of fields and their lexical convention should comply with the corresponding table definition. The same semantics apply to the format as you would normally use when a COPY INTO command over a CSV file is given. The receptor mode is either active or passive. In passive mode, the default setting, it is the sensor that takes the initiative in contacting the streaming engine to deposits events. In the active mode, it is the streaming engine that contacts the sensor for more events. Note, the receptor becomes active only after you issue the datacall.resume('bsktin') or datacell.resume() operation. The calls shown are actually a shorthand for the more verbose version, where protocol and mode are made explicit.
sql> call datacell.receptor('datacell.bsktin', 'localhost', 50500,'tcp','active');
sql> call datacell.emitter('datacell.bsktout', 'localhost', 50600,'udp','passive');
The sensor simulator is geared at testing the infrastructure and takes short cuts on the event formats sent. Currently, it primarily generates event records starting with an optional event identifier, followed by an optional timestamp, and a payload of random integer values. To generate a test file with 100 events for the example, you can rely mostly on the default settings. Hooking up the sensor to the stream engine merely requires an additional hostname and port instead of a file argument. A glimpse of the sensor interaction can be obtained using --trace, which writes the events to standard output, or a specific file. The sensor simulator asks for user input before it exits. This way, the receiving side can pick up the events and not be confronted with a possible broken UDP channel.
$ sensor --events=100 --protocol=debug --columns=3
1,306478,1804289383
... 98 more ...100,137483,1956297539$ sensor --host=localhost --port=50500 --events=100 --columns=3
An alternative scheme is to replay an event log using the --file and the --replay option. It reads an event possibly with a fixed delay (--delay=<milliseconds>), and sents it over the receptor. An exact (time) replay calls for identifying the column with the temporal information, i.e. using option --time=<field index>.
After this step, the events have been picked up by the receptor and added to the basket datacell.bsktin. This table can be queried like any other table, but be aware that it may be emptied concurrently. The next step is define a continuous query, which in this case passes the input receveid to the output channel. Reception and emitting can be temporarily interrupted using the datacell.pause(objectname) operation.
After registration of the query, the datacell module contains the necessary optimized code for the continuous query processing. The scheduler is subsequently restarted using datacell.resume(), which moves the data from the bsktin into bsktout when it arrives. You can check the result using ordinary SQL queries over the table producing functions:datacell.receptors(), datacell.emitters(), datacell.baskets() and datacell.queries();
sql> call datacell.query('datacell.pass', 'insert into datacell.bsktout select * from datacell.bsktin;');
The DataCell Architecture
The DataCell approach is easily understood using previously mentioned example. The sensor program is a simulator of a real-world sensor which emits events at a regular interval, e.g. a temperature, humidity, noise, etc. The actuator is a device simulator that is controlled using events received, e.g. a fire alarm. The sensors and actuators work independently. They are typically proprietary devices that communicate with a controlling station using a wired network. The only requirement in the DataCell is that devices can communicate using the UDP protocol to deliver events by default with the most efficient event message format CSV. Alternative message format handlers can readily be included by extending the formats recognized by the adaptors or as a simple filter between the device and the DataCell.

Baskets
The basket is the key data structure of the streaming engine. Its role is to hold a portion of an event stream, also denoted as an event window. It is represented as a temporary main-memory table. Unlike other stream systems there is no a priori order or fixed window size. The basket is simply a (multi-) set of event records received from an adapter or events ready to be shipped to an actuator. There is no persistency and no transaction management over the baskets. If a basket should survive session brackets, its content should be inserted into a normal table. The baskets can be queried with SQL like any other table, but concurrent actions may leave you with a mostly empty table to look at.
Adapters
The receptor and emitter adapters are the interface units in the DataCell to interact with sensors and actuators. Both communicate with their environment through a channel. The default channel is a UDP connection for speed. By default the receptor is a passive thread, opening a channel and awaiting events to arrive. Contrary, the emitter is an active thread, which immediately throws the events on the channel identified. Hooks have been created to change the roles, e.g. the receptor polling a device and emitter to wait for polling actuators.
Events that can not be parsed are added to the corresponding basket as an error. All errors collected can be inspected using the table producing function datacell.errors().
Continuous queries
The continuous queries are expressed as ordinary SQL queries, where previously declared basket tables are recognized by the DataCell optimizer. For convenience they can be packed in a procedure, where the events from a basket can be delivered to multiple baskets. Access to these tables is replaced and interaction with the adapters is regulated with a locking scheme. Mixing basket tables and persistent tables is allowed. An SQL procedure can be used to encapsulate multiple SQL statements and deliver the derived events to multiple destinations.
Continuous queries often rely on control over the minimum/maximum number of events to consider when the query is executed. This information is expressed as an ordinary predicate in the where clause. The following pre-defined predicates are supported. They inform the DataCell scheduler when the next action should be taken. They don't affect the current query, which allows for a dynamic behavior. The window slide size can be calculated with a query. It also means that a startup query is needed to inform the scheduler the first time or set the properties explicitly using datacell.basket() and datacell.beat() calls.
datacell.threshold(B,N) |
query is only executed when the basket B has at least size N |
datacell.window(B,M,S) |
extract a window of at most size M and slide with size S afterwards |
datacell.window(B,T,Ts) |
extract a window based on a temporal interval of size T followed by a stride Ts |
datacell.beat(B,T) |
next query is executed after a T milliseconds delay (excluding query execution time) |
The sliding windows constraints are mutually exclusive. Either one slide based on the number of events is consumed or the time window. For time slicing, the first timestamp column in the basket is used as frame of reference. This leaves all other temporal columns as ordinary attributes.
The status of the DataCell is mapped onto a series of table producing SQL functions. The status of the DataCell can be queried using the table producing functions datacell.baskets(), datacell.receptors(), datacell.emitters() and datacell.queries().
sql>sql>select * from datacell.receptors();
+-----------------+-----------+-------+----------+---------+---------+----------------------------+--------+----------+---------+
| nme | host | port | protocol | mode | status | lastseen | cycles | received | pending |
+=================+===========+=======+==========+=========+=========+============================+========+==========+=========+
| datacell.bsktin | localhost | 50500 | TCP | passive | running | 2012-08-15 19:31:28.000000 | 2 | 20 | 0 |
+-----------------+-----------+-------+----------+---------+---------+----------------------------+--------+----------+---------+
1 tuple (1.800ms)
sql>select * from datacell.emitters();
+------------------+-----------+-------+----------+--------+---------+----------------------------+--------+------+---------+
| nme | host | port | protocol | mode | status | lastsent | cycles | sent | pending |
+==================+===========+=======+==========+========+=========+============================+========+======+=========+
| datacell.bsktout | localhost | 50600 | UDP | active | running | 2012-08-15 19:31:28.000000 | 2 | 10 | 0 |
+------------------+-----------+-------+----------+--------+---------+----------------------------+--------+------+---------+
1 tuple (1.725ms)
The receptors and emitters are qualified by their communication protocal and modes. The last time they have received/sent events is shown. The events not yet handled by a continuous query are denoted as pending.
sql>select * from datacell.baskets();
+---------------------+-----------+---------+-----------+-----------+------------+------+----------------------------+--------+
| nme | threshold | winsize | winstride | timeslice | timestride | beat | seen | events |
+=====================+===========+=========+===========+===========+============+======+============================+========+
| datacell.bsktmiddle | 0 | 0 | 0 | 0 | 0 | 0 | 2012-08-15 19:31:28.000000 | 0 |
| datacell.bsktin | 0 | 0 | 0 | 0 | 0 | 0 | 2012-08-15 19:31:28.000000 | 0 |
| datacell.bsktout | 0 | 0 | 0 | 0 | 0 | 0 | 2012-08-15 19:31:28.000000 | 0 |
+---------------------+-----------+---------+-----------+-----------+------------+------+----------------------------+--------+
3 tuples (1.639ms)
sql>select * from datacell.queries();
+-----------------+---------+----------------------------+--------+--------+------+-------+---------------------------------------------------------------------------------+
| nme | status | lastrun | cycles | events | time | error | def |
+=================+=========+============================+========+========+======+=======+=================================================================================+
| datacell.pass | running | 2012-08-15 19:31:28.000000 | 6 | 20 | 613 | | insert into datacell.bsktmiddle select * from datacell.bsktin; |
| datacell.filter | running | 2012-08-15 19:31:28.000000 | 4 | 7 | 653 | | insert into datacell.bsktout select * from datacell.bsktmiddle where id %2 = 0; |
+-----------------+---------+---------------------------+--------+--------+------+-------+----------------------------------------------------------------------------------+
The baskets have properties used by the scheduler for emptying them. The events pending are shown. The continuous queries are marked with how often they have been selected for execution, the total number of events take from all input baskets, the total execution time and their definition.
The sensor simulator is geared at testing the total infrastructure and takes short cuts on the event formats sent. Currently, it primarily generates event records starting with an optional event identifier, followed by an optional timestamp, and a payload of random integer values.
sensor [options]
--host=<host name>, default=localhost
--port=<portnr>, default=50500
--sensor=<name>
--protocol=<name> udp or tcp(default)
--increment=<number>, default=1
--timestamp, default=on
--columns=<number>, default=1
--events=<events length>, (-1=forever,>0), default=1
--file=<data file>
--replay use file or standard input
--time=<column> where to find the exact time
--batch=<batchsize> , default=1
--delay=<ticks> interbatch delay in ms, default=1
--trace=<trace> interaction
--server run as a server
--client run as a client
To generate a test file with 100 events for the example, you can rely mostly on the default settings. Hooking up the sensor to the stream engine merely requires an additional hostname and port instead of a file argument. A glimpse of the sensor interaction can be obtained using the 'debug' protocol, which writes the events to standard output, or a specific file. The status of the DataCell can be checked with datacell.dump(), which now shows a hundred events gathered in the basket bsktin. The sensor simulator asks for user input before it exits. This way, the receiving side can pick up the events and not be confronted with a broken UDP channel.
$ sensor --events=100 --protocol=debug --columns=3
1,306478,1804289383
... 98 more ...100,137483,1956297539$ sensor --host=localhost --port=50500 --events=100 --columns=3
To test the DataCell, the distribution contains a simple event simulator. It generates a stream of MonetDB tuple values containing only random integers. Each tuple starts with a time stamp it has been created.
The actuator simulator provides the following options:
actuator [options]--host=<host name>, default localhost
--port=<portnr>, default 50600
--protocol=<name> either tcp/udp, default tcp
--actuator=<actuator name> to identify the event received
--server run as a server (default)
--client run as a client
--events=<number> number of events to receive
--statistics=<number> show statistics after a series of events
Jacqueline: JSON Query Language for MonetDB
MonetDB/JAQL is an implementation of the Query Language for JavaScript Object Notation (JSON) [6] on top of MonetDB's relational column-store engine. It implements JAQL Core from its specifications, thereby ignoring the Hadoop-centric view that flows through the various examples. The result is a pure column-store JSON query processing system, benefitting from the power of the MonetDB engine.
Note: MonetDB/JAQL was first released as a beta in the Jul2012 release. It is, however, still a work in progress!
JAQL is a query language for the JSON data-format. JSON [7] itself is a free-form format that allows to e.g. express hierarchical data or mix datatypes. It is similar to XML in this respect, albeit a whole lot more limited, which greatly simplifies working with JSON data. Increasing popularity for the JSON format has stimulated support for JSON in many popular programming languages, and development of query languages. We have chosen to implement JAQL Core, for it appears to be designed with the same simplicity in mind as JSON itself, unlike XQuery alike alternatives.
The driving force behind JAQL is the use of pipes to create a flow of JSON data between operations. Typically, a pipeline starts with a data source to operate on. Operations are chained, operating on the output of the previous operation. This constitutes in a logical flow per operator, suitable for MapReduce-like parallelisation. The core operations from JAQL are similar to SQL's operations. This includes a selection operation filter and general purpose projection operation transform.
To quickly give an impression on JAQL, an example query that selects data from an input and projects it into another shape:
[
{"name": "Fabian", "data": [1, 3, 4]},
{"name": "Niels", "data": [3, 5]},
{"name": "Martin"}
]
-> filter 3 in $.data
-> transform $.name;
would result into an array with two members: [ "Fabian", "Niels" ].
The current state of MonetDB/JAQL has implemented JAQL Core. The core operators can be found in the JAQL documentation. It is important to note that MonetDB/JAQL differs from the original JAQL specification in many subtle ways.
[ { a: 1 } ] is not allowed, because the JSON valid syntax is [ { "a": 1 } ].a{* - .b}. Selecting all members (a.*) is supported, though.The SQL/XML standard defines the mechanisms to produce XML formatted results from relational queries and to import relational data from XML documents. The SQL/XML 2003 definition and its enhancements in SQL/XML 2005 are the frame of reference.
The implementation in MonetDB/SQL is initially geared at providing the basic functionality for publishing XML and simple querying. A more complete and optimized implementation of XQuery is already available with MonetDB/XQuery. However, data between the two systems is not shared. They are managed by a different server implementation.
SQL/XML introduces a new datatype xml. It can be used as a column type in table, view definitions, parameters, and variables. The datatype tells the system that values are properly structured XML objects.
The SQL/XML implementation uses the string type as the carrier for XML values and relies on the widely available library libxml2 for additional functionality. It is a poor man's approach compared to MonetDB/XQuery.
XML data can be inserted into a database column using any of the available APIs. A single document is added to a XML column using the blob operation file().
insert into dossier(id,doc) values(20070831, file("/tmp/letter"));
Often an XML document contains a heterogenous collection of objects, which should be broken into pieces before they are stored in the database. There are many ways to shred a document and often an application specific front-end application is needed. However, MonetDB SQL/XML provides a few simple operations to cope with the majority of situations.
copy into database from '/tmp/jacktheripper.xml';
copy into victim(name,hair) from '/tmp/jacktheripper.xml'
delimiter 'victim(name,hair)'
The first statements reads the XML document and breaks it into a series of relational tables with foreign key references. It results in a structured object representation. A XML view named after the top level element is automatically defined to rebuild the original document.
The second example performs a top-down parse of the XML document. Within every victim element it extracts the sub-trees named name and hair. This scheme is equivalent to using XPath expressions victim[//name,//hair].
The XML publishing functions are designed to construct XML values from data stored in the database. They can be used anywhere a string value is allowed.
XML_value_expression : XML_primary
XML_primary : value expression primary
| XML_value_function
XML_value function : XML_comment
| XML_concatenation
| XML_element
| XML_forest
| XML_parse
| XML_serialize
| XML_PI
XML_comment : XMLCOMMENT '(' <string_value_expression> ')'
XML_concatenation : XMLCONCAT '(' XML_value_expression
{ ',' XML_value_expression }... ')'
XML_element : XMLELEMENT '(' NAME identifier
[ ',' XML_namespace_declaration ] [ ',' XML_attributes ]
[ { ',' XML_element_content }... [ OPTION XML_content_option]
[ XML_returning_clause ] ')'
XML_attributes : XMLATTRIBUTES '(' XML_attribute_list ')'
XML_attribute_list : XML_attribute [ { ',' XML_attribute }... ]
XML_attribute : value_expression [ AS identifier ]
XML_content_option : NULL ON NULL
| EMPTY ON NULL
| ABSENT ON NULL
| NIL ON NULL
| NIL ON NO CONTENT
XML_element_content : value_expression
XML_returning_clause :
RETURNING { CONTENT | SEQUENCE }
XML_forest : XMLFOREST '(' [ XML_namespace_declaration> ',' ]
forest_element list ')'
forest_element list : forest_element [ { ',' forest_element }... ]
forest_element : forest_element_value [ AS identifier ]
forest_element value : value_expression
XML_aggregate : XMLAGG '(' XML_value_expression
[ ORDER BY sort_specification_list ]
[ XML_returning_clause ] ')'
XML_PI : XMLPI '(' NAME identifier
[ ',' string_value_expression ] ')'
XML_parse : XMLPARSE '(' document_or_content_string_value_expression
[ XML_whitespace_option ] ')'
XML_whitespace_option : { PRESERVE | STRIP } WHITESPACE
XML_serialize:
XMLSERIALIZE '(' {DOCUMENT | CONTENT} value AS data_type
[ VERSION string_literal ]
[ ENCODING SQL_language_identifier]
[ [INCLUDING | EXCLUDING] XMLDECLARATIONS] ')'
XML_document predicate : XML_value_expression IS [ NOT ] DOCUMENT
|
The XML table construct provides a mechanism to extract a table from an XML document using simple path expression to the elements of interest.
XML_iterate : XMLITERATE '(' XML_value_expression ')'
XML_table : XMLTABLE '('
[ XML_namespace_declaration ',' ]
XML_table_row_pattern
[ XML_table_argument_list ]
COLUMNS XML_table_column_definitions ')'
XML_table_row_pattern :
character_string_literal
XML_table_argument_list :
PASSING XML_table_argument_passing_mechanism
XML_query_argument
[ { ',' XML_query_argument }... ]
XML_table_argument_passing mechanism : XML_passing_mechanism
XML_table_column_definitions :
XML_table_column_definition
[ { ',' XML_table_column_definition }... ]
XML_table_column_definition_:
XML_table_ordinality_column_definition
| XML_table_regular_column_definition
XML_table_ordinality_column_definition :
column_name FOR ORDINALITY
XML_table_regular_column_definition :
column_name_data_type [ XML_passing_mechanism ]
[ default_clause ]
[ PATH XML_table_column_pattern ]
XML_table_column_pattern : character_string_literal
|
Declare one or more XML namespaces and the encoding to use for binary strings.
XML_lexically_scoped_options :
XML_lexically_scoped _option [ ',' XML_lexically_scoped_option ]
XML_lexically_ scoped_option : XML_namespace_declaration
| XML_binary_encoding
XML_namespace_declaration :
XMLNAMESPACES '(' XML_namespace_declaration item
[ { ',' XML_namespace_declaration_item }... ] ')'
XML_namespace_declaration_item :
XML_regular_namespace_declaration_item
| XML_default_namespace_declaration_item
XML_namespace_prefix :
identifier
XML_namespace_URI :
character_string_literal
XML_regular_namespace_declaration_item :
XML_namespace URI AS XML_namespace_prefix
XML_default_namespace_declaration_item :
DEFAULT XML_namespace_URI
| NO DEFAULT
XML_binary_encoding :
XMLBINARY [ USING ] { BASE64 | HEX }
The SQL/XML module aims to support XQuery through a well-defined (and narrow) interface. It relies on linkage of the libxml2 library to explore the rich world of XPath processing.
Indicate a registered XML Schema, and (optionally) an XML namespace of that registered XML Schema, and (optionally) a global element declaration schema component of that registered XML Schema.
XML_valid_according_to_clause:
ACCORDING_TO_XMLSCHEMA_XML_valid_according_to_what
[_XML_valid_element_clause_]
XML_valid_according_to_what:
XML_valid_according_to_URI
|_XML_valid_according_to_identifier
XML_valid_according_to_URI:
URI_XML_valid_target_namespace_URI_[_<XML_valid_schema_location>_]
|_NO_NAMESPACE_[_XML_valid_schema_location_]
XML_valid_target_namespace_URI: XML_URI
XML_URI: character_string_literal
XML_valid_schema_location:
LOCATION_XML_valid_schema_location_URI
XML_valid_schema_location_URI: XML_URI
XML_valid_according_to_identifier:
ID_registered_XML_Schema_name
XML_valid_element_clause:
XML_valid_element_name_specification
|_XML_valid_element_namespace_specification
[_XML_valid_element_name_specification_]
XML_valid_element_name_specification:
ELEMENT_XML_valid_element_name
XML_valid_element_namespace_specification:
NO NAMESPACE
| NAMESPACE <XML valid element namespace URI>
|
| XML_character_string_serialization: XMLSERIALIZE '(' [ document_or_content ] XML_value_expression_AS_data_type [ XML_serialize_version ] [ XML_declaration_option ] ')' XML_declaration_option: INCLUDING XMLDECLARATION |_EXCLUDING XMLDECLARATION document_or_content: DOCUMENT | CONTENT XML_serialize_version: VERSION character_string_literal blob_value_function: |_XML_binary_string_serialization XML_binary_string_serialization: XMLSERIALIZE'(' [_document_or_content ] XML_value_expression_AS_data_type [ ENCODING_XML_encoding_specification ] [ XML_serialize_version ] [ XML_declaration_option ] right_paren XML_encoding_specification: XML_encoding_name XML_encoding_name: SQL_language_identifier |
Global variables are used to control choices.
SET XML OPTION { DOCUMENT | CONTENT };
Have you ever tried to work with XML documents? If so, you probably have noticed that these documents are not easy to handle. To query these documents a query-language has been created: XQuery and the XQuery Update Facility (XQUF). What SQL is for relational databases, is XQuery/XQUF for XML-databases. However, existing prototype implementations are often too slow to be used beyond toy examples.
MonetDB/XQuery provides XML database functionality comprising quite complete support for the XQuery language, including modules, transaction-safe XQUF updates, user-defined functions, and a query cache (using XQuery modules). The system has very high performance and is scalable to large XML collections.
March 2011: MonetDB/XQuery project is frozen. Due to lack of development and manpower to port the software to MonetDB version 5, we had to freeze the code base. The content of the XQuery website is, as are the prime distribution packages. We do not fix any bugs or problems with MonetDB/XQuery.
Links:
[1] http://www.pathfinder-xquery.org/
[2] http://www.monetdb.org/drupal/Home/Features
[3] http://trac.osgeo.org/geos/
[4] http://www.openjump.org
[5] http://www.opengeospatial.org/
[6] http://code.google.com/p/jaql/
[7] http://json.org/