The DataCell Architecture
The DataCell approach is easily understood using previously mentioned example. The
sensor program is a simulator of a real-world sensor which emits events at a regular interval, e.g. a temperature, humidity, noise, etc. The
actuator is a device simulator that is controlled using events received, e.g. a fire alarm. The sensors and actuators work independently. They are typically proprietary devices that communicate with a controlling station using a wired network. The only requirement in the DataCell is that devices can communicate using the UDP protocol to deliver events by default with the most efficient event message format CSV. Alternative message format handlers can readily be included by extending the formats recognized by the adaptors or as a simple filter between the device and the DataCell.
The basket is the key data structure of the streaming engine. Its role is to hold a portion of an event stream, also denoted as an event window. It is represented as a temporary main-memory table. Unlike other stream systems there is no a priori order or fixed window size. The basket is simply a (multi-) set of event records received from an adapter or events ready to be shipped to an actuator. There is no persistency and no transaction management over the baskets. If a basket should survive session brackets, its content should be inserted into a normal table. The baskets can be queried with SQL like any other table, but concurrent actions may leave you with a mostly empty table to look at.
emitter adapters are the interface units in the DataCell to interact with sensors and actuators. Both communicate with their environment through a channel. The default channel is a UDP connection for speed. By default the receptor is a passive thread, opening a channel and awaiting events to arrive. Contrary, the
emitter is an active thread, which immediately throws the events on the channel identified. Hooks have been created to change the roles, e.g. the
receptor polling a device and emitter to wait for polling actuators.
Events that can not be parsed are added to the corresponding basket as an error. All errors collected can be inspected using the table producing function
The continuous queries are expressed as ordinary SQL queries, where previously declared basket tables are recognized by the DataCell optimizer. For convenience they can be packed in a procedure, where the events from a basket can be delivered to multiple baskets. Access to these tables is replaced and interaction with the adapters is regulated with a locking scheme. Mixing basket tables and persistent tables is allowed. An SQL procedure can be used to encapsulate multiple SQL statements and deliver the derived events to multiple destinations.
Continuous queries often rely on control over the minimum/maximum number of events to consider when the query is executed. This information is expressed as an ordinary predicate in the where clause. The following pre-defined predicates are supported. They inform the DataCell scheduler when the next action should be taken. They don't affect the current query, which allows for a dynamic behavior. The window slide size can be calculated with a query. It also means that a startup query is needed to inform the scheduler the first time or set the properties explicitly using
||query is only executed when the basket B has at least size N|
||extract a window of at most size M and slide with size S afterwards|
||extract a window based on a temporal interval of size T followed by a stride Ts|
||next query is executed after a T milliseconds delay (excluding query execution time)|
The sliding windows constraints are mutually exclusive. Either one slide based on the number of events is consumed or the time window. For time slicing, the first timestamp column in the basket is used as frame of reference. This leaves all other temporal columns as ordinary attributes.