Query History

Query History mk Sat, 03/09/2013 - 17:36

The SQL implementation comes with a simple query profiler to detect expensive queries. It is centered around two predefined internal tables that store the definitions of all executed SQL queries and their execution time.

Query logging can be started by calling the procedure querylog_enable(), which saves some major compilation information of a query in the 'querylog_catalog' table:

sys.querylog_catalog
name type references description
"id" OID    
"owner" STRING   The SQL user who has executed this query.
"defined" TIMESTAMP   Time when the query was started
"query" STRING   The query that has been executed
"pipe" STRING   The optimiser pipe line that has been used
"plan" STRING   Name of its MAL plan
"mal" INTEGER   Size of its MAL plan in the number of statements
"optimize" BIGINT   Time in microseconds for the optimiser pipeline

Query logging can be stoped by calling procedure querylog_disable().

The query performance is stored in the table  'querylog_calls'. The owner of the query definition is also the one who is referenced implicitly by the 'id' of a call event. The key timing attributes are 'run', i.e. the time to execute the query to produce the result set, and 'ship', i.e. the time to render the result set and sent it to the client. All times are in microseconds.

The remaining parameters illustrate resource claims. The 'tuples' attribute denotes the size of the result set in the number of rows. The 'cpu' load is derived from the operating system statistics (Linux only) and is given as a percentage. The same holds for the 'io' waiting time.

sys.querylog_calls
name type references description
"id" OID sys.querylog_catalog.id  
"start" TIMESTAMP   time the statement was started
"stop" TIMESTAMP   time the statement was completely finished
"arguments" STRING   actual call structure
"tuples" BIGINT   number of tuples in the result set
"run" BIGINT   time spent (in usec) until the result export
"ship" BIGINT   time spent (in usec) to ship the result set
"cpu" INT   average cpu load percentage during execution
"io" INT   percentage time waiting for IO to finish

The view 'sys.querylog_history' includes some useful information from both tables:

create view sys.querylog_history as
    select qd.*, ql."start",ql."stop", ql.arguments, ql.tuples, ql.run, ql.ship, ql.cpu, ql.io
    from sys.querylog_catalog() qd, sys.querylog_calls() ql
    where qd.id = ql.id and qd.owner = user;

The following code snippet illustrates its use:

sql>call sys.querylog_enable();
sql>select 1;
sql>select 1;
sql>select * from sys.querylog_catalog;
sql>select * from sys.querylog_calls;
sql>select * from sys.querylog_history;
sql>select id, query, avg(run) from sys.querylog_history group by id,query;
sql>call sys.querylog_disable();
sql>call sys.querylog_empty();

The 'sys.querylog_enable()' function also take a parameter, 'threshold', which is an integer in millisecond. When the query log is enabled with this parameter, it will only log those queries whose execution times are longer than the threshold.  This feature can be handy to prevent the database from being swarmed by too many short running queries, hence reduce the overhead incurred by the query log (see below), while helping the DBA detecting expensive queries.

Disabling the query log will not remove existing query logs; it only prevents subsequent queries to be logged. Once the query log is re-enabled, information of subsequently executed queries will be appended to the existing query logs.

Query logs are stored in persistent tables, ie they will survice a MonetDB server restart. They can only be removed 'sys.querylog_empty()'. A downside of this implementation is its relative high overhead because every read query will trigger a write transaction.