Query timing

Query timing mk Sun, 10/13/2013 - 14:28

Timing a query execution is supported in multiple ways, but largely depends on what you want to measure. Point-to-point wall clock time, or the actual behavior of kernel operations.

(1) The baseline is to use simple command line tools, such at TIME on Linux to assess the performance of running a script against mclient. Beware that /bin/time and /usr/bin/time are not the same, they mainly measure and report the wall-clock time spent by the given command/process. See their respective man pages for details.

(2) The next approach is to use the "--interactive" option of the mclient tool, which will report on the timing of each individual SQL query in a script in easy human consumable terms. It returns the wall-clock time between sending the query to the server and receiving the first block of answers. Its rendering can be controlled (see mclient).

(3) The query history can also be maintained in a separate log for post analysis. (see description)

Thus, (1) includes everything from loading the mclient binary and starting the mclient process, parsing the query in mclient, sending to the server, having the server execute the query and serialize the result, sending the result back to the client, to the client receiving, parsing and rendering the result, and sending the result to /dev/null ("for free"), to a file (I/O), or to a terminal (scrolling). (2) merely includes the time the server spends on receiving and executing the query and creating the result. The abovementioned costs on the client side to receive, parse, render, etc. the result are excluded. The same holds for (3)

A detailed time of an SQL query can be obtained with prepending the query with the modifier TRACE. It will produce a queryable table with a break down of all relational algebra operations (see TRACE command). The profiling tools stethoscope and tomograph provide further details for those interested in the inner working of the system. It provides a hook to many system parameters, e.g. input/output, CPU cycles, and threads' activities.

Timing a database query should be done with care. Often you will notice differences in response time for the same query ran multiple times. The underlying cause can be that the data itself resides on disk (slow) or is already avaiable in the memory caches (fast), a single user runs queries (fast) or has to compete with other users (slow), including competing with other processes on your box fighting over cpu, memory, and IO resources. As a precaution you might want to flush the system caches. The Windows tool flushes the cache. You'll need to press the "Flush Cache WS" and "Flush All Standby" buttons. On Linux you have to create a little job that consumes all memory.

For more general information on running experiments and measuring time, see our performance tutorial.

Query History

Query History mk Sat, 03/09/2013 - 17:36

The SQL implementation comes with a simple query profiler to detect expensive queries. It is centered around two predefined internal tables that store the definitions of all executed SQL queries definitions and their execution time.

Query logging can be started by calling the procedure querylog_enable(), which saves some major compilation information of a query in the 'querylog_catalog' table:

table sys.querylog_catalog(
   id oid,
   owner string,
   defined timestamp,
   query string,
   pipe string,
    -- Query pipeline
   "plan" string,  -- Name of MAL plan
   mal integer,    -- size of MAL plan in number of statements
   optimize bigint -- time in microseconds for optimizer pipeline
)

Query logging can be stoped by calling procedure querylog_disable().

The query performance is stored in the table  'querylog_calls'. The owner of the query definition is also the one that will be referenced implicitly from the call events. The key timing attributes are 'run', i.e. the time to prepare  the result set , and 'ship', i.e. the time to render the result set  and sent it to the client. All timing in microseconds.

The remaining parameters illustrate the resource claims. The 'tuples' attribute denotes the size of the result set in number of rows. The 'space' depicts the total  size of all temporary columns created during query execution. Note, the space is allocated and freed during the query execution, leading to a much less demanding actual memory footprint. The 'cpu' load  is derived from the operating system system statistics (Linux only) and is given as a percentage. The same holds for the io waiting time.

table sys.querylog_calls (
    id oid, -- references query plan
    "start" timestamp,-- time the statement was started
    "stop" timestamp,-- time the statement was completely finished
    arguments string, -- actual call structure
    tuples bigint, -- number of tuples in the result set
    run bigint, -- time spent (in usec) until the result export
    ship bigint, -- time spent (in usec) to ship the result set
    cpu int, -- average cpu load percentage during execution
    io int -- percentage time waiting for IO to finish
)

create view sys.querylog_history as
    select qd.*, ql."start",ql."stop", ql.arguments, ql.tuples, ql.run, ql.ship, ql.cpu, ql.space, ql.io
    from sys.querylog_catalog() qd, sys.querylog_calls() ql
    where qd.id = ql.id and qd.owner = user;

The following code snippet illustrates its use.

sql>call sys.querylog_enable();
sql>select 1;
sql>select 1;
sql>select * from sys.querylog_catalog;
sql>select * from sys.querylog_calls;
sql>select * from sys.querylog_history;
sql>select id, query, avg(run) from sys.querylog_history group by id,query;
sql>call sys.querylog_disable();
sql>call sys.querylog_empty();