Hi all & Stefan,

 

I mentioned my intra-query checkpointing work in my query about database sizing, and Stefan expressed interest, so I thought I’d give a brief overview in a separate topic for anyone interested.

 

This work was inspired by:

- the observation that current database HA doesn't do much to protect in-flight queries, and

- the existence of databases such as MonetDB and SAP HANA that materialize their intermediate results and have graph-like query plans.

My work aims to leverage the potentially lower cost of checkpointing these intermediates (since they are already materialized, no extra materialization is required), as well as the graph-like structure of query plans, to provide HA for queries.

This is intended to benefit long-running queries, since the cost of simply repeating short queries is low.

 

The first aspect of my work is algorithms to produce checkpoint plans from query plan graphs.

Currently this planning is an offline step, based on the output of query profiling.

So far this work has uncovered some interesting, counter-intuitive but useful placements of checkpoints.

 

The second aspect is my implementation for evaluating the general mechanism, and specific algorithms.

I have implemented this in MonetDB. Some details:

- I produce the graph structure for my algorithms from the exported MAL for a query.

- Clients connect through a load balancer ("Persephone") that I have written for MonetDB.

- This load balancer also orchestrates the mechanism. It:

    o injects checkpoint plans into the database

    o receives checkpoints

    o detects failures

    o initiates replays on another database node by sending it a replay plan (checkpoints + skipped operations)

- This process is transparent to clients, which require no modifications.

- My modifications to MonetDB (which shift most of the decision making to Persephone):

    o add a side-channel connection facilitating this orchestration

    o modify the MAL interpreter:

        * When a query matches a checkpoint plan, the selected intermediates are transmitted to Persephone over the side-channel connection.

        * When following a replay plan, the interpreter uses the previously checkpointed intermediates and entirely skips operations that the checkpoints have made redundant.
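
To make the replay-plan idea (checkpoints + skipped operations) a little more concrete, here is a minimal sketch in Python. It is not the Persephone or MonetDB code, and the representation is an assumption made purely for illustration: the query plan is a hypothetical dict mapping each operation's output name to the names of its inputs, and `replay_plan` is a made-up helper. It walks backwards from the final result and stops at checkpointed intermediates, so anything upstream of a checkpoint never needs to run again.

```python
from typing import Dict, List, Set, Tuple


def replay_plan(
    plan: Dict[str, List[str]],
    result: str,
    checkpointed: Set[str],
) -> Tuple[List[str], List[str], List[str]]:
    """Split a plan DAG into (rerun, restored, skipped) operation names.

    `plan` is a hypothetical representation: output name -> input names.
    """
    rerun: Set[str] = set()
    restored: Set[str] = set()
    stack = [result]
    while stack:
        node = stack.pop()
        if node in rerun or node in restored:
            continue  # already handled via another path in the DAG
        if node in checkpointed:
            # The intermediate is available from a checkpoint:
            # restore it and do not descend into its inputs.
            restored.add(node)
            continue
        rerun.add(node)
        stack.extend(plan[node])
    # Everything that is neither re-run nor needed at all can be skipped,
    # including the operations that originally produced the checkpoints.
    skipped = set(plan) - rerun
    return sorted(rerun), sorted(restored), sorted(skipped)


# Toy plan: two scans, a filter, a join and an aggregate.
toy_plan = {
    "scan_a": [],
    "scan_b": [],
    "filter_a": ["scan_a"],
    "join": ["filter_a", "scan_b"],
    "agg": ["join"],
}
rerun, restored, skipped = replay_plan(toy_plan, "agg", {"filter_a"})
print("re-run:  ", rerun)
print("restored:", restored)
print("skipped: ", skipped)
```

In this toy plan, checkpointing `filter_a` means `scan_a` and `filter_a` are skipped on replay, while `scan_b`, `join` and `agg` still run. The sketch returns names in sorted order only; a real replay would have to execute the remaining operations in dependency order.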

 

Hopefully this answers some of your questions, Stefan.

I’m happy to answer questions if anyone is interested.

 

Thanks,

~ Daniel.