Astronomy: Bulk Source Association
In the near future several optical and radio telescopes will produce large field-of-view images at second to minute cadence. One of their common and main goals is to find transient and variable events on these short time scales. As a consequence of the large images the source densities are larger than ever before and may peak up to 300,000 sources per image. Building time series (light curves) and finding changing patterns the sources are cross matched with potential counterparts in known catalogues of about 500 million to 1 billion sources.
It is this last part, the bulk association, that is the most challenging query from a database point of view. We want to keep the processing time as short as possible, because more queries need to run and the next image comes in pretty soon, and the next and the next... A typical bulk association time might should be below the 10% of the overall time to process the image.
Solutions exist, both implemented in C and SQL, whereas the latter works for MonetDB using the zone algorithm and for PostgreSQL using GiST indexing. However, the C function, using kdtree indexing is roughly an order of magnitude faster.
An example the MonetDB SQL implementation
SELECT runcatid ,xtrsrcid ,3600*DEGREES(2*ASIN(SQRT(dist)/2)) AS dist_arcsec FROM (SELECT z0.id AS runcatid ,t0.id AS xtrsrcid , (z0.x - t0.x) * (z0.x - t0.x) + (z0.y - t0.y) * (z0.y - t0.y) + (z0.z - t0.z) * (z0.z - t0.z) AS dist FROM rc_zone z0 ,(SELECT id ,decmin ,decmax ,ramin ,ramax ,x ,y ,z FROM xtrsrc_548 ) t0 WHERE z0."dec" BETWEEN t0.decmin AND t0.decmax AND z0.ra BETWEEN t0.ramin AND t0.ramax ) t1 WHERE t1.dist < isint2 ;