Astronomy and technology
Astronomy and technology inspire each other. Thanks to the latest advances in technology we can build instruments that are not only better than ever before but also totally new in design. This allows us to observe the skies and heavens "from new angles" with unprecedented space and time resolutions. No doubt that technological developments in many fields thrive on the scientific discoveries that these advanced instruments will make. In this regard, the entanglement of the two disciplines has delivered us a lot over the past 400 years.
And so will the Large Synoptic Survey Telescope (LSST). With its large primary mirror diameter of 8.4m and the world's largest digital camera, LSST can make snapshot images nearly 50 times the area of the Moon every minute. LSST's mission is to find transient and variable sources, find the origins of the mysterious dark energy that is driving the acceleration of the cosmic expansion, find out what dark matter is, how it is distributed, and how its properties affect the formation of stars, galaxies, and larger structures and to study the formation of the Milky Way, just to name a few.
A key idea to make this a succes is to monitor the billions of objects, stars and galaxies, that will be (re)detected in six alternating colour bands every minute over the course of many years. Therefore a robust and fast database framework is needed that is able to archive and access the zillions of entries.
Moreover, the database should ease scientific, e.g., time series, analysis and data mining algorithms and serve as the back-end for visualisation tools.
This blog focuses on the performance of the set of existing LSST baseline queries running on simulated LSST data stored in the column-oriented database system of MonetDB. It is compared with the performance in MySQL, a row-oriented database system.
LSST scientists and engineers have carefully estimated the expected data volumes and they are very impressive. Nearly 50 billion distinct objects will be found, which will be redetected little less than 1000 times, accumulating into about 30 trillion observations. All these need to be stored and catalogued for current or later access and analytical processes. It boils down to a total of 83 PB of compressed data. The largest database table size is about 5 PB and has about 50 trillion rows. Furthermore, this database should be robust to ad-hoc and visualisation queries. One can imagine, that it will be a gold mine for finding new, unexpected, never thought of, astrophysical sources and phenomena. So far, no database system is around that can manage these amounts of data.
However, bringing a system that is capable of working with these sizes, realistic simulations should push database developments in the right direction and on the other hand cutting-edge database technologies will make it possible to carry out new scientific analyses on huge data streams. It is envisaged that a database system in the PB regimes needs to distributed over multiple nodes and tables need to be partitioned horizontally.
2 TB of artificial csv files
For some years now, the Data Management group of LSST generates large artificial csv datasets for their experiments. These csv files contain many properties of fake sources as if they were detected by LSST. Loading csv data into their database system and executing the most important baseline queries on the largest tables will gain better insight into the performance and possible bottlenecks.
One of the envisaged methods to structure the huge data volumes is to partition the sky into many (millions) tiles, where each tile is crowded with millions of sources. Data from the tiles will be distributed over multiple cluster nodes and the partitions match the tiling scheme. LSST uses the dedicated Qserv package to parse queries in a map-reduce way in a distributed relational database system, where data are spatially sharded with overlapping regions. Qserv UDFs can be used in geometrical query expressions and it is optimised for source neighbour and cross-match queries, because these are the most ubiquitous types of queries to be expected. More details can be found in the Qserv github repository.
Fabrice Jammes and his team from LSST-France, kindly provided us a 2 TB set of artificial LSST csv files. Although their Summer 2015 database size of 33 TB is 25 times larger, it is a good starting point to do an initial comparison analysis. Currently they are running query experiments with database sizes of about 100 TB.
Experimental set up
So, we have in total 1300 csv files of several GBs each, with a total of 2 TB. First, we propose to test the loading and querying performance on a single node in the CWI SciLens cluster. Later on, in other experimental setups, we will proceed doing this on larger data sets and on multiple, say 10, nodes in the SciLens cluster.
The single SciLens node we picked for our experiments was a stones node. Its main characteristics are 256 GB of RAM, 8.1 TB (3x SW RAID0) HDD, 2.6-3.4 GHz Intel Xeon E5-2650 v2 CPU, running Linux Fedora 4.7.9-200.fc24. Note that in the LSST MySQL hardware setup data was distributed over 50 nodes, each having 16 GB RAM, Intel Xeon E5-2603v2 1.80 Ghz 4 core processors and 8 TB of HDD storage.
All the code used is open and freely available at the CWI git reposity lsst_blog. It includes shell and python scripts to install MonetDB, create the MonetDB database schema, partition the tables, load the csv files and execute the 13 baseline queries in hot and cold modes and a python script to plot the resulting query run times. The actual 2TB of csv files, however, aren't included and should be requested from the LSST DM team.
Just like the LSST MySQL schema, we partition the largest tables into multiple subtables. The csv file extension determines the sky region and corresponds to the its respective subtable. In our 2 TB setup the master tables have 325 table partitions. MonetDB's merge table defintion combines the tables into one queryable table. Now this table can be queried using ordinary SQL. Under the hood the MonetDB kernel takes care of combining the results from the individual tables.
There are no further special table defintions. The only thing is that the tables, even the partitions have a lot of entries.
Below are the 13 LSST baseline queries. They were adapted from the LSST S15 Large Scale Tests. All could be reused, except for two which we had to rewrite to plain SQL to overcome the special Qserv UDFs. Note that Q13 in MonetDB uses the alpha function. It is based on the RA inflation function developed for the SDSS SkyServer database and takes into account that for conical searches RA increases with increasing (absolute) declination.
-- Q01 SELECT COUNT(*) FROM object; -- Q02 SELECT COUNT(*) FROM source; -- Q03 SELECT COUNT(*) FROM forcedsource; -- Q04 SELECT ra, decl FROM Object WHERE deepSourceId = 3306154155315676; -- Q05 /* QservSQL: SELECT ra, decl FROM Object WHERE qserv_areaspec_box(0.95, 19.171, 1.0, 19.175);*/ SELECT ra, decl FROM object WHERE decl BETWEEN 19.171 AND 19.175 AND ra BETWEEN 0.95 AND 1.0; -- Q06 SELECT COUNT(*) FROM object WHERE y_instFlux > 5; -- Q07 SELECT MIN(ra), MAX(ra), MIN(decl), MAX(decl) FROM object; -- Q08 SELECT COUNT(*) FROM source WHERE flux_sinc BETWEEN 1 AND 2; -- Q09 SELECT COUNT(*) FROM source WHERE flux_sinc BETWEEN 2 AND 3; -- Q10 SELECT COUNT(*) FROM forcedsource WHERE psfFlux BETWEEN 0.1 AND 0.2; -- Q11 SELECT COUNT(*) FROM object o ,source s WHERE o.deepSourceId = s.objectId AND s.flux_sinc BETWEEN 0.13 AND 0.14; -- Q12 SELECT COUNT(*) FROM object o ,forcedsource f WHERE o.deepSourceId = f.deepSourceId AND f.psfFlux BETWEEN 0.13 AND 0.14; -- Q13 /* QservSQL: SELECT COUNT(*) FROM Object o1 ,Object o2 WHERE qserv_areaspec_box(90.299197, -66.468216, 98.762526, -56.412851) AND scisql_angSep(o1.ra, o1.decl, o2.ra, o2.decl) < 0.015; */ SELECT COUNT(*) FROM object o1 ,(SELECT decl - 0.015 AS decl_min ,decl + 0.015 AS decl_max ,ra - alpha(decl, 0.015) AS ra_min ,ra + alpha(decl, 0.015) AS ra_max FROM object WHERE ra BETWEEN 90.299197 AND 98.762526 AND decl BETWEEN -66.468216 AND -56.412851 ) o2 WHERE o1.ra BETWEEN 90.299197 AND 98.762526 AND o1.decl BETWEEN -66.468216 AND -56.412851 AND o1.decl BETWEEN decl_min AND decl_max AND o1.ra BETWEEN ra_min AND ra_max;
The loading of the 2 TB of csv files was done in nearly 15 hours and 50 minutes by using the COPY INTO statement. Although this is acceptable, the fastest option is to load binary data with the use of data vaults, a MonetDB implementation of SQL management of external data (MED). It is also blazingly fast when it comes to loading (binary) FITS files, mainly because no data-type parsing is needed.
The 2 TB of csv files were reduced to 1.3 TB of database storage for MonetDB.
We ran the 13 baseline queries in cold and hot modes. Cold is when the database server is stopped, all system caches are emptied and the server is restarted. For hot runs we don't restart the server, but let it running so that queries can benefit from data available in cache. We took the average from ten such runs and compared it with the corresponding reported MySQL query run times. These were adopted from the S15 Large Scale Tests. Error bars in the bar chart represent the standard deviations.
Note that the comparison is not totally fair, because the MySQL queries ran on a 33 TB database spread over 25 machines, whereas the MonetDB database size was one 1.3 TB slice of this on a single beefy machine. We do not know whether the MySQL performance numbers come from hot or cold runs.
What we can see from the graph is that almost all, except two, queries run more than two orders of magnitude faster, albeit on a smaller set of data. Nevertheless, if we take into account the factor of 25 for the data ratio and assume we can scale (pestimistically) linearly for the performance, this is still a comparable and fast approach.
Most probably the slower performance of Q04 and Q05 is due to the many unions of the subtables, clearly visible in the cold runs.
Concluding remarks and outlook
The LSST baseline queries run "smoothly" on a single MonetDB database server node, where the three largest tables were partitioned into hundreds of (sub)tables. The current query run times are presented as a baseline result. In the near future we will enhance the experiments by distributing the data over multiple nodes and by increasing the database size, so that comparisons with the LSST Large Scale Tests become fairer.
No database code had be to rewritten in order to run the LSST baseline queries. On the SQL side two queries (Q05 and Q13) were rewritten to circumvent the Qserv specific UDFs. Although the size of the MonetDB database was 25 times smaller than the MySQL database size, most of the query run times in the column-store MonetDB were several orders of magnitude faster than in the row-store MySQL.
These results are promising in the context of high-cadence astronomical database. It will be interesting to see how the baseline queries perform in a distributed and partitioned database configuration. And how they will do in a 25 times larger database. The SciLens cluster is ideal to enhance our LSST large scale tests.