With the release of MonetDB/e, we have a new member in the family of embedded DBMSs on Windows optimised for data analytics. How does it compare to some long term members in this family, e.g. SQLite3 and Microsoft SQLserver Express? Let’s take MonetDB/e on another quick trip with the NYC taxi benchmark as demonstrated in a Linux setting before. On Windows, it works in the same way. With the two standard commands pip install monetdbe and import monetdbe in your Python program, you are good to go.
The NYC taxi benchmark is based on a single month of data from the New York City Taxi & Limousine Commission Trip Record with ~1.5GB of raw data and three popular statistical queries:
- Compute the number of unique vendors, passengers, trip distances, fare, etc. This query stresses a DBMS by requiring a fair number of DISTINCT over the whole data set.
- Compute the frequencies of events per time unit. This query tests the aggregation performance of a DBMS.
- Compute the regression of the fares. This query is a machine learning example which not only tests the aggregation and filtering performance of a DBMS but also its support for essential statistical functions, such as STDDEV_SAMP.
For this benchmarking exercise, we ran a Python programme with MonetDB/e (v 0.9.0), SQLite3 (v 3.30.0) and Microsoft SQLserver Express 2019 on a Windows 10 box with an Intel I5 6600k CPU, 8GB RAM and an SSD. The data loading and query execution times of all three embedded DBMSs are shown in the table above.
The numbers illustrate again the strength of MonetDB/e for analytical queries, whose total execution time is ~10x faster than SQLite3 and ~20x faster than SQLserver Express. Even with the data loading time added, MonetDB/e comfortably outperforms the reference systems by a considerable margin.
However, such a simple taxi ride provides only a glimpse of the functionality and performance one might experience when joining the ride. No one-size-fits-all database management system exists, and MonetDB/e is not a silver bullet either. When looking for an embedded DBMS solution for one’s data analytics applications, the following aspects should be considered:
- Do you want to process data sets larger than several MBs?
- Do you want to exploit the multi-core hardware to do the job efficiently?
- Do you want the breadth of functionality offered by a full-fledged SQL DBMS?
- Do you need analytical functions, fast grouping and fast data loading?
- Do you want a seamless path from prototype development using the in-memory mode to server-based production deployment?
If you answered any of these questions with “Yes”, then MonetDB/e is worth consideration. Try it out on a sample application or get inspired by more showcases in the monetdbe-examples repository.
Please drop us a note about your experiences and pass on this message to those who can benefit from the performance of the new taxi driver on the block.
The MonetDB team,
November 11, 2020