"SpaceTime gives over 17 times faster response time for 250 million records and it is expected to perform even faster as the data volume increases."
This is the conclusion of a new independent benchmark of spatio-temporal databases conducted by Ericsson Nikola Tesla. Ericsson’s study benchmarked Mireo’s SpaceTime spatio-temporal database and analytical platform against GeoMesa, the most prominent open-source spatio-temporal database. The research was presented at the 29th International Conference on Software, Telecommunications and Computer Networks (SoftCOM 2021).
The motive for benchmarking spatio-temporal databases was to expand on the notable 2018 Data Reply evaluation of six relevant Big Data spatio-temporal technologies (Hive, MongoDB, GeoSpark, Elasticsearch, GeoMesa and Postgres-XL). To recap, the UK Ministry of Defence engaged Data Reply to benchmark six prominent Big Data technologies with geospatial features. In that study, GeoMesa emerged as the most performant technology when strictly spatio-temporal features were benchmarked.
Ericsson Nikola Tesla expanded on Data Reply’s work by comparing the performance and scalability of Mireo SpaceTime and GeoMesa. Mireo is happy to announce that the independent analysis has shown that SpaceTime outperformed GeoMesa in almost all benchmarks.
Ericsson’s benchmarking can be summarized as follows:
Performance benchmarks contain different sets of spatial, temporal, and spatio-temporal queries to test indexing techniques and data clustering.
SpaceTime performs better than GeoMesa and has more stable response times with lower deviation. SpaceTime is expected to perform even faster compared to GeoMesa as the data volume increases.
SpaceTime outperforms GeoMesa, with response times that are on average 8 times faster. SpaceTime exhibits more stable performance, while GeoMesa shows a steeper increase in response times.
SpaceTime gives more stable and faster response times than GeoMesa. Its performance is largely independent of the specific geo-location and depends almost entirely on the number of records. GeoMesa, despite an equal number of records, shows significant dependence on geo-location.
Scalability benchmarks test databases’ distribution and resource usage.
- Data volume scalability
SpaceTime outperforms GeoMesa on larger datasets, going from a few times faster to over 17 times faster execution times.
- CPU scalability
SpaceTime scales as expected with the number of CPU cores, i.e., response times increase as the number of cores goes down.
- Memory scalability
SpaceTime has much lower memory requirements than GeoMesa.
SpaceTime loses a little performance once the allocated memory goes below 8 GB and 4 GB of RAM but continues to perform very well even with 1 GB of RAM.
GeoMesa, on the other hand, starts losing performance with 2 GB of RAM and degrades completely with 1 GB of RAM, giving almost 6 times longer response times.
- Network scalability
SpaceTime performance does not depend much on the network speed. GeoMesa exhibits significant performance degradation, with response times over 3 times slower on a 100 Mbps network.
- Disk scalability
SpaceTime, when run on SSD, shows excellent and stable performance, which only increases as the data volume increases. SpaceTime relies heavily on disk speed, and a solid-state disk is a hard requirement. When running on HDD, SpaceTime performs worse than GeoMesa. Nevertheless, the benchmarks show that, when run on HDD instead of SSD, SpaceTime utilizes caching capabilities much better than GeoMesa.
Ericsson benchmarking report - Performance of spatial, temporal or spatio-temporal benchmarks (mind the different Y axis scale)
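For readers less familiar with the three query classes named in the performance benchmarks, here is a minimal Python sketch of what each one asks for. It is only an illustration: the `Record` and `BBox` types, the sample coordinates and the linear scans are assumptions made for this example, and they say nothing about how SpaceTime or GeoMesa index or execute such queries.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """Hypothetical record: a position plus a timestamp."""
    lon: float
    lat: float
    ts: datetime


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned bounding box in lon/lat."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, r: Record) -> bool:
        return (self.min_lon <= r.lon <= self.max_lon
                and self.min_lat <= r.lat <= self.max_lat)


def spatial_query(records, bbox):
    """Spatial: filter on location only."""
    return [r for r in records if bbox.contains(r)]


def temporal_query(records, start, end):
    """Temporal: filter on time only, half-open interval [start, end)."""
    return [r for r in records if start <= r.ts < end]


def spatio_temporal_query(records, bbox, start, end):
    """Spatio-temporal: both filters must hold at once."""
    return [r for r in records if bbox.contains(r) and start <= r.ts < end]


# Made-up sample data, for illustration only.
records = [
    Record(15.97, 45.81, datetime(2021, 5, 1, 8, 0)),  # inside box, inside window
    Record(15.98, 45.80, datetime(2021, 6, 1, 8, 0)),  # inside box, outside window
    Record(16.44, 43.51, datetime(2021, 5, 1, 9, 0)),  # outside box, inside window
]
box = BBox(15.8, 45.7, 16.2, 45.9)
start, end = datetime(2021, 5, 1), datetime(2021, 6, 1)

print(len(spatial_query(records, box)))                      # 2
print(len(temporal_query(records, start, end)))              # 2
print(len(spatio_temporal_query(records, box, start, end)))  # 1
```

A real spatio-temporal database avoids scanning every record by using indexes and data clustering, which is what these benchmarks are designed to stress.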
"The results show that SpaceTime outperforms GeoMesa in all performed benchmarks by giving response times in a range of seconds, unlike GeoMesa that requires minutes.
SpaceTime excels when executing different types of queries (spatial, temporal, and spatio-temporal).
When testing scalability regarding the data volume, it gives over 17 times faster response time for 250 million records and is expected to perform even faster as the data volume increases.
It better utilizes more CPU cores, and it consumes significantly less memory. Its read queries are not as affected by the slower network as GeoMesa’s, and finally, it utilizes caching capabilities much better than GeoMesa when run on HDD instead of SSD.
The only downside to SpaceTime is when running on HDD, where it performs worse than GeoMesa. Therefore, the hard requirement for using SSD set by SpaceTime is confirmed."
Download the Ericsson Nikola Tesla Research Paper
Methodology and setup
The dataset used in this study contains anonymized telecom data from arbitrary base stations, with a total of 1,162,400,595 records. The data is non-uniformly distributed in both space and time.
Both databases are deployed on two virtual machines running on separate physical workstations, initially configured with a 12-core CPU, 96 GB RAM, a 500 GB NVMe SSD and a gigabit network.
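The summary above speaks of faster response times and lower deviation. As a rough sketch of how such figures are typically obtained (this is not the harness used in the paper, and `measure`, `speedup` and the stand-in workload are hypothetical names), a query can be timed over several runs and summarized by its mean and standard deviation:

```python
import statistics
import time


def measure(query, repeats=5):
    """Run query() `repeats` times; return (mean, stdev) of wall-clock seconds."""
    if repeats < 2:
        raise ValueError("need at least two runs to compute a deviation")
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        query()
        samples.append(time.perf_counter() - t0)
    return statistics.mean(samples), statistics.stdev(samples)


def speedup(baseline_mean, candidate_mean):
    """How many times faster the candidate is than the baseline."""
    return baseline_mean / candidate_mean


# Stand-in workload; a real benchmark would issue a database query here.
mean_s, stdev_s = measure(lambda: sum(range(100_000)))
print(f"mean={mean_s:.6f}s stdev={stdev_s:.6f}s")
```

Comparing the two systems then comes down to running the same query against each and dividing the mean response times, while the standard deviation indicates how stable each system is.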
Why are today's Big Data tools ripe for change?
"Storing and querying datasets that contain objects in a geometric space have always required special treatment. The choice of data structures and query algorithms can easily make the difference between a query that runs in seconds or in days." - Werner Vogels, Amazon CTO
Features inherent in spatio-temporal data, such as highly skewed spatial and temporal distribution, are the reason why, even at low data volumes, the performance of today’s Big Data tools degrades to the point of being unusable. As query response times vary from minutes to hours, the majority of off-the-shelf tools provide a click-and-wait experience, even for relatively simple queries and small datasets.
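To make the skew problem concrete, the following sketch places synthetic points on a uniform grid and reports how much of the data ends up in the single busiest cell. Everything here is an assumption for illustration: the 90/10 split, the "city" patch and the 10 x 10 grid are invented, and a uniform grid is a deliberately naive partitioning scheme, not the one used by any product mentioned in this article.

```python
import random
from collections import Counter


def grid_cell(x, y, n):
    """Map a point in the unit square to a cell of an n x n uniform grid."""
    return min(int(x * n), n - 1), min(int(y * n), n - 1)


def skewed_points(total, hot_fraction, rng):
    """Synthetic data: hot_fraction of points sit in one small patch, the rest are uniform."""
    hot = int(total * hot_fraction)
    pts = [(0.5 + rng.random() * 0.05, 0.5 + rng.random() * 0.05) for _ in range(hot)]
    pts += [(rng.random(), rng.random()) for _ in range(total - hot)]
    return pts


def busiest_cell_share(points, n):
    """Fraction of all points that fall into the most populated grid cell."""
    counts = Counter(grid_cell(x, y, n) for x, y in points)
    return max(counts.values()) / len(points)


rng = random.Random(42)
points = skewed_points(10_000, 0.9, rng)
share = busiest_cell_share(points, 10)
print(f"busiest of 100 cells holds {share:.0%} of the points")
```

With data this skewed, one cell out of a hundred holds at least 90% of the records, so any query touching that cell does nearly as much work as a full scan, while most other partitions sit almost empty.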
GeoMesa, GeoSpark, Amazon Redshift, Oracle Exadata, Hive, Databricks, and Snowflake are some of the most well-known Big Data tools with full or partial geospatial support. GeoMesa, GeoSpark, Hive, Databricks, and Snowflake are all Hadoop-based, meaning they suffer from the same issues and, per Data Reply's study, have proven to be incredibly slow and ineffective in spatio-temporal analyses.
None of the above-mentioned mainstream databases can deliver even tolerable performance when the data size exceeds a couple of billion records. To be precise, mainstream Big Data tools are destined to break apart when data size starts exceeding dozens of billions of records (a dozen billion records equals one year's worth of data from 100,000 vehicles). At the 20 billion record mark, all mainstream tools effectively stop working and cannot be fixed.
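The vehicle figure in parentheses can be checked with simple arithmetic. The sketch below reads "a dozen billion" as exactly 12 billion records and assumes reporting is spread evenly over a 365-day year; both are assumptions made for illustration.

```python
RECORDS_PER_YEAR = 12_000_000_000  # "a dozen billion", taken as 12e9 (assumption)
VEHICLES = 100_000
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

per_vehicle_year = RECORDS_PER_YEAR // VEHICLES          # records per vehicle per year
per_vehicle_day = per_vehicle_year / 365                 # records per vehicle per day
seconds_between = SECONDS_PER_YEAR / per_vehicle_year    # average gap between records

print(per_vehicle_year)            # 120000
print(round(per_vehicle_day))      # 329
print(round(seconds_between, 1))   # 262.8
```

In other words, under these assumptions each vehicle would report roughly once every four to five minutes around the clock, which is a modest rate for a tracked fleet.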
Here at Mireo, we've been tackling these problems successfully for some time now. As a technological enabler, our SpaceTime Analytics Platform bridges the gap between moving IoT data and the parties interested in extracting such valuable data.
The Mireo SpaceTime analytical platform provides all the necessary components to store and analyze both real-time and historical data from hundreds to millions of moving IoT devices, on commodity hardware, at a fraction of the price of aspiring mainstream solutions.
If you would like to see numerous analytics calculated on the fly for more than 200,000 vehicles, check out our free SpaceTime online demo.