SpaceTime outperforms GeoMesa by 17 times, study finds

"SpaceTime gives over 17 times faster response time for 250 million records and it is expected to perform even faster as the data volume increases."

This is the conclusion of a novel independent benchmark of spatio-temporal databases conducted by Ericsson Nikola Tesla. Ericsson’s study benchmarked Mireo’s SpaceTime spatio-temporal database and analytical platform against GeoMesa, the most prominent open-source spatio-temporal database. The research was presented at the 29th International Conference on Software, Telecommunications and Computer Networks (SoftCOM 2021).

The motive for benchmarking spatio-temporal databases was to expand on the notable Data Reply 2018 evaluation of six relevant Big Data spatio-temporal technologies (Hive, MongoDB, GeoSpark, Elasticsearch, GeoMesa and Postgres-XL). To recap, the UK Ministry of Defence engaged Data Reply to benchmark six prominent Big Data technologies with geospatial features. In that study, GeoMesa stood out as the most performant technology when strictly spatio-temporal features were benchmarked.

Ericsson Nikola Tesla expanded on Data Reply’s work by benchmarking the performance and scalability of Mireo SpaceTime against GeoMesa. Mireo is happy to announce that the independent analysis has shown that SpaceTime outperformed GeoMesa in almost all benchmarks.

Ericsson’s benchmarking can be summarized as follows:

Performance benchmarking

Performance benchmarks contain different sets of spatial, temporal, and spatio-temporal queries to test indexing techniques and data clustering.

  1. Spatial
    SpaceTime performs better than GeoMesa and has more stable response times with lower deviation. SpaceTime is expected to perform even faster compared to GeoMesa as the data volume increases.
  2. Temporal
    SpaceTime outperforms GeoMesa with response times that are 8 times faster on average. SpaceTime exhibits more stable performance, while GeoMesa shows a steeper increase in response times.
  3. Spatio-temporal
    SpaceTime gives more stable and faster response times than GeoMesa. SpaceTime exhibits more stable performance independent of the specific geo-location; its performance is almost entirely dependent on the number of records. Despite an equal number of records, GeoMesa shows significant dependence on geo-location.
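
To make the three query categories concrete, here is a minimal Python sketch of what each one filters on, plus a helper that reports the mean and deviation of response times. It is an illustration only: the Record and BBox types, the function names, and the sample coordinates are assumptions made for this example, it scans a plain in-memory list, and it does not use the SpaceTime or GeoMesa APIs or reproduce the paper's actual queries.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, stdev
from time import perf_counter


@dataclass(frozen=True)
class Record:
    """Hypothetical record: a position plus a timestamp."""
    lon: float
    lat: float
    ts: datetime


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned bounding box in lon/lat degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, r: Record) -> bool:
        return (self.min_lon <= r.lon <= self.max_lon
                and self.min_lat <= r.lat <= self.max_lat)


def spatial_query(records, box):
    """Spatial: filter by location only."""
    return [r for r in records if box.contains(r)]


def temporal_query(records, start, end):
    """Temporal: filter by the half-open time interval [start, end)."""
    return [r for r in records if start <= r.ts < end]


def spatio_temporal_query(records, box, start, end):
    """Spatio-temporal: both constraints must hold."""
    return [r for r in records if box.contains(r) and start <= r.ts < end]


def time_query(fn, *args, repeats=5):
    """Run fn(*args) `repeats` times; return (mean, stdev) in seconds."""
    samples = []
    for _ in range(repeats):
        t0 = perf_counter()
        fn(*args)
        samples.append(perf_counter() - t0)
    return mean(samples), stdev(samples)


# Toy data, invented for illustration (not from the benchmark dataset).
RECORDS = [
    Record(15.97, 45.81, datetime(2021, 9, 1, 8, 0)),
    Record(15.98, 45.80, datetime(2021, 9, 2, 8, 0)),
    Record(16.44, 43.51, datetime(2021, 9, 1, 8, 30)),
]
AREA = BBox(15.8, 45.7, 16.2, 45.9)
DAY_START = datetime(2021, 9, 1)
DAY_END = datetime(2021, 9, 2)
```

In a real benchmark the same predicates would be pushed down to the database's indexes rather than evaluated by a linear scan; the deviation returned by time_query corresponds to the "stability" of response times discussed above.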

Scalability benchmarking

Scalability benchmarks test databases’ distribution and resource usage.

  1. Scalability
    SpaceTime outperforms GeoMesa on larger datasets, with its advantage growing from a few times faster to over 17 times faster execution times.
  2. CPU
    SpaceTime scales as expected with the number of CPU cores, i.e., response times increase as the number of cores goes down.
  3. Memory
    SpaceTime has much lower memory requirements than GeoMesa.
    SpaceTime loses a little performance once the allocated memory goes below 8 and 4 GB of RAM but continues to perform very well even with 1 GB of RAM.
    GeoMesa, on the other hand, starts losing performance with 2 GB of RAM and completely degrades with 1 GB of RAM by giving almost 6 times longer response times. 
  4. Network
    SpaceTime performance does not depend much on the network speed. GeoMesa exhibits significant performance degradation, giving over 3 times slower response times on a 100 Mbps network.
  5. Disk scalability
    SpaceTime, when run on SSD, shows excellent and stable performance, which only increases as the data volume increases. SpaceTime relies heavily on disk speed, so a solid-state disk is a hard requirement. When running on HDD, SpaceTime performs worse than GeoMesa. Nevertheless, benchmarking shows that SpaceTime utilizes caching capabilities much better than GeoMesa when run on HDD instead of SSD.


Ericsson benchmarking report - Performance of spatial, temporal or spatio-temporal benchmarks (mind the different Y axis scale)

Comparison summary

"The results show that SpaceTime outperforms GeoMesa in all performed benchmarks by giving response times in a range of seconds, unlike GeoMesa that requires minutes.

SpaceTime excels when executing different types of queries (spatial, temporal, and spatio-temporal).

When testing scalability regarding the data volume, it gives over 17 times faster response time for 250 million records and is expected to perform even faster as the data volume increases.

It better utilizes more CPU cores, and it consumes significantly less memory. Its read queries are not as affected by the slower network as GeoMesa’s, and finally, it utilizes caching capabilities much better than GeoMesa when run on HDD instead of SSD.

The only downside to SpaceTime is when running on HDD, where it performs worse than GeoMesa. Therefore, the hard requirement for using SSD set by SpaceTime is confirmed."

Comparing Open Source and Proprietary Database Solutions for Querying Spatio-Temporal Data: GeoMesa vs. SpaceTime

Download the Ericsson Nikola Tesla Research Paper

Methodology and setup

The dataset used in this study contains anonymized telecom data from arbitrary base stations with a total of 1,162,400,595 records. The data is non-uniformly distributed in both space and time.

Both databases are deployed on two virtual machines running on separate physical workstations, initially configured with a 12-core CPU, 96 GB of RAM, a 500 GB NVMe SSD and a gigabit network.

Why are today's Big Data tools ripe for change?

"Storing and querying datasets that contain objects in a geometric space have always required special treatment. The choice of data structures and query algorithms can easily make the difference between a query that runs in seconds or in days." Werner Vogels, Amazon CTO

Features inherent in spatio-temporal data, such as highly skewed spatial and temporal data distribution, are the reasons why even at low volumes of data, the time performance of today’s Big Data tools degrades to unsuitability. As the query response times vary from minutes to hours, the majority of off-the-shelf tools provide a click-and-wait experience, even for relatively simple queries and small datasets.

GeoMesa, GeoSpark, Amazon Redshift, Oracle Exadata, Hive, Databricks, and Snowflake are some of the most well-known Big Data tools with full or partial geospatial support. GeoMesa, GeoSpark, Hive, Databricks, and Snowflake are all Hadoop-based, meaning they suffer from the same issues and, per Data Reply's study, have proven to be incredibly slow and ineffective in spatio-temporal analyses.

None of the above-mentioned mainstream databases can deliver even tolerable performance when the data size exceeds a couple of billion records. To be precise, mainstream Big Data tools are destined to break apart when data size starts exceeding dozens of billions of data records (a dozen billion records equals one year's worth of data from 100,000 vehicles). At the 20 billion record mark, all mainstream tools effectively stop working and cannot be fixed.
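
As a rough sanity check of the vehicle figure in parentheses, the arithmetic can be sketched in a few lines of Python. The sketch assumes that "a dozen billion" means exactly 12,000,000,000 records and that every vehicle reports evenly around the clock for 365 days; both are simplifying assumptions made for this example, not statements from the study.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds


def records_per_vehicle(total_records, vehicles):
    """Average number of records each vehicle contributes."""
    return total_records / vehicles


def mean_interval_seconds(total_records, vehicles, seconds=SECONDS_PER_YEAR):
    """Average gap between two records of one vehicle, assuming even reporting."""
    return seconds / records_per_vehicle(total_records, vehicles)


# Assumed inputs: 12 billion records, 100,000 vehicles, one year.
per_vehicle = records_per_vehicle(12_000_000_000, 100_000)
interval = mean_interval_seconds(12_000_000_000, 100_000)
print(f"{per_vehicle:,.0f} records per vehicle per year")
print(f"one record every {interval:.1f} seconds on average")
```

Under these assumptions each vehicle produces 120,000 records a year, or roughly one record every 4.4 minutes. Vehicles that report more often while driving would reach the same volume with fewer active hours.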

Here at Mireo, we've been tackling these problems successfully for some time now. As a technological enabler, our SpaceTime Analytics Platform bridges the gap between any moving IoT data and the parties interested in extracting value from such data.

The Mireo SpaceTime analytical platform provides all the necessary components to store and analyze both real-time and historical data from hundreds or millions of moving IoT devices, on commodity hardware, at a fraction of the price of aspiring mainstream solutions.

If you would like to see the numerous analytics calculated on the fly for more than 200,000 vehicles, check out our free SpaceTime online demo.

SpaceTime Interactive Demo
