"Geospatial databases, at a basic computer science and implementation level, are unrelated to more conventional databases. The surface similarities hide myriad design challenges that are specific to spatial data models."
J. Andrew Rogers, Why are Geospatial databases so hard to build?
In this blog post, we shall present Mireo SpaceTime – a massively scalable, distributed, disk-based storage and query execution engine designed for extremely fast insertion and querying of non-scalar data types. When applied to spatial or spatiotemporal data, the SpaceTime database provides unprecedented analytical tools speed, sometimes outperforming other state-of-the-art solutions by three orders of magnitude.
Someone may wonder why would a normal person create its own spatiotemporal database. In short, we were faced with the challenge of efficiently storing and searching an enormous amount of non-uniformly distributed spatiotemporal data continuously arriving from a large number of vehicles. It turned out that none of the off-the-shelf solutions were suitable for this task due to the pathological properties of their multidimensional indices. Detailed reasons that led us on the adventure of creating our own database can be found on the linked blog post.
SpaceTime database is a relational, horizontally scalable, SQL cloud database that possesses ACID (Atomicity, Consistency, Isolation and Durability) set of properties. Most modern databases are either relational or non-relational (NoSQL). However, SpaceTime belongs to the NewSQL class of databases which is a common name for the family of web-scale distributed relational SQL databases designed to overcome hard limitations of traditional SQL databases in the context of Big Data processing. A few notable members of NewSQL sort of DBMS are ClustrixDB, CockroachDB, VoltDB and MemSQL.
The index on spatiotemporal records is by far the most important ingredient of an efficient spatiotemporal database processing system. As discussed in our previous post, none of the multidimensional index types used in major spatial systems (R-tree family, kd-tree family and space-filling curves) are suitable for indexing a huge amount of real-world spatiotemporal data, so we had to come up with something new.
SpaceTime database is built on a unique, multidimensional data index that automatically adapts to mutable and possibly highly skewed data distribution (which is usually the case with data coming from moving objects). There is one single multi-columnar index that indexes all associated columns of records at once. There are no secondary indices!
The physical model intertwines index and record data – as a consequence, records that are logically close (based on id, position and timestamp) are at the same time also physically close. In turn, this maximizes the throughput of disk reads.
The main ideas for inventing SpaceTime index came from several inspiring scientific papers. It is similar to the kd-tree family of indices, but with two major improvements: first, the index tree in SpaceTime is built using the bottom-up approach (as opposed to the top-down kd-tree construction) and second, the process of index creation adapts to particular space-time distribution of data. Kd-trees work well on a large scale only with static datasets; our bottom-up approach overcomes this.
As mentioned before, SpaceTime database, like the other NewSQL databases, is designed from the ground up as a scale-out architecture. This means that partitioning, sharding and clustering techniques are built into the core of its engine, and are ensured to maintain the ACID guarantees. Unlike many similar systems, all nodes are equal and mutually independent – there is no single master at any time.
Index and record data are partitioned into separate subsets called partitions or shards which are uniformly distributed among the nodes by using consistent hashing algorithm. Technically, these subsets are organized as fixed-grid hypercubes.
Figure 1: SpaceTime fixed-grid hypercubes
Fixed-grid hypercubes prevent two major potential problems from happening: an index tree growing too large and the index degeneration due to the sudden change in the data range. The latter can happen if, for example, we operate with one million vehicles for a long time and then the data from an additional million vehicles starts arriving.
High availability is achieved by a massive parallelization of query and insertion tasks, even on a single node. The limit is only imposed by the Internet connection bandwidth.
Fault tolerance is implemented by using real-time block device replication (DRBD) with Apache Zookeeper as a coordination service. The typical quorum based synchronization between database nodes is completely avoided by the combination of data partitioning, low-level block device replication and high-level synchronization through Zookeeper. There is no single point of failure anywhere in the system—at least two copies of data on two different nodes always exist. The system can handle failures of up to 50% of existing nodes.
One valuable lesson learnt is that DRBD is much faster than all popular distributed filesystems (GlusterFS, Ceph, Bee FS...) because they introduce a lot of back-and-forth communication and network latencies during the process of node synchronization. By using DRBD low-level replication, SpaceTime engine ensures that all data are always read directly from the local disk storage, completely bypassing the network stack. At the same time, writes to the disk are atomic and distributed by the kernel logic in DRBD, thus providing consistency and durability of data.
As many other distributed systems, SpaceTime database is natively horizontally scalable. In other words, it supports a virtually unlimited number of records in the database. In order to process and store more data, one should just add additional nodes and the system performance would be preserved.
API and queries
SpaceTime exposes an extremely simple REST API that contains just three methods: insert, k-nearest neighbors query and MSQL query. MSQL stands for "Mireo SQL" and it is developed in parallel with SpaceTime. MSQL syntax covers most of the standard SQL and also includes the support for geographic objects, thus allowing location queries to be run. More specifically, MSQL supports (to a certain degree) OpenGIS Implementation Standard for Geographic information - Simple feature access - Part 2: SQL option (link), similarly to e.g. PostGIS extension of PostgreSQL or Oracle Spatial.
SQL queries in Mireo SpaceTime database closely resemble the MapReduce computational model. As a first step, we perform parsing and AST optimization with the help of Apache Calcite. More complex computation, aggregation and filtering are performed by on-the-fly generated assembly code through which pre-filtered records flow in push form (as opposed to the traditional RDBMS "Volcano" iteration model). Queries are translated to assembly code using the LLVM compiler and spread across all nodes in the system. The results of these "micro tasks" are aggregated and returned to the user in the same way as with the MapReduce paradigm.
Show me the numbers!
SpaceTime database code (written in C++) strictly follows shared-nothing multithreading philosophy. The synchronization among threads is reduced to an absolute minimum and worker threads are pinned to fixed CPU cores. This approach eliminates expensive processor cache synchronization steps and maximizes CPU usage. As a benchmark, SpaceTime fetches from the index and processes roughly 50,000,000 records per second on a single quad-core Intel i5 machine clocked at 3 GHz, equipped with Samsung 960 Pro NVMe. Therefore, worker threads in SpaceTime engine easily saturate NVMe SSD, getting about 360,000 IOPS routinely.
Regarding the hardware requirements, SpaceTime is very cost-effective and is designed to run on commodity hardware. The only strict requirement is to use local SSD storage. As an illustration of the execution speed, a query that produces the report shown below takes on average only 6.29 seconds to execute. This query retrieves all road segments that any of 93,000 vehicles drove through in the wider metropolitan area of Milan (Italy) during a period of 1 month, aggregates the total mileage for each vehicle, sorts all calculated results by the mileage and returns the top 100 of them. The query is executed on a cluster of 4 commodity machines that handle SpaceTime database containing 20 billion records.
Figure 2: Selecting a polygon and a time-frame for the report "mileage in area"
Figure 3: The results of "mileage in area" report
Is SpaceTime for me?
Our motivation for developing the SpaceTime database was pretty straightforward – we had to find a way to efficiently index and search GPS positions arriving in real-time from 2 million trucks. It should be kept in mind, however, that SpaceTime database is not a general-purpose SQL RDBMS. Its main purpose is extremely efficient storing and querying of non-scalar (e.g. spatiotemporal) data types at web-scale.
Some common examples include smart cities ("improve public transport by analyzing commuting patterns"), connected cars ("detect road segments with the excessive usage of ABS and immediately warn other drivers!"), connected insurance ("estimate driver's risk level by quantifying its total mileage, number of left turns and proportion of night drives"), pandemic control ("immediately detect who violates self-isolation"), social networks data processing ("which of my friends are currently nearby?") and mobile network analytics ("quality of service suddenly dropped near the football stadium due to 80,000 visitors, reallocate the resources!").
To wrap-up, if you are dealing with data sources that continuously generate non-uniformly distributed data containing latitude, longitude and timestamp at extremely high rates, and you need blazing fast Big Data Analysis of both live and historical georeferenced data, SpaceTime is most probably the right tool for you. In that case, feel free to contact us.
On the other hand, if you want to see the system in action immediately, you can book a demo.
If you want to find more detailed information about the SpaceTime solution, visit our definitive guide through an unusual and fundamentally new approach of using spatiotemporal data in our everyday routine.