The Cost of Real-time Performance in a Massively Connected World
The Internet of Everything is the new frontier for real-time and Big Data. Velocity now trumps Volume as the primary driver, the geographical distribution of streaming data adds new levels of complexity, and yet the useful lifetime of data, the window within which to make a decision, has decreased dramatically.
Industries such as telecommunications, telematics, M2M and the Industrial Internet are starting to generate high-velocity data streams at rates of millions of records per second. Gaining actionable insight from data of this magnitude and speed may be technically feasible, up to a point, for the elastic scale-out architectures of the leading Big Data and Cloud technologies, but at what point does it cease to be financially viable?
As data management architectures evolve, each new technology looks to increase real-time visibility and reduce the latency to action and decision-making. Yet each technology has an associated cost and a practical limit beyond which real-time, distributed data management becomes commercially infeasible and then technically impossible.
Big Data technologies emerged from the combination of lower-cost commodity hardware and Cloud infrastructure, coupled with the cost and technical challenges of rapidly processing massive volumes of unstructured machine data (server logs, sensors and IP network data, for example) using conventional “store-first, query-second” relational database technology. This led to a charge towards new elastic scale-out processing frameworks based on HDFS and MapReduce. However, as Hadoop-based and similar technologies mature, the true cost of real-time Big Data on Hadoop is starting to emerge.
For low-latency operational intelligence applications, Big Data storage-based solutions are proving costly and complex to deploy, and technically challenging to scale for high-velocity, low-latency performance. Furthermore, today’s multi-tenant, shared Cloud infrastructures introduce unacceptable latency for real-time business analytics, and the cost ramps up significantly as ingress and egress data velocities increase.
The ROI and technical tipping points for storage-based architectures (Big Data and traditional RDBMS alike) are very much at the low end of the scale, typically in the order of 10,000 records per second. That is sufficient to process the Twitter firehose at full speed, but falls significantly short of what industry applications require for real-time operational intelligence. For example (a back-of-envelope comparison follows the list):
- Operational performance & QoS monitoring in a 4G cellular network generates cell and call detail information at many tens of millions of events per second.
- Telematics applications generate between 5 million and 20 million records per second, even for a small installation.
- In IP service networks, IP probe data regularly exceeds 10 million records per second.
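To put those rates in perspective, here is a minimal back-of-envelope sketch in Python. It divides each example workload by the roughly 10,000 records-per-second tipping point quoted above, as a naive proxy for how far a storage-based deployment would need to scale out. The specific per-workload numbers (30 million events per second as a stand-in for "many tens of millions", and the lower bounds quoted for telematics and IP probes), along with the assumption of perfectly linear, loss-free scaling with no replication or headroom, are illustrative assumptions rather than measured figures.

```python
# Back-of-envelope comparison: how many storage-based nodes would be needed,
# in principle, to keep up with the example workloads above, assuming the
# ~10,000 records/second tipping point applies per node. The scaling model
# (linear, no replication, no headroom) is deliberately naive.

STORE_FIRST_TIPPING_POINT = 10_000  # records/second per node (illustrative)

workloads = {
    "4G cell/call detail monitoring": 30_000_000,   # stand-in for "many tens of millions"/s
    "Telematics (small installation)": 5_000_000,   # lower bound quoted above
    "IP probe data": 10_000_000,                    # rate quoted above
}

for name, records_per_second in workloads.items():
    nodes = records_per_second / STORE_FIRST_TIPPING_POINT
    print(f"{name}: {records_per_second:,} rec/s -> "
          f"~{nodes:,.0f} nodes at {STORE_FIRST_TIPPING_POINT:,} rec/s each")
```

Even under those generous assumptions, the implied node counts run into the hundreds or thousands, which is where the cost argument starts to bite.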
These data rates are well within the technical capabilities of SQLstream’s Big Data streaming platform, and, importantly, SQLstream has a proven ROI for high-velocity data applications. That is not to say in-stream analytics is the only platform required. SQLstream deploys Hadoop HBase as its default stream persistence platform, and a typical architecture uses SQLstream for the in-stream analytics, with continuous feeds of raw data and operational intelligence from SQLstream to existing operational control and data warehouse platforms. The resulting architecture offers an attractive overall ROI and eliminates the technical tipping points for low-latency, high-velocity data management.
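As a conceptual illustration of that division of labour (and not SQLstream’s actual streaming SQL interface), the Python sketch below aggregates a raw stream in-stream over tumbling windows and fans the reduced results out to placeholder sinks standing in for the stream persistence store and the downstream warehouse feed. Every function, sink and field name here is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Conceptual sketch of the architecture described above: aggregate records
# in-stream over short tumbling windows, then push only the reduced
# operational intelligence to downstream sinks (e.g. a stream persistence
# store and a data warehouse feed). Plain Python, not SQLstream's API;
# the sink callables are placeholders for whatever integrations exist.

Record = Tuple[float, str, float]  # (timestamp, key, value)

def in_stream_aggregate(
    stream: Iterable[Record],
    window_seconds: float,
    sinks: List[Callable[[float, Dict[str, float]], None]],
) -> None:
    """Tumbling-window sum per key; emit each closed window to every sink."""
    window_start = None
    totals: Dict[str, float] = defaultdict(float)
    for timestamp, key, value in stream:
        if window_start is None:
            window_start = timestamp
        if timestamp - window_start >= window_seconds:
            for sink in sinks:
                sink(window_start, dict(totals))
            totals.clear()
            window_start = timestamp
        totals[key] += value
    if window_start is not None and totals:
        for sink in sinks:
            sink(window_start, dict(totals))

# Placeholder sinks standing in for stream persistence and warehouse feeds.
def persist_to_store(window_start: float, aggregates: Dict[str, float]) -> None:
    print(f"persist window @{window_start}: {aggregates}")

def feed_warehouse(window_start: float, aggregates: Dict[str, float]) -> None:
    print(f"warehouse feed @{window_start}: {aggregates}")

if __name__ == "__main__":
    sample = [(float(t), "cell-42", 1.0) for t in range(10)]  # tiny synthetic feed
    in_stream_aggregate(sample, window_seconds=5.0,
                        sinks=[persist_to_store, feed_warehouse])
```

The design point the sketch tries to capture is that the per-record analytical work happens in-stream, so the storage-based platforms downstream receive continuous, already-reduced feeds rather than bearing the full per-record load themselves.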
More on the cost of real-time performance in a Big Data world will follow in subsequent posts in this series.