Why data stream processing performance matters when Fast Data meets Big Data.

This week saw the publication of the results for a comparative real-time performance benchmark between Apache Storm and Guavus SQLstream. Using the WordCount example shipped with Hadoop and Apache Storm, we were interested to see just how quickly each could process records.

As it turns out, pretty quickly for Guavus SQLstream: around 4.6 million records per second per 8-core server. Less so for Apache Storm. The 4.6 million figure is more or less equivalent to processing the Complete Works of Shakespeare (a little under 1 million words, according to the Internet) five times every second, although why you might want to do that, I’m not sure. The difference in performance between Guavus SQLstream and Apache Storm is all the more impressive because the WordCount benchmark does not expose one of Storm’s major weaknesses: it has no native concept of time-based processing over time windows. That happens to be a particular strength of SQLstream, and it is a capability that real-world use cases for streaming analytics require.
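To make "time-based processing over time windows" concrete, here is a minimal Python sketch of a word count grouped into fixed (tumbling) time windows. It is an illustration only, not the SQLstream or Storm API: the function name `windowed_word_count`, the record layout and the sample data are all assumptions, and it runs over an in-memory list rather than a live stream.

```python
from collections import Counter, defaultdict


def windowed_word_count(records, window_seconds):
    """Count words per tumbling time window.

    records: iterable of (timestamp_seconds, text) pairs (assumed layout).
    Returns {window_start: Counter}, where each window covers
    [window_start, window_start + window_seconds).
    """
    windows = defaultdict(Counter)
    for timestamp, text in records:
        # Align the timestamp to the start of the window it falls in.
        window_start = (timestamp // window_seconds) * window_seconds
        # Lower-case and split on whitespace, then add to that window's counts.
        windows[window_start].update(text.lower().split())
    return dict(windows)


# Hypothetical sample records: (event time in seconds, line of text).
sample = [
    (0, "to be or not to be"),
    (4, "that is the question"),
    (11, "to sleep perchance to dream"),
]

counts = windowed_word_count(sample, window_seconds=10)
for start in sorted(counts):
    print(start, counts[start].most_common(2))
```

The plain WordCount benchmark keeps one running total per word. The windowed variant has to assign every record to a time bucket first, which is the part a streaming platform either provides natively or leaves to the developer.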

That said, the key point here is not raw speed but what that speed means in real terms. The cost of Big Data and real-time systems has come into sharper focus over the past year. Although Big Data technologies utilize commodity hardware, needing 100+ boxes to get any reasonable performance can become expensive. Faster performance per server means fewer servers.

The Cost of Performance for Stream Processing

High throughput offers confidence in future scalability, but it also translates directly into the monthly and lifetime costs of the solution. The average bare metal cloud server costs around $500 to $2,000 per month, depending on the specification and I/O bandwidth.

ROI Metric: Hardware Cost / Month (processing 5 million words per second using bare metal 8-core cloud servers at $500/month)
Apache Storm: 121 servers, $60,500 / month
Guavus SQLstream: 2 servers, $1,000 / month
Guavus SQLstream ROI: 60X reduction in infrastructure cost versus Apache Storm

Even though storage is less of a consideration with a stream processing platform, why deploy 100+ servers when the same work could be carried out on two or three? Fewer servers means a significant reduction in server costs (a 60X reduction even for the simple WordCount benchmark), and also a platform that is much simpler to manage and more stable going forward.
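The arithmetic behind the hardware cost comparison can be checked with a short Python sketch. The workload, the server price, the SQLstream per-server rate and the Storm server count are the figures quoted in this post. The helper names are hypothetical, and the Storm count is taken as reported rather than derived from a per-server rate.

```python
import math

TARGET_WORDS_PER_SEC = 5_000_000      # workload used in the comparison
SERVER_COST_PER_MONTH = 500           # USD per 8-core bare metal server
SQLSTREAM_WORDS_PER_SEC = 4_600_000   # per-server benchmark figure quoted above
STORM_SERVERS = 121                   # server count as reported for Storm


def servers_needed(target_rate, per_server_rate):
    """Round up: a fraction of a server still means one more box."""
    return math.ceil(target_rate / per_server_rate)


def monthly_cost(servers, cost_per_server=SERVER_COST_PER_MONTH):
    """Monthly hardware cost in USD for a given number of servers."""
    return servers * cost_per_server


sqlstream_servers = servers_needed(TARGET_WORDS_PER_SEC, SQLSTREAM_WORDS_PER_SEC)
storm_cost = monthly_cost(STORM_SERVERS)
sqlstream_cost = monthly_cost(sqlstream_servers)
cost_ratio = storm_cost / sqlstream_cost

print(f"SQLstream: {sqlstream_servers} servers, ${sqlstream_cost:,} / month")
print(f"Storm: {STORM_SERVERS} servers, ${storm_cost:,} / month")
print(f"Cost ratio: {cost_ratio:.1f}X")
```

Because 5 million words per second is slightly more than one server can handle at 4.6 million, the count rounds up to two servers, which is where the 60X figure comes from.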

Time to Value

Time to value, system stability and agility for new requirements are also front of mind for CIOs when deploying Big Data platforms. Time to value is the time taken to get to an operational system from scratch. It’s swings and roundabouts with Hadoop and stream processing frameworks such as Storm. On one hand, anything is possible, which is an important consideration when processing unstructured machine data. On the other, it takes time and money to build from scratch, often at the expense of solution reuse and the ability to adapt to new requirements. The WordCount example is trivial in both Guavus SQLstream and Storm, but in real-world scenarios, such as the one covered in a previous independent benchmark between SQLstream and Storm, time to value is an important consideration.

ROI Metric: Development Effort (from download to operations, based on a customer 4G network performance monitoring app)
Apache Storm: 6 months (180 days)
Guavus SQLstream: 1 week (5 days)
Guavus SQLstream ROI: Guavus SQLstream delivers robust operational systems 30X faster.

This is where the power of SQL for stream processing comes to the fore: powerful analytics, delivered quickly, on a stable platform. It also brings pre-built adapters, integrated real-time dashboards for streaming analytics, and out-of-the-box integration for continuous ETL and stream persistence with Hadoop HDFS and HBase, plus a range of other RDBMS and data warehouses.

In summary, performance does matter. Stream processing offers scalability for systems at the junction of fast data and big data. But stream processing platforms are not all equal. As Shakespeare said, “Time travels in divers paces with divers persons” (As You Like It). For SQLstream’s customers, it gallops, and we can keep up.