Stream processing enables operational intelligence from real-time Big Data

Bloor Group’s Robin Bloor hosted SQLstream’s CEO Damian in The Briefing Room for a webcast entitled “Windows of Opportunity: Big Data on Tap”. The webinar focused on the emergence of both SQL and stream processing as key enablers for real-time Big Data systems in an ever-maturing marketplace. You can watch the full webinar from the link below, but I’m going to focus on some of the topics arising from the online discussion between Robin, Damian and the audience. It was an interesting discussion, covering Big Data, stream processing, streaming analytics and operational intelligence, all within the context of enterprise deployments. A number of important points were raised.

Hadoop is a data reservoir, not a real-time platform

Many people incorrectly believe that Hadoop is a platform for real-time, low-latency analytics. It’s not. Hadoop is a multi-purpose engine, but not a real-time, high-performance one. Its parallelism is great for processing data once it’s stored, but it comes with high latency. However, with the integration of a streaming data platform for continuous data collection, analysis and streaming integration, Hadoop can be used as the active archive for a true real-time, streaming Big Data system.

Operational intelligence needs a Streaming Big Data Platform

The bulk of real-time operational intelligence today is derived from log and machine data: data generated by the Internet, Cloud infrastructure and applications, for example. There are many log monitoring tools out there, and while they are very capable, we’re finding that SQLstream’s real-time streaming Big Data platform is being used to solve the high-volume, high-velocity, complex data problems that log monitoring tools are unable to address at an affordable price point.

The emergence of SQL for Big Data

The first phase of Hadoop and Big Data platforms saw the emergence of NoSQL data storage platforms, looking to overcome the rigidity of normalized RDBMS schemas. However, as the technology hits mainstream industry, the need for simpler, high-performance and reliable queries is driving a resurgence in SQL as the de facto language for Big Data processing (see Cloudera Impala, for example). What’s less apparent is that SQL is also the ideal language for processing data streams using real-time, window-based queries. The issue with normalization and rigid schemas is a non-issue for a streaming data platform – there are no tables, and no data gets stored!
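To make the idea of a window-based query concrete, here is a minimal sketch in plain Python of what a streaming engine does for a query such as "the count and average of a value over the last 10 seconds". It is an illustration only, not SQLstream's API or syntax: the function name `sliding_window_stats`, the `(timestamp, value)` record shape and the sample readings are all assumptions made for the example. Records arrive one at a time, only the records still inside the window are held in memory, and a result is emitted for every arrival.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[float, float]  # (timestamp in seconds, measured value); hypothetical shape


def sliding_window_stats(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[float, int, float]]:
    """Yield (timestamp, count, average) over a trailing time window.

    Assumes events arrive in timestamp order. An event stays in the window
    while its timestamp is strictly greater than (current - window_seconds).
    """
    window: Deque[Event] = deque()
    running_sum = 0.0
    for ts, value in events:
        # Add the new arrival to the window.
        window.append((ts, value))
        running_sum += value
        # Evict events that have aged out of the trailing window.
        cutoff = ts - window_seconds
        while window and window[0][0] <= cutoff:
            _, old_value = window.popleft()
            running_sum -= old_value
        # Emit one result per arriving event, as a continuous query would.
        yield ts, len(window), running_sum / len(window)


if __name__ == "__main__":
    # Hypothetical response-time readings taken from a log stream.
    readings = [(0, 5.0), (4, 7.0), (9, 9.0), (12, 3.0), (25, 1.0)]
    for ts, count, avg in sliding_window_stats(readings, window_seconds=10):
        print(f"t={ts:>2}s  events_in_window={count}  avg={avg:.2f}")
```

Nothing is written to a table here: state is limited to the events currently inside the window, which is why schema rigidity matters much less for this style of query. A streaming SQL engine would let you express the same logic declaratively instead of hand-coding the eviction loop.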

So in summary, stream processing is the emerging Big Data technology for 2013, and SQL is the (re-)emerging technology as Big Data hits mainstream industry. Processing real-time log and machine data streams is a key requirement today, but industries with sensor, M2M and telematics applications are catching up fast.