The Difference Between Streaming and Batch Processing

With IoT going mainstream through all things smart, analysts and the press are getting hungry for data, and consumers are getting hungry for information. High-volume, high-velocity data is produced, analyzed, and used to trigger action almost as soon as it is created, and the processing itself generates yet more data. It is this never-ending (if short-lived) cycle that makes it all possible, and difficult, from the very start.

Using IoT, being a subject of it, or writing about it and its applications has become as exciting as its value-generating potential. Navigating the terminology behind what powers it? Not so much.

Complex event processing vs. event processing, streaming analytics vs. real-time data analytics, data ingestion and data ingestion frameworks, streaming analytics platforms vs. big data processing frameworks, what is Spark Streaming, streaming SQL, no-batch vs. batch processing, and so on are among the terms the public most often searches for. And the answers are as varied as they come.

To clarify things a bit, here’s a simple description of the streaming model for data processing.

Unlike batch processing or traditional big data processing frameworks, a true streaming model is built on independently streaming elements that run concurrently, continuously, and in real time (both relative to the moment the data is born and relative to each other), as follows:

 

Streaming Ingestion (SI)

To use data, a system needs to be able to discover, integrate, and ingest all available data from the machines that produce it, as fast as it’s being produced, in any format, and at any quality.  

A streaming data ingestion framework doesn’t simply move data from source to destination the way traditional ETL solutions do. With SI, data in any format, including log files, web logs, sensor or IoT data, machine data, and change data capture (CDC) feeds from databases, can be ingested, filtered, corrected, parsed, and simultaneously enriched with structured, stored data from any number of sources for analysis.
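As a rough illustration, here is a minimal Python sketch of that per-record ingestion flow: parse, filter, and enrich each record the moment it arrives, rather than loading a batch first. The JSON format, the DEVICE_METADATA reference table, and the field names are assumptions made for the example, not part of any particular product.

```python
import json
from datetime import datetime, timezone

# Hypothetical reference table used to enrich readings with stored, structured data.
DEVICE_METADATA = {
    "sensor-42": {"site": "plant-a", "unit": "celsius"},
}

def ingest(raw_records):
    """Parse, filter, and enrich records one at a time, as they arrive."""
    for raw in raw_records:           # raw_records could be a log tail, a socket, a CDC feed...
        try:
            record = json.loads(raw)  # parse: we assume JSON-encoded readings here
        except json.JSONDecodeError:
            continue                  # filter: drop records that cannot be parsed
        if record.get("value") is None:
            continue                  # filter: drop incomplete readings
        meta = DEVICE_METADATA.get(record.get("device_id"), {})
        record.update(meta)           # enrich: join with stored reference data
        record["ingested_at"] = datetime.now(timezone.utc).isoformat()
        yield record                  # hand each record downstream immediately

# Usage: every parsed record is available to the next stage as soon as it is read.
# for event in ingest(open("sensor.log")):
#     analytics_queue.put(event)
```

Because the function is a generator, nothing waits for the "end" of the data; each record flows onward as soon as it has been cleaned and enriched.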

 

Streaming Analytics

In order to provide accurate and relevant business intelligence, a system needs to be able to analyze all available data continuously, concurrently, and in real time.

Streaming analytics differs from batch processing in that results are updated continuously as more data enters the system. This record-by-record model produces results that are more accurate and more relevant, because each new record is reflected in the output the moment it arrives rather than at the end of the next batch run.
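To make the contrast concrete, here is a small Python sketch of the same computation done both ways. The running average, the temperature_stream iterable, and the dashboard object are hypothetical stand-ins used only for illustration.

```python
def streaming_average(readings):
    """Update the running average record by record; a result is usable at any point."""
    count, total = 0, 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count           # an up-to-date result after every single record

def batch_average(readings):
    """Batch, by contrast, must wait until the whole data set has been collected."""
    values = list(readings)           # collect everything first
    return sum(values) / len(values)  # one result, available only at the end

# Usage with a hypothetical stream of temperature readings:
# for current_avg in streaming_average(temperature_stream):
#     dashboard.update(current_avg)   # refreshed continuously, not once per batch
```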

 

Streaming Action

The analytics results can trigger the correct business process or operation on a per-record basis at the point of insight.

Unlike traditional models, streaming technologies employ a continuous load process. That allows systems to mitigate anomalies, report results, and seize opportunities when and where they happen. Automating those actions (whether pushing data into streaming applications or into storage for future use) also helps avoid system overload.
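As a sketch of that idea, the snippet below consumes analyzed records from a queue and triggers an action per record, at the point of insight, while routing everything else to storage. The threshold rule, the field names, and the archive helper are assumptions for illustration, and the ingest generator refers back to the earlier sketch; the stages run concurrently and continuously, as the streaming model describes.

```python
import queue
import threading

events = queue.Queue()   # hand-off between independently running stages

def act(event_queue, threshold=90.0):
    """Trigger a business action per record, at the point of insight."""
    while True:
        event = event_queue.get()
        if event is None:                 # sentinel: the stream has closed
            break
        if event["value"] > threshold:    # hypothetical anomaly rule
            print(f"ALERT {event['device_id']}: {event['value']}")  # e.g. open a ticket, throttle a machine
        else:
            archive(event)                # otherwise route to storage for future use

def archive(event):
    pass  # placeholder: append to cold storage, a data lake, etc.

# The action stage runs continuously and concurrently with ingestion and analytics:
# threading.Thread(target=act, args=(events,), daemon=True).start()
# for record in ingest(raw_feed):          # ingest() as sketched above
#     events.put(record)
```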