Real time vs. streaming—a short explanation

Real time is about reactions to data.

More often than not, a system can be called real time if it can guarantee a reaction within a tight deadline, and depending on who is doing the defining, “tight” can mean minutes, seconds, or even milliseconds.

A handy example comes from the stock market: if a stock quote comes back within 15 ms of an order being placed, the system can be considered real time, because there’s a real-world guarantee that the reaction meets a tight enough deadline. The software architecture does not matter; neither does the technical basis for creating the order in the first place.
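
As a rough sketch of that idea (the 15 ms figure comes from the example above; the order IDs and latencies below are made up), a hard real-time claim is essentially a deadline check on every single reaction:

```python
from dataclasses import dataclass

# Hypothetical sketch: "real time" here simply means that every reaction
# lands within a hard deadline (15 ms, as in the stock-quote example).
DEADLINE_MS = 15.0

@dataclass
class Reaction:
    order_id: str
    latency_ms: float  # time from order placed to quote delivered

def meets_real_time_guarantee(reactions: list[Reaction]) -> bool:
    # A single missed deadline breaks a *hard* real-time guarantee.
    return all(r.latency_ms <= DEADLINE_MS for r in reactions)

print(meets_real_time_guarantee([Reaction("a1", 9.2), Reaction("a2", 14.8)]))  # True
print(meets_real_time_guarantee([Reaction("a3", 22.5)]))                       # False
```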

Many vendors claim their software solution is real time, and some back the claim with response times guaranteed within hard real-world deadlines. What they omit from the sales pitch is that the road from data creation to results is a sum of data processing steps, each with its own processing time that can be very far from “instant”.
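
A back-of-the-envelope illustration, with entirely made-up stage latencies, shows why quoting only the final step is misleading:

```python
# Hypothetical pipeline with made-up stage latencies: even if the final
# serving step reacts within milliseconds, the road from data creation to
# result is the *sum* of every stage's processing time.
stage_latency_ms = {
    "ingestion (batched)": 5 * 60 * 1000,  # data waits for a 5-minute batch
    "processing":          30 * 1000,
    "serving / query":     15,             # the part a sales pitch likes to quote
}

end_to_end_ms = sum(stage_latency_ms.values())
print(f"end-to-end latency: {end_to_end_ms / 1000:.1f} s")  # ~330 s, not 15 ms
```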

It’s like saying “this stew is organic because the carrots we used are organic”. Or, as a great Quora answer put it: it’s like claiming that your TV, which is in theory a real-time processing system (given an input, a pixel lights up on the screen right away), is producing real-time content, forgetting that what you see could have been created a long time ago.

Streaming is about actions taken on data.

In contrast, streaming describes a method of continuous computation that happens as data flows through a system, with no time limitation other than the raw power of the technology employed and the business’s tolerance for latency, whether or not it needs specific results in real time.
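
A minimal sketch of what “computation as data flows” can look like, using a made-up quote stream and a running average as the continuous computation:

```python
import random
from typing import Iterator

# Minimal sketch of continuous computation: the source is unbounded, and the
# result is updated as each record flows through, with no deadline attached
# to any individual record. The quote stream here is randomly generated.
def quote_stream() -> Iterator[float]:
    while True:
        yield random.uniform(99.0, 101.0)

def running_average(stream: Iterator[float]) -> Iterator[float]:
    total, count = 0.0, 0
    for value in stream:          # process each record as it arrives
        total += value
        count += 1
        yield total / count       # emit an updated result continuously

for i, avg in enumerate(running_average(quote_stream())):
    if i == 5:                    # stop the demo after a handful of records
        break
    print(f"running average after {i + 1} quotes: {avg:.2f}")
```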

There is no pre-set deadline for the system’s output or reaction when data arrives at the door: the data is processed as it comes, although, at high volumes, a backlog sometimes builds up awaiting processing. The success of this model depends on two things: over the long term, the output rate needs to be at least equal to the input rate, and the system must have enough memory to store the queued inputs required for the computation.
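
A toy model of that stability condition, with arbitrary input and output rates, shows how the backlog behaves when the long-run output rate does or does not keep up:

```python
# Toy model of the stability condition, with arbitrary rates: if the long-run
# output (processing) rate is lower than the input (arrival) rate, the backlog
# of queued inputs grows without bound and eventually exhausts memory.
def backlog_after(seconds: int, input_rate: float, output_rate: float) -> float:
    backlog = 0.0
    for _ in range(seconds):
        backlog += input_rate                 # records arriving this second
        backlog -= min(backlog, output_rate)  # records processed this second
    return backlog

print(backlog_after(60, input_rate=1000, output_rate=1200))  # 0.0     -> keeps up
print(backlog_after(60, input_rate=1000, output_rate=800))   # 12000.0 -> growing backlog
```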

Once an action is taken on the data, no matter which part of the process we’re in, the system’s reaction time needs to be measured against the goal that initiated the action in the first place, alongside depth, completeness, accuracy, and other indicators that become relevant as the process moves along.

To conclude, real time can be used to describe streaming at large, but in order to claim true streaming performance, a solution needs to do BOTH of the following:

  1. Provide minimal latency at every stage of data processing it covers, from data acquisition to analytics and visualization to actions, AND
  2. Do it continuously, with no backlogs or downtime.