Real-time data and streaming data – what’s the difference?

I have a difficult time restraining my impulse to interrupt customers, vendors, employees, the janitor, etc., when I hear them refer to ‘real-time data’ when what they really mean is the ability to ingest massive amounts of data and to interpret or analyze it in a short amount of time. Invariably they will say something like, “Well, this is clearly a ‘real time’ data problem.” In short, they have confused acceptably low latency with streaming flows of data.

As the character Luke (played by Paul Newman) says in the classic 1967 movie Cool Hand Luke, “What we’ve got here is failure to communicate.” (Steve McQueen, incidentally, many moons ago drove Vic Hickey’s Baja Boot – the precursor to the Humvee – in one of our Baja 1000 excursions, but that’s another story.) I submit that ‘real time’ can never actually be achieved or determined; it is very much akin to the intersection of Brownian motion with Heisenberg’s Uncertainty Principle – assuming you can catch, read, or interrogate something in ‘real time’, time has already passed and the mere act of observing the object has caused it to move away. In other words, an event is over by the time one tries to measure or quantify it. Low latency – or rather, low latency within a time frame of relevance – is what these folks actually mean.

Streaming data, however, is constant, flowing, measurable, and accessible to anyone with the means to tap it – SQLstream s-Server being a great tool for such purposes. Results from analyzing such streams are streams themselves, or can be quantized into discrete events, which can then be used to trigger or visualize events of import, leading systems, administrators, executives, etc., to take appropriate action. This needs to happen within a low-latency time constraint; however, ‘low latency’ itself may mean different things for different applications of these technologies.
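To make the distinction concrete, here is a minimal sketch of that pattern: a continuous input stream, a derived stream produced by analysis, and discrete alert events quantized out of it. It is plain Python rather than the streaming SQL a tool like s-Server would use over live feeds, and the simulated sensor readings, window size, and threshold are arbitrary assumptions chosen purely for illustration.

    import random
    import time
    from collections import deque

    def sensor_readings():
        """Simulated input stream: (timestamp, value) tuples arriving continuously.
        Stands in for whatever feed (sensors, logs, ticks) a streaming engine ingests."""
        while True:
            yield (time.time(), random.gauss(50.0, 10.0))
            time.sleep(0.1)  # a new reading every 100 ms

    def rolling_average(readings, window_size=20):
        """Derived stream: a rolling average over the last `window_size` readings.
        The result of analyzing a stream is itself a stream."""
        window = deque(maxlen=window_size)
        for ts, value in readings:
            window.append(value)
            yield (ts, sum(window) / len(window))

    def alerts(averages, threshold=65.0):
        """Quantize the derived stream into discrete events: emit one alert each
        time the rolling average crosses the threshold from below."""
        above = False
        for ts, avg in averages:
            if avg > threshold and not above:
                above = True
                yield (ts, avg)  # a discrete event of import
            elif avg <= threshold:
                above = False

    if __name__ == "__main__":
        # Runs indefinitely, as a streaming pipeline would; each alert is the kind
        # of discrete event that could drive a dashboard or downstream action.
        for ts, avg in alerts(rolling_average(sensor_readings())):
            print(f"{ts:.3f}: rolling average {avg:.1f} exceeded threshold")

In a real deployment the alert stream at the end would feed a visualization or a downstream system rather than a print statement, and the latency budget for getting from raw reading to alert is exactly the ‘low latency in a time frame of relevance’ discussed above.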