Stream processing eliminating the chasm between analytics and operations

Today’s edition of Information Management’s DM Radio Broadcast, The Future of Integration: ETL, CDC and IOA, had a great panel lineup discussing the breadth of data integration issues in today’s world of Big Data, Cloud and traditional enterprise architectures. The session was hosted by Eric Kavanagh (Bloor) and Jim Ericsson (ex-Information Management; today was his last DM Radio after 5 years at the helm), and supported by Philip Russom (TDWI). SQLstream’s Damian Black was on the panel, along with representatives from Denodo and Dell.

Extending the reach of Hadoop to the edge of your business.

A key observation emerged from the discussion: the world of ETL and data integration is changing as business operations demand lower latency. Most businesses are now global, so there is no longer a window of opportunity for batch data management processes. Exploiting streaming data enables organizations to extend the reach of their Big Data storage platforms and data warehouses out to the edge of their business, connecting these platforms directly to the data sources. Streaming data can be filtered, aggregated and integrated with minimal latency, and real-time operational intelligence can be extracted as the data streams past. This means businesses can respond to their data in real time without having to wait for it to be stored and processed.
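
To make the filter-and-aggregate idea concrete, here is a minimal sketch in Python. It is not SQLstream's implementation or API. The event shape (timestamp, source, value), the function name tumbling_window_totals and its parameters are all hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape for illustration: (timestamp_seconds, source, value)
Event = Tuple[float, str, float]


def tumbling_window_totals(
    events: Iterable[Event],
    window_seconds: float = 60.0,
    min_value: float = 0.0,
) -> Iterator[Tuple[float, Dict[str, float]]]:
    """Filter and aggregate events as they arrive.

    Yields (window_start, totals_by_source) each time a window closes,
    without storing the full stream first. Assumes events are ordered
    by timestamp.
    """
    current_window = None
    totals: Dict[str, float] = defaultdict(float)

    for timestamp, source, value in events:
        # Filter: drop events below the threshold before any aggregation.
        if value < min_value:
            continue

        # Assign the event to a fixed, non-overlapping (tumbling) window.
        window_start = timestamp - (timestamp % window_seconds)

        # A new window has begun, so emit the finished one immediately.
        if current_window is not None and window_start != current_window:
            yield current_window, dict(totals)
            totals = defaultdict(float)

        current_window = window_start
        totals[source] += value

    # Emit the last, possibly partial, window when the stream ends.
    if current_window is not None:
        yield current_window, dict(totals)
```

Because the function is a generator, each window's totals become available as soon as the first event of the next window arrives, while the same events could still be forwarded unchanged to a warehouse or Hadoop cluster for later batch processing.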

Blurring the boundary between analytics and operations.

The emergence of low-latency business operations is also blurring the boundary between traditional BI analytics and the world of business operations. Operational intelligence is the application of analytical techniques in real time. The two are no longer distinct business functions, but rather a seamless continuum from streaming data to real-time business value. The benefit is that businesses can now extract the information they need in real time from arriving data streams, while still populating existing Big Data storage platforms and data warehouses, so existing business processes continue to function as normal.

The rising power of SQL queries over Hadoop.

Integrating data between systems implies a degree of structure: a common understanding of sink and source formats. The majority of integration platforms today assume SQL as the data management language. However, using SQL as the language does not necessarily mean that the underlying data store has to be relational. This is the game changer for Hadoop that will open up the existing enterprise market: the emergence of a true standards-based SQL query interface. Cloudera is already some way down this road with Impala.

So what’s the answer? SQLstream = real-time data integration + operational intelligence + SQL.