There’s certainly a vast range of IoT APIs, connection protocols and data formats. At first glance, this makes device-to-device communication tricky, particularly as the Internet of Things encompasses all vendors and technologies. However, the Internet of Things is not necessarily a direct vendor-to-vendor issue. Rather, the need is for communication through the real-time data hubs on which Internet of Things applications are being built. It’s therefore the data hub that provides the device-to-device intelligence and communication – both read and update.
If we look at other industries, particularly telecoms, the concept of a multi-vendor data hub at any reasonable scale has proven difficult to realize, and therefore the industry has remained heavily siloed (from an IT perspective). In part this can be put down to cross-vendor standards that were too abstracted to be useful, in part the complex relationships inherent in network and service data, and in part the issues with building cross-silo solutions on enterprise architectures based around many different data warehouses with inflexible data models.
However, technology and attitudes to integration have moved on, as has the nature of the data, which is now mostly unstructured. The rise of Big Data technology (Hadoop-based storage and stream processing) enables data to be processed, joined and analyzed more easily in its original formats – and with stream processing, to do so in real time.
We’re seeing intelligent sensor platforms with data interfaces from many different vendors, exposed as XML and JSON streams (read and write), or even CSV, and delivered over sockets or rotating log files. Although the payloads and structures differ, these streams are now much simpler to process in a multi-vendor environment – by utilizing stream processing as a real-time data hub, often sitting on top of Hadoop as the enterprise data hub (for unstructured data).
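To make the multi-vendor point concrete, here is a minimal Python sketch (not any vendor's actual API) of normalizing two differently formatted feeds – one JSON, one CSV – into a single stream of name:value records; the device and field names are hypothetical:

```python
import csv
import io
import json

def to_tuples(payload, fmt):
    """Normalize a vendor payload (JSON or CSV) into a list of
    name:value records, one per reading."""
    if fmt == "json":
        # One JSON object per event.
        return [json.loads(payload)]
    if fmt == "csv":
        # Header row supplies the names; values stay as strings.
        reader = csv.DictReader(io.StringIO(payload))
        return [dict(row) for row in reader]
    raise ValueError(f"unsupported format: {fmt}")

# Two hypothetical vendor feeds carrying the same kind of reading.
json_event = '{"device_id": "a1", "temp_c": 21.5}'
csv_events = "device_id,temp_c\nb2,19.0\nb3,22.3\n"

# One combined stream of uniform records, ready for joint processing.
stream = to_tuples(json_event, "json") + to_tuples(csv_events, "csv")
```

Once every source is reduced to the same record shape, downstream queries and analytics no longer need to care which vendor or wire format an event came from.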
At SQLstream we provide data collection agents that transform any data source (using Change Data Capture technology as appropriate for the target stream, log format or platform) into streams of tuples (name:value pairs). These tuples are essentially a common data format for processing, in our case using continuous SQL queries. It’s possible to JOIN and UNION any number of event streams from different vendors in a single query, and to execute SQL- or Java-based analytics and alerts over the data. And as a stream processing platform, it’s possible to realize the concept of a real-time data hub and integrate with operational platforms and systems in real time in order to automate processes.
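The UNION-and-alert pattern described above can be sketched in plain Python (this is an illustration of the idea, not SQLstream's continuous SQL; the stream contents and the `temp_c` threshold are hypothetical):

```python
def union_streams(*streams):
    """UNION: interleave events from several vendor streams
    into one logical stream."""
    for stream in streams:
        yield from stream

def alert_over_threshold(events, threshold):
    """Continuous-query-style filter: emit an alert tuple for
    any reading above the threshold."""
    for event in events:
        if float(event["temp_c"]) > threshold:
            yield {"device_id": event["device_id"], "alert": "over_temp"}

# Hypothetical tuple streams from two different vendors' agents.
vendor_a = [{"device_id": "a1", "temp_c": 21.5}]
vendor_b = [{"device_id": "b2", "temp_c": 35.0}]

alerts = list(alert_over_threshold(union_streams(vendor_a, vendor_b), 30.0))
```

In a real streaming platform the generators would be unbounded and the alerts would feed operational systems directly, but the shape of the computation – union the sources, run a standing query over the combined stream – is the same.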
Vendor standardization for the Internet of Things of course remains important, particularly for update APIs and driving automated updates and changes. This is where the Internet of Things may have the largest impact on IT – not the move from the traditional EDW to real-time Big Data and stream processing, but the move away from considering analytics as traditional business intelligence towards using real-time streaming analytics to drive automated actions and operational processes. It is this union of analytics and real-time process automation that defines the concept of the real-time data hub for the Internet of Things.