Self-Service for Real-time Analytics from Machine Data Streams

Just about everyone has heard the term Big Data, even the least technical of my friends. Significantly fewer people outside the industry are familiar with the term “self-service analytics”, although most could make a stab at its meaning. Self-service analytics is not a new phenomenon, but it has quietly established itself as the capability that may unlock Big Data as a mainstream enterprise technology. Big Data applications, from Internet of Things appliances through security intelligence and call analysis in telecoms to log monitoring in data centers, are heading down the path of high-speed data architectures in which users are empowered to build their own views of the data.

An industry blinded by Hadoop

Anyone who’s been around the BI industry for the past few years could be forgiven for believing that Hadoop is all that’s required to move from the old world of structured data to a new world of unstructured data. Visualization technologies, and indeed any other data management technology or capability, have been drowned out by the Hadoop hype. That’s now changing as the dust starts to settle. SQL was the first to re-emerge. Then came the realization that issues such as data integration, data quality and reuse remain at least as important as before. And now we’re seeing new momentum in visualization technologies, and in self-service analytics in particular.

User empowerment is the priority

Before stream processing became available as an IT technology platform, self-service analytics applied only to the traditional BI/DW world, where users would tolerate 24-hour delays in getting access to their data and would expect to submit a lengthy change request should they require a new report. A day’s wait for new data is no longer acceptable, and with unstructured data arriving from so many sources, the range of useful reports is too great to provision in advance. The value of Big Data lies in the ability to extract insight at latencies ranging from real time (for stream processing) to a few hours (for storage-based reporting on Hadoop and NoSQL).

The Internet of Things is in the driving seat

A successful appliance for the Internet of Things requires more than just the Internet and the Things. Appliance architectures must support high-speed data, real-time integration and automated processes, as well as a wide range of user reporting and analytics. A platform for IoT services must also support multiple appliances and use cases. Serving such a wide and varied user base with pre-defined reports alone is impractical, and since end users will more often be operational staff than IT specialists, ease of report development is key. Self-service analytics is therefore an essential component of the overall solution architecture.

Real-time IoT appliances and stream processing

Stream processing is central to the emerging high-speed data architectures for IoT appliances. A data stream management platform delivers the real-time requirements for sensor data mediation, streaming analytics, continuous integration and operational process automation. Any enterprise architecture requires a blend of data stream management and database management, and a blend of in-memory and storage-based processing. However, it is the real-time appliances that operational users will interact with most, so self-service analytics must be available for data streams as well as for data stores.
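To make the idea of in-memory streaming analytics more concrete, here is a minimal Python sketch (not SQLstream code) of a continuous computation that keeps a per-sensor rolling average over a trailing time window and emits a result as each reading arrives, rather than storing readings and querying them later. The `Reading` type, its field names and the window semantics are assumptions for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    # Hypothetical sensor record; field names are illustrative assumptions.
    sensor_id: str
    timestamp: float  # seconds
    value: float


def rolling_average(
    readings: Iterable[Reading], window_seconds: float
) -> Iterator[tuple[str, float, float]]:
    """Yield (sensor_id, timestamp, mean over the trailing window) per reading."""
    windows: dict[str, deque] = {}
    totals: dict[str, float] = {}
    for r in readings:
        win = windows.setdefault(r.sensor_id, deque())
        win.append(r)
        totals[r.sensor_id] = totals.get(r.sensor_id, 0.0) + r.value
        # Evict readings that have fallen out of the trailing time window.
        while win and win[0].timestamp <= r.timestamp - window_seconds:
            old = win.popleft()
            totals[r.sensor_id] -= old.value
        # The current reading is always inside its own window, so len(win) >= 1.
        yield r.sensor_id, r.timestamp, totals[r.sensor_id] / len(win)
```

Because the function is a generator, it can consume an unbounded source such as a socket or message queue; memory is bounded by the number of readings inside each sensor's window.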

Self-service analytics for data streams

SQLstream’s StreamLab is a graphical platform for the intelligent, guided discovery and visualization of data streams. Users connect to, explore and visualize machine data from log files and sensors in real time, as the data are flowing. A graphical interface lets users format flowing data interactively and connect streaming analytics to real-time dashboards.
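The text does not describe how StreamLab's guided discovery works internally, but one plausible ingredient is proposing a schema from a sample of raw log lines, so that users can format the stream without writing a parser. The Python sketch below is a hypothetical illustration of that single step: it splits delimited lines and infers a type for each column from the sampled values. The function names, the `COLn` naming scheme and the SQL type names are all assumptions.

```python
import re
from typing import Iterable

# Illustrative patterns and SQL type names; a real platform's type system may differ.
INT_PATTERN = r"[+-]?\d+"
FLOAT_PATTERN = r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?"


def infer_type(values: list[str]) -> str:
    """Return the narrowest type that every non-empty sampled value parses as."""
    if not values:
        return "VARCHAR"
    if all(re.fullmatch(INT_PATTERN, v) for v in values):
        return "BIGINT"
    if all(re.fullmatch(FLOAT_PATTERN, v) for v in values):
        return "DOUBLE"
    return "VARCHAR"


def discover_schema(sample_lines: Iterable[str], delimiter: str = ",") -> list[tuple[str, str]]:
    """Propose (column name, type) pairs from a sample of delimited log lines."""
    rows = [[f.strip() for f in line.split(delimiter)] for line in sample_lines if line.strip()]
    width = max((len(r) for r in rows), default=0)
    schema = []
    for i in range(width):
        # Ignore missing or empty fields when inferring this column's type.
        column = [r[i] for r in rows if i < len(r) and r[i]]
        schema.append((f"COL{i + 1}", infer_type(column)))
    return schema
```

In an interactive tool, a proposal like this would only be a starting point that the user can rename, retype or split further as more data flows past.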

At first glance, StreamLab is much like any other self-service analytics platform: non-technical users can connect to data sources, apply some analytics and visualize the results. Under the hood, however, there are some significant differences.

  • First, each user action in StreamLab is translated into the corresponding SQL, which is deployed and executed dynamically on the underlying s-Server platform. Queries are updated and new results generated without having to stop and restart the server.
  • Second, outputs are as important as inputs in a stream processing architecture, so StreamLab enables streams to be connected to external systems as well as displayed on real-time dashboards.
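The two differences above can be illustrated with a small Python sketch, which is not StreamLab or s-Server code. A hypothetical `FilterAction`, standing in for a user's click in a graphical interface, is rendered as SQL text and also compiled into a predicate that replaces the logic of a running `ContinuousQuery` between rows, so the stream is never stopped. Each result row is fanned out to several sinks, modelling a dashboard and an external system side by side. All class names and the generated SQL text are assumptions.

```python
from typing import Callable

Row = dict[str, float]
Sink = Callable[[Row], None]


class FilterAction:
    """Hypothetical UI action: keep only rows where `column` exceeds `threshold`."""

    def __init__(self, column: str, threshold: float) -> None:
        self.column = column
        self.threshold = threshold

    def to_sql(self, stream: str) -> str:
        # Illustrative SQL text only; not the statements s-Server actually generates.
        return f'SELECT STREAM * FROM "{stream}" WHERE "{self.column}" > {self.threshold}'

    def predicate(self) -> Callable[[Row], bool]:
        # Rows missing the column never pass the filter.
        return lambda row: row.get(self.column, float("-inf")) > self.threshold


class ContinuousQuery:
    """A running query whose logic can be replaced between rows without a restart."""

    def __init__(self, stream: str, sinks: list[Sink]) -> None:
        self.stream = stream
        self.sinks = sinks  # e.g. a dashboard feed and an external system
        self.sql = f'SELECT STREAM * FROM "{stream}"'
        self._keep: Callable[[Row], bool] = lambda row: True

    def apply(self, action: FilterAction) -> None:
        # Swap in the new logic; rows already delivered are unaffected.
        self.sql = action.to_sql(self.stream)
        self._keep = action.predicate()

    def push(self, row: Row) -> None:
        # Every row that passes the current filter goes to every sink.
        if self._keep(row):
            for sink in self.sinks:
                sink(row)
```

Keeping the SQL text alongside the compiled predicate mirrors the idea that each user action has a query-language representation, while the swap in `apply` is what lets results change without a restart.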

The differences between StreamLab and a traditional platform for self-service analytics over stored data are presented in the following table.

Data Operation | Self-Service on Stored Data | StreamLab for Streaming Data
--- | --- | ---
Data Sources | Reads from stored database tables. | Processes real-time unstructured data streams.
Data Discovery | Not applicable. | Intelligent, guided data stream discovery.
Analytics | Applied as static queries over stored data. | Applied as updates to continuous queries over streaming data.
Visualization | Pull-based queries over stored data; the query is repeated for new data. | Continuous, push-based updates of dashboards as new data arrive.
Enterprise Connectivity | None; reporting only. | Continuous enterprise integration by selecting external destinations for data streams.
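The Visualization row contrasts pull-based and push-based dashboards. The following Python sketch, a toy model rather than a description of how any particular product is implemented, shows the difference: the polling dashboard stays stale until it re-runs a query over all stored rows, while the push dashboard updates its running maximum with constant work for each value the stream delivers. The class names and the choice of a running maximum as the displayed metric are illustrative assumptions.

```python
from typing import Callable


class PollingDashboard:
    """Pull model: every refresh re-runs a query over the stored rows."""

    def __init__(self, table: list[float]) -> None:
        self.table = table
        self.shown = float("-inf")
        self.queries_run = 0

    def refresh(self) -> None:
        # Scans the whole table each time, even if nothing has changed.
        self.queries_run += 1
        self.shown = max(self.table, default=float("-inf"))


class PushDashboard:
    """Push model: the display is updated incrementally as each value arrives."""

    def __init__(self) -> None:
        self.shown = float("-inf")
        self.updates = 0

    def on_value(self, value: float) -> None:
        self.updates += 1
        self.shown = max(self.shown, value)


class Stream:
    """Delivers each new value to every subscriber as soon as it is published."""

    def __init__(self) -> None:
        self.subscribers: list[Callable[[float], None]] = []

    def subscribe(self, callback: Callable[[float], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, value: float) -> None:
        for callback in self.subscribers:
            callback(value)
```

In the pull model, freshness is bounded by the refresh interval and each refresh costs a full scan; in the push model, each update is visible immediately and costs a fixed amount of work.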