Just back from Silicon Valley Comes to Oxford (SVCO), an invite-only event at the Saïd Business School, University of Oxford, in the UK. The aim of the event is to give the Business School graduates insight into how to start, scale and run high-growth companies. The speakers include prominent entrepreneurs, innovators and investors from Silicon Valley. Other speakers included Phil Libin, CEO of Evernote, and Mike Olson, co-founder of Cloudera.
As well as running a masterclass on entrepreneurship in Silicon Valley, I gave my main presentation to the business school, addressing the mismatch between Big Data hype and reality. The talk, entitled “The Clash of Big Data and Consumer Expectation”, examined how the Big Data movement, as epitomized by Hadoop and NoSQL platforms, is failing to keep up with consumer expectations. In large part this is down to the solution cost and immaturity of Hadoop-based technologies. In the face of dramatically increasing data rates from telecoms and the IoT, it also means that Hadoop and related Big Data technologies are unable to deliver real-time, low-latency performance at any reasonable cost. We’re also talking about a world of sensors and other sources where much of the data is business as usual and does not need to be stored.
Many industries are implementing ‘real-time’ applications, although the reality often struggles to make it past the marketing department. For example, travel information remains woefully inadequate and inaccurate, real-time ad placements are still based on simplistic measures such as recent site visits, telecoms service providers are struggling with the new Big Data streams from 4G/IP location-based services, and micro-location services (such as tracking consumer behavior in shopping malls) are still in their infancy.
There are two issues at play: the collection of many different types and formats of machine-generated data from applications, web servers and devices, and the ability of Big Data platforms to deliver real-time, low-latency responses.
Fast, ad hoc queries over large volumes of stored data are not an issue for Big Data technology platforms, but delivering low-latency answers is either not possible or prohibitively expensive in terms of the hardware and software resources required. For example, most Hadoop and NoSQL solution vendors have a social media demo based on Twitter. Yet Twitter at its peak is around 10k tweets per second, while applications in telecoms, cybersecurity, telematics and transportation are increasingly in the region of several million records per second.
Delivering on consumer expectations therefore requires a rethink: how do you deliver automated, low-latency actions from multiple high-velocity data streams, often with poor-quality data, low information content and different data formats? This is where streaming data management comes into play. In the past, streaming was often associated with event processing or BPM-based middleware platforms, but it has now moved on to the next generation of streaming platforms, capable of executing many applications on the same core platform and scaling to the data rates generated by IoT and telecoms applications.