This week I’m attending an interesting conference at UC Berkeley called the “Berkeley Conference on Streaming Data”. The organizers are primarily astronomers and statisticians, but the talks cover issues and solutions to streaming data problems across a wide selection of scientific areas and engineering applications. Real-time streaming analytics and Big Data applications presented included oceanography, biology, genetics, handwriting recognition, astrophysics, particle physics, recommendation engines for social media, and inevitably, real-time fraud detection from live data feeds.
I presented on a deployment of SQLstream as a Dynamically Scalable Cloud Platform for the Real-Time Detection of Seismic Events. Based on work with UCSD seismologists, SQLstream has been deployed to detect significant events in data collected from a large grid of seismic sensors. A large-scale data infrastructure (the OOI/CI) provides raw signal data over an AMQP message bus.
SQLstream monitors live seismic data feeds in real-time, applying heuristic algorithms that look for patterns indicating earthquakes. The live system scales dynamically across multiple servers in a cloud environment based on the current demand. You can view the presentation here. I also blogged previously on the application here.
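The post doesn’t describe the heuristic algorithms themselves, but a classic event trigger in seismology is the STA/LTA ratio: flag a sample when the short-term average amplitude jumps well above the long-term background. Here is a minimal Python sketch of that idea; the window sizes and threshold are illustrative assumptions, not SQLstream’s actual implementation (which would be expressed as a continuous streaming SQL query rather than imperative code).

```python
from collections import deque

def stalta_triggers(samples, short_win=5, long_win=20, threshold=3.0):
    """Return indices where the short-term average of absolute amplitude
    exceeds `threshold` times the long-term average (an STA/LTA trigger).
    Window sizes and threshold are illustrative, not tuned values."""
    short_buf = deque(maxlen=short_win)  # recent samples (STA window)
    long_buf = deque(maxlen=long_win)    # background samples (LTA window)
    triggers = []
    for i, x in enumerate(samples):
        short_buf.append(abs(x))
        long_buf.append(abs(x))
        if len(long_buf) == long_win:    # wait until background is established
            sta = sum(short_buf) / len(short_buf)
            lta = sum(long_buf) / len(long_buf)
            if lta > 0 and sta / lta > threshold:
                triggers.append(i)
    return triggers

# A quiet signal followed by a burst: only the burst samples trigger.
print(stalta_triggers([1.0] * 30 + [50.0] * 5))  # → [30, 31, 32, 33, 34]
```

Because the detector keeps only two bounded buffers per sensor, it processes each sample in constant time and memory, which is what makes this style of computation feasible over continuous feeds rather than batch-loaded tables.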
In conclusion, I have two main observations from the conference so far (it continues until Friday). The first is that the majority of fields in science and technology appear to have a Big Data, and often a real-time Big Data, problem. The second is the extent of the innovation and computer science resources dedicated to solving these problems, in particular, for this conference, developing algorithms for data analysis and machine learning (that is, automatic pattern recognition) that work on streams of flowing data. It’s clear that traditional data management and even Big Data batch-based methods don’t work when you need continuous results from dynamic data. And the amount of data is huge.