Streaming Big Data, SQLstream and Oracle OpenWorld

Perhaps the highlight of Oracle OpenWorld last week, or at least the one most commented on by attendees at our booth, was Larry Ellison’s demo of Exadata and Exalytics: querying 10 days or so of stored Twitter feeds in the hope of finding the best US athlete from the recent London 2012 Olympics to endorse a car company. This seemed to strike a chord with the audience. How many organizations employ a marketing analytics company to spend a vast amount of time poring over data to work out the top candidates for a marketing campaign? That said, would the CMO really go with a query result, or choose their favorite in any case?

Cloud and business applications were a focus, although as others have blogged elsewhere, despite 80+ acquisitions in the past few years, Oracle remains a database company. Major announcements / news included:

  • Release of Oracle 12c (the ‘c’ for ‘cloud’). The announcement of its first multi-tenanted and ‘pluggable’ databases got a few ripples of applause from the audience.
  • Exadata X3 box, the in-memory machine with 22 raw TB of memory and a claimed 10X compression, making a total of 220TB of ‘memory’ in a rack. Oracle claims this is 100 times faster than the Exadata machines it has launched in the last few years.

Streaming analytics and Twitter

Back to Larry’s Twitter example. Of course, this can be achieved easily as a streaming application in real time. Semantic streaming is something SQLstream has been doing for some time: taking unstructured data such as Twitter feeds, emails and texts, and determining sentiment and aggregated scoring in real time. Use cases include identifying traffic incidents on the road networks to augment geospatial analysis of vehicle GPS data, and, in telecommunications, determining in real time a customer’s true perception of the quality of experience of delivered services.

The numbers seemed impressive – Larry crunched nearly five billion tweets and 27 billion social media relationships. But breaking this down, is this really a Big Data problem? Five billion tweets, even over a one-day period (I believe the demo was 10 days), is only 58,000 tweets per second; spread over 10 days, it is closer to 5,800 per second. The one-day figure is well in excess of Twitter’s top peak loads during major events such as the Super Bowl, but well within the capability of SQLstream’s real-time streaming Big Data platform, even on an entry-level, single-server, 2-core machine. Of course, the complete solution architecture may include data storage platforms such as Hadoop or Oracle, where aggregated streaming results can be loaded and persisted in real time, further crunched in the data warehouse, and historical analysis joined back with the real-time streams to better identify any moving trends.
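For anyone who wants to check the back-of-the-envelope rate, here is a minimal Python sketch. It assumes a round five billion tweets (the demo figure was "nearly" five billion) and computes the average sustained rate only; the function and constant names are ours, purely for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tweets_per_second(total_tweets: int, days: float) -> float:
    """Average rate needed to process total_tweets spread evenly over `days` days."""
    return total_tweets / (days * SECONDS_PER_DAY)


# Assumption: a round five billion tweets, as an upper bound on the demo data set.
TOTAL_TWEETS = 5_000_000_000

one_day_rate = tweets_per_second(TOTAL_TWEETS, 1)
ten_day_rate = tweets_per_second(TOTAL_TWEETS, 10)

print(f"{one_day_rate:,.0f} tweets/s if replayed in 1 day")    # 57,870
print(f"{ten_day_rate:,.0f} tweets/s if replayed in 10 days")  # 5,787
```

These are averages; a real feed is bursty, so peak rates would be higher than the mean.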

It was an interesting demo nonetheless, and one that really should be completed in real time as a streaming problem. SQLstream’s ability to analyze and aggregate streams (in this case, across keywords and hashtags), provide geospatial and clustering analysis, and deliver raw and aggregated data as continuous streams to the backend storage platforms makes this very achievable today.
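To give a flavor of what aggregating a stream by hashtag means, here is a rough sketch in plain Python. It is not SQLstream code and says nothing about how the SQLstream platform is implemented; it simply counts hashtags in fixed, non-overlapping time windows. The function name, the 60-second window and the sample tweets are all hypothetical, and the sketch assumes tweets arrive in timestamp order.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(
    tweets: Iterable[Tuple[float, str]], window_seconds: int = 60
) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, hashtag counts) for each fixed window.

    Assumes (timestamp_in_seconds, text) pairs arrive in timestamp order.
    """
    current_window = None
    counts: Counter = Counter()
    for timestamp, text in tweets:
        window = int(timestamp // window_seconds) * window_seconds
        if current_window is not None and window != current_window:
            # The previous window is complete: emit it and start a fresh count.
            yield current_window, counts
            counts = Counter()
        current_window = window
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    if current_window is not None:
        yield current_window, counts  # emit the final, possibly partial, window


# Made-up sample data for illustration only.
sample = [
    (0, "Go #TeamUSA! #London2012"),
    (30, "#teamusa wins gold"),
    (65, "What a race #London2012"),
]
results = list(hashtag_counts(sample))
for window_start, counts in results:
    print(window_start, counts.most_common())
```

Because results are emitted as each window closes, the aggregated output is itself a continuous stream that could be loaded into a backend store as it arrives.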

On the show floor

Apart from the heavy footfall at the SQLstream booth, perhaps most notable was the increasingly uninventive marketing mechanisms used to persuade unsuspecting attendees to listen to product pitches based on the promise of winning a piece of Apple hardware. Surely marketing managers can think up something a bit more inventive than an iPad? The exception was the speaking robot. I’m not sure if this was an exhibit floor attraction, although I saw it ‘chatting’ to passersby on the Wipro booth.

Contact us if you’d like to find out more about SQLstream and our streaming Big Data management platform.