In the summer of 2016, SQLstream collaborated with a market-leading ad-tech company to explore the benefits of using streaming SQL to complement an existing Hadoop batch-based analytics framework. This framework runs on hundreds of servers and has an effective latency of many hours.

Summary of Results

The batch file / Hadoop framework was complemented with a real-time stream processing infrastructure, using Kafka as the message backbone and SQLstream Blaze for analytics. This infrastructure delivers information to users and applications consistently, reliably, and quickly.

  • Demonstrated end-to-end transformations for 4 representative use cases
  • Achieved Kafka ingest throughput for complex Protobuf data structures of up to 1.8M recs/second (between 6 and 7.5 Gbps, or 155B recs/day) on a single (6-core / 12-thread) server. This rate is essentially limited by the 10 Gbps network bandwidth.
  • Operated the business use cases over a continuous 24-hour period at over 650k recs/sec/server, using 85% CPU
  • Demonstrated performance capabilities that can be scaled up and scaled out to allow processing of up to 440 billion records/day (up to 6.7M rows/sec in the peak hour) on a relative handful of servers
  • Demonstrated end-to-end analytic latency reduction from several hours down to minutes
  • Demonstrated an approach to scale-out: spreading the ingest across multiple servers in a federated SQLstream Blaze over Kafka infrastructure
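
The single-server ingest figures above can be cross-checked with some quick arithmetic. The following is a minimal sketch, not part of the original test harness: the function names are hypothetical, Gbps is assumed to mean 10^9 bits per second, and the per-record size is only implied by the quoted bandwidth rather than stated in the results.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def records_per_day(recs_per_sec):
    """Convert a sustained per-second rate into a daily record count."""
    return recs_per_sec * SECONDS_PER_DAY


def implied_bytes_per_record(gbps, recs_per_sec):
    """Average wire size per record implied by a bandwidth figure.

    Assumes 1 Gbps = 1e9 bits/sec and that all bandwidth carries records.
    """
    return gbps * 1e9 / 8 / recs_per_sec


ingest_rate = 1_800_000  # recs/sec on one server, from the results above

print(f"{records_per_day(ingest_rate) / 1e9:.2f}B recs/day")  # 155.52B recs/day
print(f"{implied_bytes_per_record(6.0, ingest_rate):.0f} bytes/rec at 6 Gbps")
print(f"{implied_bytes_per_record(7.5, ingest_rate):.0f} bytes/rec at 7.5 Gbps")
```

Under these assumptions the quoted 155B recs/day follows directly from 1.8M recs/sec, and the 6 to 7.5 Gbps range corresponds to an average record of roughly 420 to 520 bytes.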

Conclusion

The feasibility testing showed that it is possible to run the predicted future load of up to 440B records/day (peak hour 6.7M recs/sec) on a cluster of just 12 equivalent servers.
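
As a rough sanity check on that cluster size, the sketch below divides the peak-hour rate by the sustained per-server rate reported above. It is a back-of-envelope estimate rather than the sizing method used in the study: it assumes throughput scales linearly with server count, treats 650k recs/sec/server as the usable rate, and uses hypothetical function names.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

DAILY_TARGET = 440e9         # records/day, predicted future load
PEAK_RATE = 6_700_000        # recs/sec in the peak hour
PER_SERVER_RATE = 650_000    # recs/sec/server sustained at 85% CPU
CLUSTER_SIZE = 12            # servers, from the conclusion


def servers_needed(peak_rate, per_server_rate):
    """Smallest whole number of servers that covers the peak rate."""
    return math.ceil(peak_rate / per_server_rate)


def average_rate(records_per_day):
    """Mean per-second rate if the daily load were spread evenly."""
    return records_per_day / SECONDS_PER_DAY


minimum = servers_needed(PEAK_RATE, PER_SERVER_RATE)
capacity = CLUSTER_SIZE * PER_SERVER_RATE
headroom = capacity / PEAK_RATE - 1

print(f"average rate: {average_rate(DAILY_TARGET) / 1e6:.2f}M recs/sec")
print(f"minimum servers at peak: {minimum}")  # 11
print(f"capacity of {CLUSTER_SIZE} servers: {capacity / 1e6:.1f}M recs/sec")
print(f"headroom over peak: {headroom:.0%}")
```

On these assumptions 11 servers would just cover the peak hour, so 12 leaves a margin of about 16% over the 6.7M recs/sec peak.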
