In the summer of 2016, SQLstream collaborated with a market-leading ad-tech company to explore how streaming SQL could complement an existing Hadoop batch-based analytics framework. That framework runs on hundreds of servers and has an effective latency measured in many hours.
Summary of Results
The batch-file / Hadoop framework was complemented with a real-time stream-processing infrastructure (Kafka as the message backbone, SQLstream Blaze for analytics) that delivers information to users and applications consistently, reliably, and with low latency.
- Demonstrated end-to-end transformations for 4 representative use cases
- Achieved Kafka ingest throughput for complex Protobuf data structures of up to 1.8M recs/sec (between 6 and 7.5 Gbps, or 155B recs/day) on a single (6-core / 12-thread) server; this is essentially limited by the 10 Gbps network bandwidth
- Operated the business use cases over a continuous 24-hour period at over 650k recs/sec per server, using 85% CPU
- Demonstrated performance that can be scaled up and scaled out to process up to 440 billion records/day (up to 6.7M recs/sec in the peak hour) on a relative handful of servers
- Demonstrated end-to-end analytic latency reduction from several hours down to minutes
- Demonstrated an approach to scale-out, spreading ingest across multiple servers in a federated SQLstream Blaze over Kafka infrastructure
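The single-server ingest figures above are internally consistent, which a quick back-of-envelope check makes visible (a sketch: the 1.8M recs/sec and 6–7.5 Gbps numbers come from the results; the per-record sizes are derived from them, not measured):

```python
# Sanity-check the single-server Kafka ingest figures quoted above.
RECS_PER_SEC = 1.8e6      # measured ingest rate
SECONDS_PER_DAY = 86_400

daily = RECS_PER_SEC * SECONDS_PER_DAY
print(f"records/day: {daily / 1e9:.1f}B")   # ~155.5B, matching the quoted 155B/day

# Implied Protobuf record size at the quoted 6-7.5 Gbps throughput:
for gbps in (6.0, 7.5):
    bytes_per_rec = (gbps * 1e9 / 8) / RECS_PER_SEC
    print(f"{gbps} Gbps -> ~{bytes_per_rec:.0f} bytes/record")
# Roughly 420-520 bytes per record, which is why a single 10 GbE link,
# rather than CPU, becomes the limiting factor on one server.
```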
The feasibility testing showed that the predicted future load of up to 440B records/day (6.7M recs/sec in the peak hour) could be run on a cluster of just 12 equivalent servers.
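The 12-server sizing follows from the figures above (a sketch; the peak-hour and per-server rates are from the results, the headroom reasoning is mine):

```python
import math

PEAK_RECS_PER_SEC = 6.7e6        # projected peak-hour load
PER_SERVER_RECS_PER_SEC = 650e3  # sustained rate demonstrated per server (~85% CPU)
DAILY_RECORDS = 440e9

avg_recs_per_sec = DAILY_RECORDS / 86_400
print(f"average rate: {avg_recs_per_sec / 1e6:.1f}M recs/sec")  # ~5.1M

# Minimum servers needed at the demonstrated per-server rate:
min_servers = math.ceil(PEAK_RECS_PER_SEC / PER_SERVER_RECS_PER_SEC)
print(f"servers needed at peak: {min_servers}")  # 11

# A 12-server cluster keeps per-server load comfortably below the
# demonstrated 650k recs/sec:
print(f"per-server load on 12 servers: {PEAK_RECS_PER_SEC / 12 / 1e3:.0f}k recs/sec")
```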
Curious how it works?