PRODUCT | Guavus SQLstream achieves record performance of 1.8M records/second/server

In the summer of 2016, SQLstream collaborated with a market-leading ad-tech company to explore the benefits of using streaming SQL to complement an existing Hadoop batch-based analytics framework. That framework runs on hundreds of servers and has an effective latency measured in many hours.

Summary of Results

The batch-file / Hadoop framework was complemented with a real-time stream processing infrastructure (Kafka as the message backbone, Guavus SQLstream for analytics) that delivers information to users and applications consistently, reliably, and with low latency.

  • Demonstrated end-to-end transformations for four representative use cases (an illustrative transformation is sketched after this list)
  • Achieved Kafka ingest throughput for complex Protobuf data structures of up to 1.8M records/second (between 6 and 7.5 Gbps, or 155B records/day) on a single 6-core / 12-thread server; this is essentially limited by the 10 Gbps network bandwidth
  • Operated the business use cases over a continuous 24-hour period at over 650k records/second/server, using 85% CPU
  • Demonstrated performance that can be scaled up and out to process up to 440 billion records/day (up to 6.7M records/second in the peak hour) on a relative handful of servers
  • Demonstrated a reduction in end-to-end analytic latency from several hours to minutes
  • Demonstrated a scale-out approach: spreading ingest across multiple servers in a federated Guavus SQLstream-over-Kafka infrastructure
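
To give a flavour of these transformations, below is a minimal streaming-SQL sketch of a tumbling-window aggregation of the kind s-Server can run over a Kafka-fed stream. The stream and column names ("AdEvents", "campaign_id", "bid_price") are illustrative only, not the customer's actual schema, and the Kafka-backed source stream would be declared separately using SQLstream's Kafka adapter.

    -- Illustrative sketch only: "AdEvents" stands in for a Kafka-backed source
    -- stream (declared via SQLstream's Kafka adapter); columns are hypothetical.
    SELECT STREAM
        FLOOR(s.ROWTIME TO MINUTE) AS "minute",   -- one-minute tumbling window
        s."campaign_id",
        COUNT(*)           AS "impressions",
        SUM(s."bid_price") AS "spend"
    FROM "AdEvents" AS s
    GROUP BY FLOOR(s.ROWTIME TO MINUTE), s."campaign_id";

Because each aggregate row is emitted as soon as its one-minute window closes, results reach downstream users within minutes rather than waiting for the next batch run.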

Conclusion

The feasibility testing showed that the predicted future load of up to 440B records/day (6.7M records/second in the peak hour) could be handled by a cluster of just 12 equivalent servers.
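
As a rough sanity check against the measured per-server rates: 440B records/day ÷ 86,400 seconds/day ≈ 5.1M records/second on average, and the 6.7M records/second peak spread across 12 servers is roughly 560k records/second/server, comfortably within the 650k records/second/server sustained during the 24-hour run.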

Curious how it works?

DOWNLOAD AND TRY