Business success relies heavily on taking the right action, at the right time, all the time. And actions are dictated by data.
But the Big Data save-first model cannot process and analyze data fast enough, and its results are incomplete because they exclude real-time data. Conversely, other Fast Data technologies cannot fold historical data into the analysis fast enough, making the results either inaccurate or irrelevant. In both cases, business intelligence is not up to date and cannot correctly inform true real-time, contextually intelligent applications and services.
Streaming analytics: definition
STREAMING ANALYTICS is the acquisition and analysis of all data available (live and historical) to make it actionable the moment it streams into the system.
- Acquisition includes: capture, consolidation, filtering, aggregation, cleaning, repair, ingestion
- Analysis includes: integration, filtering, parsing, enriching, transformation, windowing, correlation, time-based and geospatial queries
- Actions include: visualizations, automated actions, continuous load
- Best decisions: insight is drawn from all the data available to a business (not just historical/stored data) and delivered to decision-makers faster
- Savings: less hardware and fewer specialized skills are required
- Real-time crisis control: responsive systems can detect threats, errors, and health issues and address them before they cause damage
- More opportunities: new applications, products and services can provide a solid competitive advantage
Stream processing: definition
Stream processing is the real-time processing of data continuously, concurrently, and in a record-by-record fashion. It treats data not as static tables or files, but as a continuous, infinite stream of data integrated from both live and historical sources.
- Accessibility: live data can be used while still in motion, before being stored
- Completeness: historical data can be streamed and integrated with live data for more context
- High throughput: high-velocity, high-volume data can be processed with minimal latency.
Using SQL for streaming analytics
SQL is the standard for data management because it is declarative (applications can be written in just a few lines of high-level queries), reusable and portable across data management platforms, and, most importantly, distributable: queries can be automatically optimized and distributed by the underlying streaming analytics platform.
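As an illustration, a continuous query in SQLstream-style streaming SQL might look like the following sketch. The stream and column names are hypothetical, and the exact window syntax varies by platform:

```sql
-- Hypothetical sketch: a continuous query that computes a rolling
-- one-minute average temperature per sensor. Unlike a query on a
-- table, it runs forever, emitting a result for each arriving record.
SELECT STREAM
    ROWTIME,
    sensor_id,
    AVG(temperature) OVER (
        PARTITION BY sensor_id
        RANGE INTERVAL '1' MINUTE PRECEDING
    ) AS avg_temp_1min
FROM sensor_readings;
```

A few declarative lines like these replace what would otherwise be custom windowing and state-management code, and the platform is free to optimize and distribute the query.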
Query-then-(maybe)-store vs store-then-query
Streaming applications treat data not as static tables or files, but as persistent, infinite streams. In database terms, instead of running a query on data collected in a database, streaming analytics uses queries that execute forever over the arriving data, avoiding the delays of traditional ETL processes, eliminating the need to ever re-execute queries, and reducing the amount of memory used to process millions of records per second.
BENEFIT: The results are incrementally generated as a continuous operation, which provides a) true real-time results and b) the option to not store the data, therefore saving on costs and sheer processing power.
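The query-then-(maybe)-store pattern can be sketched as a standing INSERT that pumps continuous results into a sink, persisting only the aggregates rather than the raw records. This is a sketch in SQLstream-style streaming SQL; the stream, sink, and column names are hypothetical:

```sql
-- Hypothetical sketch: results are generated incrementally as records
-- arrive, and only the per-minute aggregate is stored, not the raw
-- high-volume input stream.
INSERT INTO minute_summaries
SELECT STREAM
    FLOOR(ROWTIME TO MINUTE) AS minute_start,
    sensor_id,
    COUNT(*)         AS readings,
    MAX(temperature) AS max_temp
FROM sensor_readings
GROUP BY FLOOR(ROWTIME TO MINUTE), sensor_id;
```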
Stream processing vs. microbatch
Due to batching and intermittent query access, the microbatch design is vulnerable to data loss and critical delays. That creates the risk of incorrect, incomplete, or irrelevant analysis, conditional fault tolerance and reliability, and out-of-order timing issues.
BENEFIT: the record-by-record, distributed model employed by stream processing guarantees accuracy, completeness, and continuity. Ad-hoc query access allows operators to update or create streaming applications in minutes, without interrupting execution.
Streaming analytics vs. Complex Event Processing (CEP)
Streaming analytics is designed to process data streams with a high ratio of event throughput to number of queries. In contrast, CEP utilizes event-by-event processing and aggregation, often with large numbers of rules or business logic. CEP engines are optimized to process discrete business events: for example, comparing out-of-order or out-of-stream events, applying decisions and reactions to event patterns, and so on.
BENEFIT: Streaming analytics maintains lower latency as volumes increase.
SQLstream Blaze connects to a wide range of technologies to deliver streaming ingestion, streaming analytics, and live actions.
IoT | Primex – Boosting IoT Profits in the Cloud
IoT companies approach the market with a capability and an integrated product, but seldom appreciate the lifecycle costs of supporting a sensor or controller for years. Successful products grow in count as happy customers buy more and more of them, and as they do, the load on cloud hosting and workflow expands at an explosive rate. Failure to understand and architect for this load places the company's margins and profitability in jeopardy, as the ongoing costs strip away the one-time margin earned when the customer purchased the original product.
Primex operates a robust, cloud-native IoT platform on AWS, managing over 20 million sensor reading updates per day from over 200k integrated products. The company had long taken pride in the efficiency of its cloud platform, but the ongoing costs continued to pressure the margin goals of the business.
SQLstream changed the game by enabling Primex to reimplement their AWS Lambda functions as continuous SQL queries, significantly reducing infrastructure and processing costs and restoring Primex's weather monitoring service to profitability:
- Reducing AWS Kinesis infrastructure from 600 shards down to 34
- Reducing associated AWS costs by more than 60%
- Savings estimated at $116K/yr including cost of SQLstream technology
It's important to consider that many consumer sensor devices carry a one-time, upfront cost, while IoT service providers must bear the ongoing, monthly costs of processing the data from those sensors. Minimizing these costs is therefore directly tied to the long-term profitability of the business.
Primex is a global market leader of synchronized time, environmental monitoring, and facility compliance technologies for Healthcare, Education, Manufacturing, and other industries.
In particular, Primex's AcuRite division makes several models of Internet-connected weather stations with best-in-class mobile and web-based apps. The AcuRite weather stations are available with a wide array of sensors, including rainfall, wind speed and direction, temperature, humidity, barometric pressure, and recently introduced lightning sensors.
The sensor data are uploaded to the cloud, where derived values such as dew point, wind chill, heat index, "feels like" temperature, rain rate, and average wind speed are calculated before the updates are streamed to the user's web and mobile devices. In 2016, the AcuRite team decided to replace their legacy, hosted architecture with a more efficient and flexible cloud-native architecture. They chose Amazon Web Services (AWS) as their platform because their agile team of engineers did not want to dedicate people to maintaining an open-source solution such as Apache Spark. The team designed and implemented a serverless architecture built on AWS Kinesis Streams, AWS Lambda, and AWS CloudWatch to minimize maintenance effort and focus on continued new product development.
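A derived-value calculation of this kind can be expressed as a continuous query. The following is a sketch in SQLstream-style streaming SQL: the stream and column names are hypothetical, and the Magnus dew-point approximation stands in for whatever formula AcuRite actually uses:

```sql
-- Hypothetical sketch: dew point via the Magnus approximation,
-- computed per record as readings arrive. Assumes temperature in
-- degrees Celsius and relative humidity in percent.
-- gamma = (17.27 * T) / (237.7 + T) + ln(RH / 100)
-- dew_point = 237.7 * gamma / (17.27 - gamma)
SELECT STREAM
    ROWTIME,
    station_id,
    temperature,
    humidity,
    (237.7 * ((17.27 * temperature) / (237.7 + temperature)
              + LN(humidity / 100.0)))
    / (17.27 - ((17.27 * temperature) / (237.7 + temperature)
              + LN(humidity / 100.0))) AS dew_point
FROM weather_readings;
```

Expressed this way, the calculation runs once per arriving record inside the stream processor, rather than as a separately invoked (and separately billed) function call.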
When Primex cut over to the new AWS architecture, challenges emerged when operating AWS's Lambda service at Internet scale. With more than 154,000 internet-connected devices in the field, Primex was processing close to 67,300 AWS Lambda requests every 5 minutes.
Architectural inefficiency was a problem. While a Kinesis Stream shard can handle up to 1 MB/sec, a single AWS Lambda function cannot process events at that rate, so a large backlog of events quickly built up in the queue. To mitigate the Lambda performance problem, Primex started provisioning more and more Kinesis shards so that each shard could have its own Lambda function operating in parallel. By closely monitoring the latency of each Lambda queue, Primex continued to increase the number of shards and parallel Lambda functions to 600+ before event processing throughput was satisfactory.
These "workarounds" came with significant costs. Each additional Kinesis shard incurs a per-shard-hour charge regardless of how much data flows through it, on top of the cost of each Lambda function execution (almost 20 million per day) and of the CloudWatch monitoring service (with 3 log messages per Lambda invocation). This portion of the AWS serverless architecture was costing Primex $565 per day ($206,225 per year), pressuring product sales margins.
Instability had also become a major pain point. Amazon's DNS services caused months of grief with frequent outages and lengthy recovery times. After one 4-hour outage, it took the Kinesis- and Lambda-based message queuing component of the solution more than 20 hours to process the backlog of sensor updates. Beyond the system impacts, there was the operational toll: hours of lost sleep for an operations team forced into 24-hour shifts to resolve these recurring issues. Primex realized they needed a more stable and efficient solution.
In 2016, Primex Chief Architect Kevin Runde attended Amazon's re:Invent show in Las Vegas seeking a solution to escalating costs and platform instability, and was intrigued by SQLstream's performance, high-volume ingest, and low-latency response times. If SQLstream could really handle 1.8M events per second per core, it could replace all of the Lambda functions at a lower, fixed cost rather than a per-transaction charge.
Immediately after AWS re:Invent, Kevin went home, downloaded SQLstream's Blaze community edition, and learned the product through hosted tutorials and technical webinars. He quickly built a prototype with SQLstream Blaze performing the calculations previously handled by AWS Lambda, and the team was impressed by the performance.
SQLstream engineers helped Kevin complete the project, lending their experience in building high-performance applications and helping Primex embrace some paradigm shifts in stream processing. As a result, Kevin was able to exceed performance expectations, while running SQLstream Blaze on a single AWS instance and connecting to a live stream on Kinesis. Primex went live with their new solution in October 2017.
As a result of moving their stream processing from AWS Lambda to a single AWS hosted instance of SQLstream Blaze, Primex was able to take full advantage of AWS Kinesis throughput and run each shard unthrottled and at full capacity. Instead of using 600+ Kinesis shards to trickle out events to on-demand Lambda functions, they were able to reduce Kinesis Streams down to just 14 input shards and 20 output shards.
Subsequently, Primex was able to eliminate $565 per day ($206,225 per year) in AWS costs, a reduction of more than 60%, for a net savings of about $320 per day ($116,800 per year) after accounting for the cost of the SQLstream technology.
Cost savings came from 3 primary areas:
- Reduced Kinesis “Shard Hour” fees – directly related to SQLstream Blaze enabling them to reduce the number of Shards from 600+ to 34 (14 input and 20 output shards)
- Reduced Lambda fees – due to moving all of the calculations and aggregation of the readings to SQLstream Blaze
- Reduced CloudWatch fees – due to fewer Kinesis shard metrics and a reduced volume of Lambda logging
Moving forward, Primex is looking to expand their remote sensor monitoring business into new areas after having resolved one of their biggest operational cost issues.
Cloud | Streaming ingestion and analytics for automated ad service
A robust advertising technology platform, Rubicon, also one of the biggest cloud and Big Data computing systems in the world, leverages over 50,000 algorithms and analyzes billions of data points in real time to deliver the best results for sellers and buyers. With over 65,000 CPUs and 5 petabytes of storage handling 100 gigabits of data per second, they needed an efficient solution that would permit adaptive scaling without added cost.
Smart cities | Streaming analytics for real-time traffic control
The Roads and Maritime Services needed a streaming ingestion and analytics solution to solve its changing data puzzle, spanning network control systems, databases, data warehouses, and a vast vehicle and road sensor network covering the entire Sydney metropolitan area. SQLstream built a Smart City solution that acquires and analyzes all this data in real time, loading the results into a live platform controlling traffic flow, public transportation, and incident response applications. At the same time, results are streamed into the RMS data storage solutions, where they are saved for later use.
Financial | Streaming analytics for cybersecurity
InfoArmor, a data security company, employed SQLstream Blaze to power its real-time security intelligence platform servicing over 600 banks and financial institutions, thus improving its services and increasing its customer base capacity from 600k to over 10M. The SQLstream analytics solution runs continuous analytics on all data streams and automatically deploys user-authorization revisions and real-time alerts when network or user security is at risk.
Telco | Streaming IP Network and CDR Analytics for fraud detection, real-time rating and least-cost routing
Veracity Networks needed real-time analysis of their CDR records and wanted to maximize the capability of their SIP access and VoIP platforms to boost operational efficiency and maintain market share. The SQLstream streaming solution provides continuous acquisition and analysis of network and call data, enabling powerful time-based and geospatial queries for instant reductions in churn rates and infrastructure costs, plus automated protection measures against fraudulent activities and network attacks.
Telco | Streaming integration for real-time 9-1-1 responses
ECaTS, a Cloud-based 9-1-1 Data Analytics and Management Information System, used Blaze and the SQLstream Emergency Response StreamApp to deliver ECaTS Dashboard, a streaming analytics cloud solution that informs emergency responders on what is happening on the ground as it's happening, live. The Dashboard can now capture, filter, and analyze all emergency calls in California, and use the results to identify anomalies in 9-1-1 call patterns, isolate and monitor emergencies as they develop, and manage team and ground response continuously and in real time. The implementation delivered adaptive scalability (the system can now process country-wide data on a per-need basis) and sub-second response times, proven vital in managing emergencies such as natural disasters and the San Bernardino terrorist attack.
Product | SQLstream Blaze
SQLstream Blaze is a streaming analytics platform enabling anyone to create, in minutes, streaming applications that deliver streaming ingest, analytics, and live action.
Cloud | SQLstream/Amazon Kinesis Analytics
The combination of Amazon Kinesis Analytics and SQLstream Blaze makes it easier than ever to cost-effectively ingest, analyze, and manage streaming data in the cloud, on premises, and between the two.
Partnerships | Blaze integrates with Teradata Unified Data Architecture
Businesses can react to their live data by processing each new record as it arrives in SQLstream Blaze. The Teradata integration enables Blaze to join data at rest with arriving data in motion, and to maintain the accuracy of data at rest in the Teradata Database through continuous data load and ETL.
Streaming Analytics at the edge and in the cloud
IoT applications, mobile devices, wearables, industrial sensors, as well as many software applications and services can generate staggering amounts of streaming data – sometimes TBs per hour – that need to be ingested, analyzed, and managed as the data arrive, to enable valuable business actions continuously and in real time. The combination of Amazon Kinesis Analytics and SQLstream Blaze offers the best of on-premises and cloud advantages.
Teradata & SQLstream Blaze Unified Data Architecture
The integration with Teradata enables SQLstream Blaze to join data at rest with arriving data in motion, and to maintain the accuracy of data at rest in the database through streaming ingestion.
SQLstream StreamLab is a drag-and-drop platform for building real-time dashboards and applications over streaming and time-series data, accessible to anyone (no coding required) and always on (execution never needs to be interrupted).
The Forrester Wave™: Big Data Streaming Analytics, Q1 2016
Forrester evaluates 15 leading streaming analytics vendors and places SQLstream in the leaders' circle.
Streaming ingestion | Webinar recording | Guests: Teradata and ECaTS
SQLstream and our guests explore, explain, and demo streaming ingestion, continuous integration, and continuous-load use cases.
Streaming analytics | Webinar recording | Guest: Forrester, Inc.
SQLstream and Forrester discuss and exemplify the benefits of streaming analytics.