IoT companies approach the market with a capability and an integrated product, but seldom appreciate the lifecycle costs of supporting a sensor or controller for years. Successful products grow in count as happy customers purchase more and more of them. As more products are purchased, the load on the cloud hosting and workflow grows at an explosive rate. Failure to understand and architect for this load places the company’s margins and profitability in jeopardy, as the ongoing costs strip away the one-time margin earned when the customer purchased the original product.
Primex operates a robust, cloud-native IoT platform on AWS that handles over 20 million sensor reading updates per day from over 200k integrated products. The company had long taken pride in the efficiency of its cloud platform, but the ongoing costs continued to pressure the margin goals of the business.
SQLstream changed the game by enabling Primex to reimplement its AWS Lambda functions as continuous SQL queries, significantly reducing infrastructure and processing costs and restoring Primex’s weather monitoring service to profitability:
- Reducing AWS Kinesis infrastructure from 600+ shards down to 34
- Reducing associated AWS costs by more than 60%
- Net savings estimated at $116K/yr after the cost of SQLstream technology
It’s important to consider that many consumer sensor devices have a one-time, upfront cost, while IoT service providers must cover the ongoing, monthly costs of processing the data from these sensors. Minimizing these costs is therefore directly tied to the long-term profitability of the business.
Primex is a global market leader in synchronized time, environmental monitoring, and facility compliance technologies for Healthcare, Education, Manufacturing, and other industries.
In particular, Primex’s AcuRite division makes several models of Internet-connected weather stations with best-in-class mobile and web-based apps. The AcuRite weather stations are available with a wide range of sensors, including rainfall, wind speed and direction, temperature, humidity, barometric pressure, and the recently introduced lightning sensors.
The sensor data are uploaded to the cloud, where derived values such as dew point, wind chill, heat index, feels-like temperature, rain rate, and average wind speed are calculated before the updates are streamed to the user’s web and mobile devices. In 2016, the AcuRite team decided to replace their legacy hosted architecture with a more efficient and flexible cloud-native architecture. They chose Amazon Web Services (AWS) as the platform for their agile engineering team because they didn’t want to dedicate people to maintaining an open source solution like Apache Spark. The Primex team designed and implemented a serverless architecture built on AWS Kinesis Streams, AWS Lambda, and AWS CloudWatch to minimize maintenance effort and keep the focus on new product development.
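To make "derived values" concrete, here is a minimal sketch of two of them. The text does not say which formulas AcuRite uses, so this assumes two common public choices: the Magnus approximation for dew point (with the constants `17.62` and `243.12` °C) and the US National Weather Service 2001 wind chill index. The function names are hypothetical.

```python
import math

# ASSUMPTION: Magnus-formula constants commonly used for dew point over water;
# the text does not say which formulas AcuRite actually uses.
MAGNUS_A = 17.62
MAGNUS_B_C = 243.12  # degrees Celsius


def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Approximate dew point in deg C from air temperature and relative humidity."""
    if not 0 < rel_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rel_humidity_pct / 100.0) + MAGNUS_A * temp_c / (MAGNUS_B_C + temp_c)
    return MAGNUS_B_C * gamma / (MAGNUS_A - gamma)


def wind_chill_f(temp_f: float, wind_mph: float) -> float:
    """US NWS (2001) wind chill index in deg F.

    The formula is only defined for temp_f <= 50 and wind_mph > 3; outside that
    range this sketch simply returns the air temperature.
    """
    if temp_f > 50 or wind_mph <= 3:
        return temp_f
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


if __name__ == "__main__":
    # One hypothetical reading from a weather station.
    print(f"dew point: {dew_point_c(25.0, 60.0):.1f} C")
    print(f"wind chill: {wind_chill_f(0.0, 15.0):.0f} F")
```

Each function is a pure calculation on a single reading. That is why such logic can run either as a per-event Lambda function or inside a continuous query without changing the math.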
When Primex cut over to the new AWS architecture, challenges emerged in operating AWS Lambda at Internet scale. With more than 154,000 Internet-connected devices in the field, Primex was processing close to 67,300 AWS Lambda requests every 5 minutes.
Architectural inefficiency was a problem. While a Kinesis Stream shard can ingest up to 1 MB/sec, a single AWS Lambda function could not process events that fast, so a large backlog of events quickly built up in the queue. To work around the Lambda throughput problem, Primex provisioned more and more Kinesis shards so that each could have its own Lambda function operating in parallel. By closely monitoring the latency of each Lambda queue, Primex kept increasing the number of shards and parallel Lambda functions, reaching 600+ before event processing throughput was satisfactory.
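The scaling workaround described above can be approximated as taking the larger of two shard counts: enough shards to absorb the ingest volume at 1 MB/sec each, and enough shards that one consumer per shard keeps up with the record rate. This is a simplified model that assumes evenly distributed partition keys and a fixed per-consumer processing rate; the function name and the load figures in the example are hypothetical, not Primex’s measurements.

```python
import math

# From the text: a Kinesis shard can ingest up to 1 MB/sec (approximated as 10**6 bytes).
SHARD_INGEST_BYTES_PER_SEC = 1_000_000


def shards_needed(
    ingest_bytes_per_sec: float,
    ingest_records_per_sec: float,
    consumer_records_per_sec: float,
) -> int:
    """Minimum shard count when each shard has exactly one consumer.

    Simplified model: partition keys spread load evenly across shards, and each
    shard's consumer (e.g. one Lambda function) processes a fixed number of
    records per second.
    """
    if consumer_records_per_sec <= 0:
        raise ValueError("consumer throughput must be positive")
    by_ingest = math.ceil(ingest_bytes_per_sec / SHARD_INGEST_BYTES_PER_SEC)
    by_consumer = math.ceil(ingest_records_per_sec / consumer_records_per_sec)
    return max(1, by_ingest, by_consumer)


# Hypothetical load: 5,000 records/sec totalling 500 KB/sec.
print(shards_needed(500_000, 5_000, consumer_records_per_sec=10))      # 500: consumer-bound
print(shards_needed(500_000, 5_000, consumer_records_per_sec=10_000))  # 1: ingest-bound
```

When the consumer is slow, the shard count is driven by processing speed rather than by data volume, which matches the pattern of shards added only to gain parallelism.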
These “workarounds” came with significant costs. Each additional Kinesis shard carries a per-shard-hour charge regardless of how much data flows through it. On top of that came the cost of each Lambda function execution (almost 20 million per day) and of the CloudWatch monitoring service (with 3 log messages per Lambda invocation). This portion of the AWS serverless architecture was costing Primex $565 per day ($206,225/year), putting pressure on product sales margins.
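The volume and cost figures quoted so far can be cross-checked with simple arithmetic, assuming the rate of 67,300 Lambda requests per 5 minutes holds around the clock. All inputs come from the text; the variable names are illustrative.

```python
# Inputs taken from the text.
REQUESTS_PER_5_MIN = 67_300
LOG_MESSAGES_PER_INVOCATION = 3
DAILY_COST_USD = 565

INTERVALS_PER_DAY = 24 * 60 // 5  # 288 five-minute intervals per day

invocations_per_day = REQUESTS_PER_5_MIN * INTERVALS_PER_DAY
log_messages_per_day = invocations_per_day * LOG_MESSAGES_PER_INVOCATION
annual_cost_usd = DAILY_COST_USD * 365

print(f"Lambda invocations/day: {invocations_per_day:,}")        # 19,382,400
print(f"CloudWatch log messages/day: {log_messages_per_day:,}")  # 58,147,200
print(f"Annual cost: ${annual_cost_usd:,}")                      # $206,225
```

The result of roughly 19.4 million invocations per day is consistent with "almost 20 million/day", and it implies about 58 million CloudWatch log messages per day.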
Instability had also become a major pain point. Amazon’s DNS services caused months of grief with frequent outages and lengthy recovery times. After one 4-hour outage, it took the Kinesis- and Lambda-based message queuing component of the solution more than 20 hours to process the backlog of sensor updates. Beyond the system impact, there was an operational toll: the operations team lost hours of sleep and was forced into 24-hour shifts to resolve these recurring issues. Primex realized they needed a more stable and efficient solution.
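A simple queueing argument shows why a 4-hour outage could become a 20-hour recovery. Assume devices keep sending at a steady rate λ during the outage and the pipeline processes at most μ once it recovers. Then the backlog λT drains at μ − λ, so the drain time is D = λT / (μ − λ). Solving for the ratio gives μ/λ = 1 + T/D. This is an idealized model, not an analysis of Primex’s actual system, and the function names are hypothetical.

```python
def drain_hours(outage_hours: float, arrival_rate: float, capacity_rate: float) -> float:
    """Hours needed to clear the backlog built up during an outage.

    Simplified model: arrivals continue at a constant rate during and after the
    outage, and after recovery the pipeline processes at a constant capacity.
    Rates may be in any unit, as long as both use the same one.
    """
    if capacity_rate <= arrival_rate:
        raise ValueError("backlog never drains unless capacity exceeds the arrival rate")
    backlog = arrival_rate * outage_hours
    return backlog / (capacity_rate - arrival_rate)


def implied_capacity_ratio(outage_hours: float, observed_drain_hours: float) -> float:
    """Capacity / arrival ratio implied by an observed outage and drain time."""
    return 1.0 + outage_hours / observed_drain_hours


print(implied_capacity_ratio(4, 20))  # 1.2: only ~20% headroom over normal load
print(drain_hours(4, 1.0, 2.0))       # with 2x capacity the same outage clears in 4 hours
```

Under this model, a 20-hour drain after a 4-hour outage means the Lambda-based pipeline had only about 20% spare capacity. Because the text says "more than 20 hours", 20% is an upper bound on that headroom.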
In 2016, Primex Chief Architect Kevin Runde attended Amazon’s re:Invent conference in Las Vegas looking for a solution to escalating costs and platform instability, and was intrigued by SQLstream’s unparalleled performance, high-volume ingest, and low-latency response times. If SQLstream could really handle 1.8M events/second/core, it could replace all of the Lambda functions at a lower fixed cost instead of a per-transaction charge.
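The reasoning behind "if SQLstream could really handle 1.8M events/second/core" can be made concrete by comparing that vendor figure with Primex’s average load of about 20 million updates per day. This sketch uses only the average rate. Bursts, such as replaying a backlog after an outage, run well above it, and per-event work differs from any benchmark.

```python
# Inputs from the text; 1.8M events/sec/core is the vendor figure Kevin set out to verify.
EVENTS_PER_DAY = 20_000_000
CLAIMED_EVENTS_PER_SEC_PER_CORE = 1_800_000
SECONDS_PER_DAY = 24 * 60 * 60

avg_events_per_sec = EVENTS_PER_DAY / SECONDS_PER_DAY
# Ratio of claimed single-core capacity to the *average* load; bursts (for
# example, replaying a backlog after an outage) run well above the average.
headroom = CLAIMED_EVENTS_PER_SEC_PER_CORE * SECONDS_PER_DAY / EVENTS_PER_DAY

print(f"average load: {avg_events_per_sec:.0f} events/sec")  # 231
print(f"claimed single-core headroom: {headroom:,.0f}x")     # 7,776x
```

An average of roughly 231 events per second is tiny next to the claimed per-core figure. That gap is why a fixed-cost, single-instance design looked plausible compared with paying per Lambda invocation.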
Immediately after AWS re:Invent, Kevin downloaded SQLstream’s Blaze community edition and learned the product through hosted tutorials and technical webinars. He quickly built a prototype in which SQLstream Blaze performed the calculations then handled by AWS Lambda, and the team was impressed by the performance.
SQLstream engineers helped Kevin complete the project, lending their experience in building high-performance applications and helping Primex embrace some paradigm shifts in stream processing. As a result, Kevin was able to exceed performance expectations while running SQLstream Blaze on a single AWS instance connected to a live Kinesis stream. Primex went live with the new solution in October 2017.
By moving their stream processing from AWS Lambda to a single AWS-hosted instance of SQLstream Blaze, Primex was able to take full advantage of AWS Kinesis throughput and run each shard unthrottled at full capacity. Instead of using 600+ Kinesis shards to trickle events out to on-demand Lambda functions, they reduced their Kinesis Streams to just 14 input shards and 20 output shards.
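To see how much the shard count alone matters, the sketch below estimates daily shard-hour fees before and after the change. The price is an ASSUMPTION: it uses $0.015 per shard-hour, the historical provisioned-mode list price in us-east-1, and it ignores PUT payload units and other charges. Check current AWS pricing for your region before reusing the numbers. The 600 figure is a lower bound, since the text says 600+.

```python
# ASSUMPTION: provisioned-mode shard price of $0.015 per shard-hour
# (historical us-east-1 list price); check current AWS pricing for your region.
SHARD_HOUR_PRICE_USD = 0.015
HOURS_PER_DAY = 24


def daily_shard_cost(shard_count: int, price_per_shard_hour: float = SHARD_HOUR_PRICE_USD) -> float:
    """Shard-hour fees per day; excludes PUT payload units and other charges."""
    return shard_count * HOURS_PER_DAY * price_per_shard_hour


before = daily_shard_cost(600)      # 600+ shards in the Lambda-based design
after = daily_shard_cost(14 + 20)   # 14 input + 20 output shards with SQLstream Blaze
print(f"before: ${before:.2f}/day, after: ${after:.2f}/day, saved: ${before - after:.2f}/day")
```

At the assumed price, shard-hour fees fall from roughly $216 per day to about $12 per day. This covers only one of the three savings areas listed below.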
As a result, Primex reduced the AWS costs of this part of the architecture, previously $565 per day ($206,225/year), by more than 60%. That produced a net savings of about $320 per day ($116,800/year) after the cost of SQLstream.
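The savings figures can be checked the same way. The sketch treats "more than 60%" as a lower bound on the gross AWS reduction and uses the net figure, which already accounts for the cost of SQLstream, as given. All inputs come from the text.

```python
# Figures from the text.
BASELINE_DAILY_USD = 565        # Kinesis + Lambda + CloudWatch before the change
REDUCTION_AT_LEAST = 0.60       # "more than 60%", treated here as a lower bound
NET_SAVINGS_DAILY_USD = 320     # after the cost of SQLstream

gross_savings_daily_min = BASELINE_DAILY_USD * REDUCTION_AT_LEAST
net_savings_annual = NET_SAVINGS_DAILY_USD * 365
net_fraction = NET_SAVINGS_DAILY_USD / BASELINE_DAILY_USD

print(f"gross AWS savings: at least ${gross_savings_daily_min:.0f}/day")  # $339
print(f"net savings: ${net_savings_annual:,}/year")                     # $116,800
print(f"net savings as share of baseline: {net_fraction:.1%}")           # 56.6%
```

Gross AWS savings of at least $339 per day and net savings of $320 per day are consistent. The net figure is lower because it includes the new SQLstream cost.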
Cost savings came from three primary areas:
- Reduced Kinesis “Shard Hour” fees – directly related to SQLstream Blaze enabling them to reduce the number of shards from 600+ to 34 (14 input and 20 output shards)
- Reduced Lambda fees – due to moving all of the calculations and aggregation of the readings to SQLstream Blaze
- Reduced CloudWatch fees – due to fewer Kinesis shard metrics and less Lambda logging.
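Moving "the calculations and aggregation of the readings" into one long-running stream processor means aggregates are maintained incrementally per time window, rather than computed by a separate function invocation for each event. The sketch below shows a tumbling-window average of wind speed per device in plain Python. It is roughly analogous to a continuous SQL query grouping by device and time bucket, but it is NOT SQLstream’s API or syntax. The `Reading` type, the station IDs, and the function name are hypothetical, and the sketch assumes readings arrive in timestamp order.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Reading:
    device_id: str
    ts: int           # epoch seconds
    wind_mph: float


def tumbling_wind_avg(
    readings: Iterable[Reading], window_s: int
) -> Iterator[Tuple[str, int, float]]:
    """Yield (device_id, window_start, avg_wind_mph) each time a window closes.

    Assumes readings arrive in timestamp order; late or out-of-order data is
    not handled. The final window is flushed when the input ends.
    """
    current: Optional[int] = None
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)

    def flush(window_start: int) -> Iterator[Tuple[str, int, float]]:
        # Emit one row per device for the closed window, then reset the state.
        for dev in sorted(sums):
            yield dev, window_start, sums[dev] / counts[dev]
        sums.clear()
        counts.clear()

    for r in readings:
        start = r.ts - r.ts % window_s  # bucket start, like a time-floor GROUP BY key
        if current is not None and start != current:
            yield from flush(current)
        current = start
        sums[r.device_id] += r.wind_mph
        counts[r.device_id] += 1
    if current is not None:
        yield from flush(current)


if __name__ == "__main__":
    readings = [
        Reading("station-a", 0, 10.0),
        Reading("station-b", 10, 4.0),
        Reading("station-a", 30, 20.0),
        Reading("station-a", 65, 6.0),
    ]
    for row in tumbling_wind_avg(readings, window_s=60):
        print(row)
```

The state kept between events is just a running sum and count per device. That is what lets a single process aggregate a whole stream without per-event invocation or logging overhead.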
Moving forward, Primex is looking to expand its remote sensor monitoring business into new areas now that it has resolved one of its biggest operational cost issues.