SAN FRANCISCO | SQLstream, the leading streaming analytics provider and the company powering Amazon Kinesis Analytics, has just celebrated the 4th anniversary of the integration between SQLstream Blaze, its streaming analytics platform, and Google BigQuery, Google’s fully managed, petabyte-scale, low-cost analytics data warehouse. This integration helps companies that want to move toward a hybrid cloud infrastructure easily move large volumes of enterprise data to Google Cloud Platform.
This announcement follows news that Amazon Web Services (AWS) licensed and implemented technology from SQLstream Blaze to power the Amazon Kinesis Analytics service. In addition, the Blaze Adapter enables businesses to securely and cost-effectively ingest, analyze, and manage streaming data on and between the AWS cloud and on-premises environments.
The Blaze connector for Google BigQuery allows companies seeking to move to a hybrid cloud infrastructure to easily ingest, process and migrate massive amounts of data from all available sources, in all formats, and at volumes of millions of records per second. Unlike conventional ETL solutions that simply map data between systems, or alternative streaming solutions that only deliver near-real-time results, the Blaze connector ingests streaming data in real time, integrates it with historical data, and delivers it into BigQuery environments, continuously and with millisecond precision.
Case study: Streaming Analytics for traffic congestion prevention
The streaming analytics integration has already been deployed through a streaming application that controls live traffic and congestion.
GPS records are collected from several specialist data providers. Each GPS record contains the position, direction and speed of a vehicle. SQLstream collects GPS records from all the providers in real time and turns the live GPS data feeds into real-time traffic flow information. By applying a variety of congestion detection algorithms, it generates a map of current traffic congestion positions, along with additional information on the extent or severity of each problem.
SQLstream calculates real-time traffic analytics and displays the results in real time on a map. Roads are color-coded based on the average traffic speed relative to the posted speed limit for each road segment. Colored push-pins appear on the highways to provide further information on the potential problem, such as the average speed for every minute over the previous 15 minutes.
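The color-coding and per-minute averages described above can be sketched roughly as follows. This is only an illustration: the function names, the three-color scheme and the ratio thresholds are assumptions, not values taken from the SQLstream application.

```python
def speed_color(avg_speed, speed_limit):
    """Pick a display color from average speed relative to the posted limit.

    The thresholds (0.75 and 0.40) are illustrative assumptions.
    """
    if speed_limit <= 0:
        raise ValueError("speed limit must be positive")
    ratio = avg_speed / speed_limit
    if ratio >= 0.75:
        return "green"
    if ratio >= 0.40:
        return "yellow"
    return "red"


def per_minute_averages(records, now_s, window_minutes=15):
    """Average speed per minute of age over the trailing window.

    records: iterable of (timestamp_seconds, speed) pairs.
    Returns {minutes_ago: average_speed}, where 0 is the most recent minute.
    """
    sums = {}
    for ts, speed in records:
        age_min = int((now_s - ts) // 60)
        if 0 <= age_min < window_minutes:
            total, n = sums.get(age_min, (0.0, 0))
            sums[age_min] = (total + speed, n + 1)
    return {age: total / n for age, (total, n) in sorted(sums.items())}
```

A push-pin for a segment could then show the output of per_minute_averages, while the road itself is drawn in the color returned by speed_color.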
Google BigQuery is used to store the real-time traffic information produced by SQLstream, and to generate a confidence index on the coverage and quality, and therefore the reliability, of the incoming GPS records. Using streaming analytics, data is continuously acquired, cleaned, conditioned, transformed and then periodically appended to the BigQuery table, for example, every minute or every 500,000 events.
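The "every minute or every 500,000 events" append policy is a common micro-batching pattern. A minimal sketch follows, assuming a hypothetical sink callable that would perform the actual append to the BigQuery table; the class and parameter names are illustrative and not part of SQLstream Blaze.

```python
import time


class MicroBatcher:
    """Buffer rows and hand them to a sink by count or by age.

    `sink` is a hypothetical callable standing in for the table append.
    The age check runs only when a row arrives, so a real deployment
    would likely add a timer to flush idle batches.
    """

    def __init__(self, sink, max_events=500_000, max_age_s=60.0,
                 clock=time.monotonic):
        self._sink = sink
        self._max_events = max_events
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows = []
        self._opened_at = None

    def add(self, row):
        if not self._rows:
            # First row of a new batch starts the age timer.
            self._opened_at = self._clock()
        self._rows.append(row)
        too_many = len(self._rows) >= self._max_events
        too_old = self._clock() - self._opened_at >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self):
        if self._rows:
            self._sink(self._rows)
            self._rows = []
            self._opened_at = None
```

Injecting the clock keeps the sketch easy to exercise without waiting a full minute.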
SQLstream’s analysis of the road network is based on individual 10m road segments. Typically, GPS records are not available for every vehicle on each and every road segment, so SQLstream interpolates missing road segment data in real time and continuously appends those results to the Google BigQuery table as well. The table therefore stores a complete real-time map of traffic flow for the entire road network at a granularity of 10 meters.
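The interpolation method is not specified here. As one plausible illustration, the sketch below fills gaps by linear interpolation between the nearest observed segments along a road, and copies the nearest observation at either end. The function name and the choice of linear interpolation are assumptions.

```python
def interpolate_segments(speeds):
    """Fill missing per-segment speeds along one road.

    speeds: list ordered along the road, one entry per 10 m segment,
    with None where no GPS record was available.
    Interior gaps are filled linearly (an assumed method); leading and
    trailing gaps take the nearest observed value.
    """
    result = list(speeds)
    known = [i for i, v in enumerate(speeds) if v is not None]
    if not known:
        return result  # nothing to interpolate from

    # Edges: extend the nearest observation outward.
    for i in range(known[0]):
        result[i] = speeds[known[0]]
    for i in range(known[-1] + 1, len(speeds)):
        result[i] = speeds[known[-1]]

    # Interior gaps: straight line between neighbouring observations.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            result[i] = speeds[left] + frac * (speeds[right] - speeds[left])
    return result
```

For example, a road observed at 40 and 60 with one unobserved segment in between would get 50 for the missing segment.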
How it works
SQLstream allows the user to zoom in and center the geographic map on an area of interest. This action defines a bounding box, and the coordinates of the bounding box are used in a first BigQuery query to extract the number of 10-meter road segments within the bounding box. Defining the bounding box in SQLstream launches a query similar to the following example:
select count(*) from road_elements
where "reLatitude" between 7.7109920000000001
and 9.8525100000000005 AND
"reLongitude" between -65.754776000000007
This query returns the number of road segments within the bounding box (in this example, 191,866). That number is then plugged into the query executed by BigQuery, which calculates, for each GPS data provider, the percentage of road elements within the bounding box that were actually used in calculating the road network coverage:
select vendorID, float(integer(count(*) / 191866 * 1000)/10) as Confidence
from (select vendorID, reID from sample.july2nd
where float(reLatitude) between 7.7109920000000001
and 9.8525100000000005 AND
float(reLongitude) between -65.754776000000007
group by vendorID, reID)
group by vendorID order by Confidence desc
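To make the arithmetic in the query easier to follow, here is a rough in-memory equivalent. It counts the distinct road segments each vendor covers inside the bounding box, divides by the total segment count, and keeps one decimal place by truncation, mirroring the float(integer(... * 1000)/10) expression. The function name, record layout and sample numbers are assumptions for illustration only.

```python
from collections import defaultdict


def coverage_confidence(records, bbox, total_segments):
    """Per-vendor coverage percentage, highest first.

    records: iterable of (vendor_id, segment_id, lat, lon) tuples.
    bbox: (min_lat, max_lat, min_lon, max_lon).
    total_segments: number of road segments inside the bounding box.
    """
    min_lat, max_lat, min_lon, max_lon = bbox
    segments_by_vendor = defaultdict(set)
    for vendor_id, segment_id, lat, lon in records:
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            # A set plays the role of the inner "group by vendorID, reID".
            segments_by_vendor[vendor_id].add(segment_id)

    scores = {
        vendor: int(len(segs) / total_segments * 1000) / 10
        for vendor, segs in segments_by_vendor.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With 8 segments in the box, a vendor that reports 3 distinct segments scores 37.5, regardless of how many records it sent for each segment.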
The query results are returned to SQLstream, which displays the answer to the user. BigQuery is ideal for these types of geospatial coverage metrics. There is a high volume of meaningful data stored in the BigQuery table and the calculation of coverage is available immediately. By choosing a variety of bounding boxes, it is possible to obtain coverage metrics on a country-wide basis, on a metropolitan area basis, or for a much smaller problem area.
SQLstream empowers people, services, and machines to take the next right action, continuously and in real time. SQLstream has designed its streaming analytics platform to enable anyone to create, in minutes, real-time applications from raw data that deliver streaming data ingestion, streaming analytics, and live actions. SQLstream offers the only solution enabling businesses to continuously respond to changing conditions, every moment.
SQLstream is based in San Francisco, California. Download SQLstream Blaze for a live experience at http://sqlstream.com/download/.