SQLstream s-Server Overview


SQLstream s-Server is designed for the low-latency, high-volume, rapid-integration needs of today's real-time businesses. It continuously processes transactions from a variety of sources, provides streaming analytics, and writes results to multiple destinations. Complex, time-sensitive transformations and analytics are simple to configure, and they execute continuously across multiple input data sources.

As in an RDBMS, you use SQL to manipulate data in s-Server. However, in contrast to traditional RDBMSs, which process static, stored data with repeated single-shot queries, data in s-Server is flowing and queries are open-ended. While there are a number of key differences, if you have worked with SQL in an RDBMS context before, you will find that many procedures are similar in s-Server.

Streaming Data

Streaming data are processed as a continuous flow. You can set up s-Server to monitor a continuously changing log file, an ongoing network feed, Kafka messages, or other systems that produce data in columns. s-Server parses such data in CSV, JSON, XML, key-value pair, and Google Protocol Buffers formats. Examples include financial trading data, internet clickstream data, sensor data, and exception events. SQLstream processes multiple input and output streams of data, for multiple publishers and subscribers.

Streaming data are time-sensitive data. Regardless of where they originate, data are represented as sequences of time-stamped messages. (By default, such timestamps are established when data enter s-Server, but you can also configure s-Server so that the timestamps reflect when the data were collected.)

Streaming Queries

A streaming query is a continuous, standing query that executes over streaming data. SQLstream s-Server processes data streams using familiar SQL relational operators augmented to handle time-sensitive data. Streaming queries can be event-driven and can aggregate over rolling or periodic time windows.
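To make the idea of an event-driven query over a rolling time window concrete, here is a minimal sketch in Python. It is a conceptual illustration only, not s-Server syntax or an s-Server API. The function name `rolling_sum`, the tuple layout, and the use of integer seconds for rowtimes are all assumptions made for the example, and rows are assumed to arrive in rowtime order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Row = Tuple[int, float]  # (rowtime in seconds, value); hypothetical layout


def rolling_sum(rows: Iterable[Row], window_seconds: int) -> Iterator[Row]:
    """Emit one result per arriving row: the sum of the values whose
    rowtime lies in the interval (rowtime - window_seconds, rowtime]."""
    window: Deque[Row] = deque()
    total = 0.0
    for rowtime, value in rows:
        # The new row joins the window and advances the window's upper edge.
        window.append((rowtime, value))
        total += value
        # Evict rows that have fallen out of the trailing window.
        while window[0][0] <= rowtime - window_seconds:
            _, expired = window.popleft()
            total -= expired
        yield rowtime, total


rows = [(0, 1.0), (10, 2.0), (50, 3.0), (65, 4.0)]
results = list(rolling_sum(rows, window_seconds=60))
# results == [(0, 1.0), (10, 3.0), (50, 6.0), (65, 9.0)]
```

The query never terminates on its own: it produces a new result each time a row arrives, which is the sense in which a streaming query is "standing." The row at rowtime 0 drops out of the sum once the row at rowtime 65 arrives.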

Rowtimes, the Stream Clock, and Rowtime Bounds

Streams operate in a continual present. The timestamp for each row is called its rowtime. The arrival of a row establishes the current time of the stream, informally called the "stream clock."

Since streaming queries are typically time-sensitive, these rowtimes often determine when processing can proceed based on thresholds or aggregation criteria. Queries that process multiple streams of input can encounter greatly varying data arrival rates on those input streams.

SQLstream enables producers to publish rowtime bounds that increase efficiency in processing multiple streams that may produce data at greatly varying rates. Publishing a rowtime bound promises that no subsequent row from this producer will have a rowtime earlier than the bound. That certainty frees queries and other processes to proceed with actions that might otherwise have waited to include such a row, that is, a row with a rowtime earlier than the now-known bound.
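The effect of a rowtime bound can be sketched with a small time-ordered merge of two inputs. This is a conceptual model written in Python, not s-Server's implementation. The class `TimeOrderedMerge`, its methods `on_row` and `on_bound`, and the input names are hypothetical. The merge holds a row back until every input is known to have reached that row's rowtime, either because the input delivered a row or because it published a bound.

```python
import heapq
from typing import Any, Dict, List, Tuple


class TimeOrderedMerge:
    """Merge several inputs into one output ordered by rowtime (sketch)."""

    def __init__(self, input_names: List[str]) -> None:
        # Latest time each input is known to have reached.
        self._reached: Dict[str, float] = {n: float("-inf") for n in input_names}
        # Min-heap of (rowtime, arrival sequence, input name, payload).
        self._pending: List[Tuple[float, int, str, Any]] = []
        self._seq = 0

    def on_row(self, name: str, rowtime: float, payload: Any) -> List[tuple]:
        heapq.heappush(self._pending, (rowtime, self._seq, name, payload))
        self._seq += 1
        self._reached[name] = max(self._reached[name], rowtime)
        return self._release()

    def on_bound(self, name: str, bound: float) -> List[tuple]:
        # A bound promises that no later row from this input is earlier than it.
        self._reached[name] = max(self._reached[name], bound)
        return self._release()

    def _release(self) -> List[tuple]:
        # Rows at or before the slowest input's known time are safe to emit.
        safe = min(self._reached.values())
        out = []
        while self._pending and self._pending[0][0] <= safe:
            rowtime, _, name, payload = heapq.heappop(self._pending)
            out.append((rowtime, name, payload))
        return out


merge = TimeOrderedMerge(["fast", "slow"])
first = merge.on_row("fast", 100, "a")    # held: "slow" might still send an earlier row
second = merge.on_row("fast", 105, "b")   # held for the same reason
third = merge.on_bound("slow", 103)       # bound alone releases the row at 100
fourth = merge.on_row("slow", 110, "c")   # releases the row at 105; 110 waits on "fast"
```

Without the bound published at 103, the row at rowtime 100 would have waited until the slow input finally delivered a row.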

Stream Data Processing with SQL

Stream data is processed using familiar relational operators to handle time windows. Time windows vary depending on the application, from milliseconds to many hours, or even days. An application or user can use SQL to create a relational view over various message streams, transforming the data by applying relational operations such as aggregation, correlation, and filtering.

You will generally set up s-Server applications as pipelines, with data flowing in at one end, being analyzed in the middle, and being written out at the other end. s-Server also lets you create multiple views throughout the pipeline, so you can set up queries that give different applications and users their own customized view of the streaming data.
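The pipeline shape described here can be modeled with Python generators. This is an analogy for the data flow, not s-Server code. The stage names `parse`, `only_errors`, and `messages`, as well as the sample log records, are invented for the illustration. One parsed stream feeds two downstream "views," each of which sees the same rows but applies its own filter.

```python
import json
from itertools import tee
from typing import Dict, Iterable, Iterator, List


def parse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Input stage: turn raw JSON lines into rows."""
    for line in lines:
        yield json.loads(line)


def only_errors(events: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Middle stage: a filtered view over the same stream."""
    return (e for e in events if e["level"] == "ERROR")


def messages(events: Iterable[Dict[str, str]]) -> List[str]:
    """Output stage: collect the column a consumer cares about."""
    return [e["msg"] for e in events]


raw = [
    '{"level": "INFO", "msg": "started"}',
    '{"level": "ERROR", "msg": "disk full"}',
    '{"level": "INFO", "msg": "retry ok"}',
]

# Split the parsed stream so two consumers each get their own view of it.
for_audit, for_alerts = tee(parse(raw))
audit_view = messages(for_audit)
alert_view = messages(only_errors(for_alerts))
# audit_view == ["started", "disk full", "retry ok"]
# alert_view == ["disk full"]
```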

SQLstream and RDBMSs

s-Server complements an RDBMS. Both share a common data model centered on processing relational rows, queries, and views. They share common data manipulation and definition languages standardized as SQL. They share a common security model and APIs, such as JDBC, and a common representation of metadata. The difference lies in how queries run: s-Server runs predetermined queries over arriving data, and these queries process continuously and are easy to maintain even during execution. An RDBMS is used for ad hoc queries over historical data, processing each query until it terminates.

The two work well together. s-Server can use static predetermined queries to preprocess data for an RDBMS and also respond to incoming messages by triggering dynamic queries on the data stored in an RDBMS. Queries in s-Server typically are scoped over explicit time windows based on business rules. The business rules typically specify time windows measured in minutes or hours, but time windows over any duration from milliseconds to months are possible. Both s-Server and an RDBMS can be used for transaction processing.
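As a rough picture of a stream responding to incoming messages by querying stored data, the sketch below enriches each arriving row with a lookup against a relational table. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the RDBMS. The `customers` table, the `enrich` function, and the order tuples are hypothetical, and none of this is s-Server syntax.

```python
import sqlite3
from typing import Iterable, Iterator, Optional, Tuple

# Stand-in for the RDBMS: a small table of stored reference data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, tier TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "gold"), (2, "basic")])

Order = Tuple[int, int, float]  # (rowtime, customer id, amount); hypothetical layout


def enrich(orders: Iterable[Order]) -> Iterator[Tuple[int, int, float, Optional[str]]]:
    """For each arriving order, run a query against the stored table."""
    for rowtime, customer_id, amount in orders:
        row = db.execute(
            "SELECT tier FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        tier = row[0] if row else None  # unknown customers get no tier
        yield rowtime, customer_id, amount, tier


orders = [(100, 1, 25.0), (101, 3, 10.0)]
enriched = list(enrich(orders))
# enriched == [(100, 1, 25.0, "gold"), (101, 3, 10.0, None)]
```

The streaming side stays fixed while the stored table can change between rows, so each arriving message sees the current state of the historical data.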