Streaming Data and Windows
In a conventional relational database application, queries perform aggregations and joins over finite sets of rows. Aggregate functions like SUM, COUNT, or MAX, and operations like JOIN, GROUP BY, or PARTITION, must know all the data to produce correct results.
In a streaming context, you need to apply such functions and operations over a sliding time-based subset of records called a WINDOW. Each window contains, at any given time, a subset of the rows streaming by. Defining such windows enables a query to identify the finite set of rows (from an otherwise infinite stream) over which to perform the desired aggregations and joins. This chapter shows using these windows for particular kinds of aggregations. (For the complete syntax, syntax chart, and more examples with more detailed explanations, see the WINDOW clause topic in the s-Server Streaming SQL Reference Guide, which also contains detailed examples showing a sample data set and the contents of various windows at various times. For more detail, see the topics Analytic and Aggregate functions in the s-Server Streaming SQL Reference Guide.
Aggregation can be done as windowed aggregation over rolling windows, such as generating a moving average, or as streaming aggregation over periodic windows, such as generating an hourly report or daily roll-up.
This same input stream can be processed by either form of aggregation. The input stream is created by the following code:
CREATE SCHEMA "Trading"
DESCRIPTION 'Contains the equity trades application objects';
SET SCHEMA '"Trading"';
CREATE TABLE "Tickers" (
"Ticker" VARCHAR(5) NOT NULL,
PRIMARY KEY ("Ticker")
CREATE STREAM "TRADES" (
"Ticker" VARCHAR(5) NOT NULL,
"Shares" INTEGER ,
"Price" DECIMAL(6,2) );
The Set Schema command enables referencing objects in the schema without fully qualifying the object name with the schema name (see also the SET SCHEMA command in the s-Server Streaming SQL Reference Guide). Since no quotes surround the column names given in lower case, they are saved in upper case. If lowercase distinctions are preferred for the column names, those names would need quotes around them, as has been done for the schema, table, and stream names.
As illustrated in the graphic below (Examples of Streaming and Windowed Aggregation), the input stream can be read and processed by SQLstream statements performing windowed aggregation, shown on the right, or by SQLstream statements performing streaming aggregation, shown on the left.
Windowed aggregation uses a rolling window with a specified set of rows. A windowed aggregation query can define a rolling window as having a fixed number of rows (a row-based window) or a varying (though finite) number of rows. With a varying number of rows, the number is not fixed in advance but rather is determined by some logical specification (time-based window). The windowed-aggregation window contains any newly qualifying row and all older qualifying rows up to the specified number of rows or, for a time-based specification, all rows meeting that specification.
CREATE OR REPLACE VIEW "Orders & Shares Trends"
DESCRIPTION 'rolling-period orders and volume trends for TRADES'
AS SELECT STREAM
T.ROWTIME AS "orderTime",
"Ticker", "Shares", "Price",
COUNT (*) OVER "last_Hour" AS "ordersPerHour",
COUNT (*) OVER "last_ten_min" AS "ordersLastTen",
SUM("Shares") OVER "last_Hour" AS "sharesPerHour",
SUM("Price" * "Shares") OVER "last_Hour" AS "amtPerHour"
FROM TRADES AS T
WINDOW "last_Hour" AS (RANGE INTERVAL '1' HOUR PRECEDING),
"last_ten_min" AS (RANGE INTERVAL '10' MINUTE PRECEDING);
Note: RANGE and ROWS operate differently, as follows:
RANGE is only for time-based windows, and requires a valid INTERVAL specification, which defines a time duration. INTERVAL '1' HOUR is legal, but ROWS 10 is not.
For row-based windows, the proper syntax example is ROWS 10 PRECEDING.
See WINDOW clause in the Streaming SQL Reference guide for more details.
Each row arriving on TRADES changes the set of rows defined within the window and produces an output row with the new aggregated values.
Streaming aggregation uses time-based windows with many-rows-in, one-row-out behavior, applying the SQL constructs SELECT STREAM DISTINCT ... or, more commonly, SELECT STREAM ... GROUP BY.
In this type of aggregation, incoming rows that belong to the group are added to the running aggregations. An incoming row that belongs to the next group ends the current group, causes that current group's aggregations to be emitted as a single output row, and begins the next group.
For streaming aggregation using GROUP BY in a streaming SELECT statement, the first group-by term must be time-based. For example GROUP BY FLOOR(S.ROWTIME) TO HOUR will yield one output row per hour for the previous hour's input rows. The GROUP BY can specify additional partitioning terms. For example, GROUP BY FLOOR(S.ROWTIME) TO HOUR, USERID will yield one output row per hour per USERID value.
CREATE OR REPLACE VIEW "Orders & Shares Groups"
DESCRIPTION 'rolling-period order groups for TRADES'
FLOOR(T.ROWTIME TO HOUR) AS "hour",
COUNT(*) AS "ordersPerHour"
FROM TRADES AS T
GROUP BY FLOOR(T.ROWTIME TO HOUR), "Ticker"
Each row arriving on TRADES adds to the count of orders for that ticker, i.e., the trade represented by that row, within the hour in which it arrived. Each ticker named in trades arriving during that hour provides a single output row in the streaming aggregation output, showing how many trades for that ticker occurred in that hour.