Aggregation Windows

<< Click to Display Table of Contents >>

Navigation:  Understanding Streaming SQL Concepts > Streaming Data and Windows >

Aggregation Windows

Previous pageReturn to chapter overviewNext page

The nature of streaming data requires that calculations, such as SUM, AVERAGE, population variance, and so on, be calculated within a "window"--that is, a specified interval of time or other numeric value, such as "every ten minutes" or "the last thirty seconds" or "every millimeter," and so on.

We’re going to call the set of rows to which the analytic is applied the analytic window. In the diagram below, the analytic window, highlighted in blue, lasts from 0 seconds after 2:00 on February 2, 2022, to 10 seconds after 2:00 on the same date.

conc_analytic_window_example

 

 

If you're used to working with tables that have fixed number of rows, you can think of windows as a kind of "virtual" table--windows set up a fixed window so that you can run queries on them. Actually, they're not really fixed windows--they evolve over time, and are sometimes open ended. But it's still a rough idea of how windows work in streaming SQL.

Time-based aggregation windows use rows' rowtime in order to determine windows.

Many windows end at current row, which is the latest rowtime that s-Server has seen.

conc_current_row_and_rowtime

Here are some examples of aggregation windows:

Windows That Use a Set Period of Time

A sliding window always applies an analytic to a set period of time, which moves along steadily. That is, a sliding window is always exactly 10 minutes of data (or, as we’ll see, 10 rows).

A hopping window also applies an analytic to a set period of time, but moves ahead in jumps, such that, for example, one minute is subtracted from the analytic window every minute.

5_3_indicator Hopping windows are supported as of s-Server 5.3.

Windows That End Earlier Than the Present

Offset windows also use a set period of time--actually, these can be sliding or hopping--but the time period ends at some time earlier than the present--such as "10 seconds ago" or "1 minute ago." These say to s-Server "give me all the rows for a ten minute interval that ends 1 minute ago."

conc_offset_window

5_3_indicator Offset windows are supported as of s-Server 5.3.

Windows that Start Calculating at Time X, and Keep Calculating.

An unbounded window calculates from first available data until the current row.

A snapping window applies an analytic to a time period that began one hour ago, and starts over at the top of the hour.

Tumbling Time Windows

Tumbling windows accumulate a batch of rows of the input to produce a set of aggregated result rows. Technically, they are not "windows"--they're the result of a GROUP BY statement run on rowtime.

Further Modifications for Windows

Distinct Values Only

For all of these windows, you can specify DISTINCT using SELECT STREAM DISTINCT.

Using Partitions

You can also use PARTITION BY to segment results. The window function is applied to each partition separately and computation restarts for each partition.

The Special Way s-Server Handles Current Row

Many windows end with the present, which is what we call current row.

conc_current_row

There are some slightly tricky things about current row. Mostly, you need to know that s-Server thinks of current row as “the first time we see a new rowtime,” and starts making calculations accordingly, The tricky thing is that other rows may come in with the same rowtime. These will be incorporated into later calculations, but calculations begin as soon as the first row hits s-Server. That's a good thing if you want your calculation to tell you when the temperature of a set of sensors exceeds an average--so that you can shut down whatever is causing the temperature to go up before things light on fire. But it can be a little weird to imagine that windows that involve current row incorporate rows that have the current rowtime, but not all of them.