Creating Streaming JOINs

<< Click to Display Table of Contents >>

Navigation:  Analyzing Data in s-Server >

Creating Streaming JOINs

Previous pageReturn to chapter overviewNext page

A Streaming JOIN refers to a JOIN in a streaming query where at least one of the relations being joined is a stream.

Streaming joins work just like regular table joins, but subject to the considerations implicit in dealing with streams, that is, rolling windows and rowtimes:

Rolling windows. A window defined on a stream is a rolling (sliding) window. As the current time progresses, the window excludes some rows while adding others. As a result, rows output by the join are generated incrementally.

Note: With rolling windows, an output row will be produced only once by a match on a given pair of tuples from the left and right input streams. In other words, an output row already produced by a prior match will not be produced again when a subsequent identical match is identified.

Output rowtimes. All output rows are produced in order of non-descending rowtime. (It is valid to have multiple output rows with the same rowtime.) The rowtime of a given output row is the rowtime at the point it was possible to calculate the output row:
oFor an inner join, the rowtime of an output row is the later of the rowtimes of the two input rows. This is also true for an outer join in which matching input rows are found.
oFor outer joins in which a match is not found, the rowtime of an output row is the later of the following two times:
the rowtime of the input row for which a match was not found, or
the later bound of the window of the other input stream at the point any possible match passed out of the window.

All streaming joins are implicitly windowed joins between the affected streams. If no explicit window is specified, the window specification CURRENT ROW is used.

Join limitations

SQLstream s-Server does not support left, right or full outer join operations applied to relations.

Stream-Window Joins Overview

Streaming joins take place over a time-based subset of records. By default, these subsets are ordered by row timestamps: rows from each stream whose ROWTIME values match are joined according to the JOIN condition.

A join with an explicit window is called a windowed join. This example matches bids and asks by ticker, shares, and price within a ten-second window. It is written with an inline window specification.

CREATE OR REPLACE VIEW "MatchBidsAndAsks"

DESCRIPTION 'match bids and asks for each ticker over a 10-second time window' AS

SELECT STREAM

 B.ROWTIME AS "bidTime",

 A.ROWTIME AS "askTime",

 B."ticker",

 B."shares" AS "bidShares",

 B."price" AS "bidPrice",

 A."shares" AS "askShares",

 A."price" AS "askPrice"

FROM SALES.BIDS OVER (RANGE INTERVAL '10' SECOND PRECEDING) AS B

JOIN SALES.ASKS OVER (ROWS CURRENT ROW) AS A ON A."ticker" = B."ticker"

 AND A."shares" = B."shares"

 AND A."price" = B."price";

 

For readability, or for reuse of a window specification within a query, you can use a WINDOW alias to define a named window. Here is the above example rewritten to use a WINDOW alias:

CREATE OR REPLACE VIEW "MatchBidsAndAsks"

DESCRIPTION 'match bids and asks for each ticker over a 10-second time window' AS

SELECT STREAM

 B.ROWTIME AS "bidTime",

 A.ROWTIME AS "askTime",

 B."ticker",

 B."shares" AS "bidShares",

 B."price" AS "bidPrice",

 A."shares" AS "askShares",

 A."price" AS "askPrice"

FROM SALES.BIDS OVER "lastTenSeconds" AS B

JOIN SALES.ASKS OVER (ROWS CURRENT ROW)AS A ON A."ticker" = B."ticker"

 AND A."shares" = B."shares"

 AND A."price" = B."price"

WINDOW "lastTenSeconds" AS (RANGE INTERVAL '10' SECOND PRECEDING);

 

Using Windows to Correlate and Join Streams

It's not usually meaningful to join two streams without using a "window" on one of them because the instantaneous content of the second stream is likely to be empty - the rowtimes of tuples in each stream would have to match exactly. In effect, every time a new tuple arrives in the stream, you can compare it to a sliding window of data in other streams (or stream views). The sliding window can be a time range (going back from now), or a number of rows.

The definition of the Edamame view "SuspectDebits" contains the following SELECT statement:

SELECT STREAM

      LF."accountNumber",

      "loginFailureCount",

      "transactionType",

       "amount"

 FROM "SuspectLoginFailures" AS LF

 JOIN "Transactions" AS TX

                   OVER (RANGE INTERVAL '1' MINUTE PRECEDING)

                   ON LF."accountNumber" = TX."accountNumber"

WHERE ("transactionType" = 'isDebit');

 

Using a WINDOW alias

You can factor out the RANGE INTERVAL window by giving it an alias, as in this equivalent SQL definition:

SELECT STREAM

      "SuspectLoginFailures"."accountNumber",

      "loginFailureCount",

      "transactionType",

      "amount"

 FROM "SuspectLoginFailures"  OVER "lastFew"

 JOIN "Transactions" OVER "lastFew"

        ON "SuspectLoginFailures"."accountNumber" = "Transactions"."accountNumber"

WHERE ("transactionType" = 'isDebit')

WINDOW "lastFew" AS (RANGE INTERVAL '1' MINUTE PRECEDING);

 

Using a separate WINDOW definition is particularly useful when joining a stream to two or more other streams as it is easier to maintain (for example, if you want to increase the gap from one minute to two.

Row-based windows, defined by row counts - SQL ROWS

As well as using time-based windows, you can simply use a defined number of rows (row-based windows).

SELECT STREAM

      ticker,

      MIN("price") OVER w AS PriceMin,

      MAX("price") OVER w AS PriceMax,

 FROM sales.bids

WINDOW w AS (ROWS BETWEEN 3 PRECEDING AND 0 FOLLOWING);

 

(See Windowed Aggregation on Streams.)

Note

When several streams are joined, the "secondary" streams don't all have to use the same WINDOW. They can also mix ROWS and INTERVAL windows.