Pumps, Streams, and Processing Pipelines
By default, each stream query in SQLstream is run independently from all other stream queries. However, it is often desirable for either performance or consistency reasons to share the output of one query as input to one or more other queries.
A potential complexity arises from three facts:
(1) that queries can be inserted or closed dynamically while SQLstream is processing streams,
(2) that data can arrive at varying rates, and
(3) that data can be emitted at varying rates after being processed.
In this context, ensuring that all such queries receive identical input from the time each of them becomes active requires some forethought.
In SQLstream this goal is accomplished by defining a stream that all such queries will listen for, and then creating a pump to feed that stream. The pump is based on the source views or queries. Using the pump compensates for the variations in the timing of the data sources; using the stream that the pump feeds ensures that every query listening for that stream sees the same set of results.
This procedure enables "processing pipelines," that is, modular sequences of processing steps, where each step performs filtering, aggregation, and transformation, providing its results to downstream consumers. Each such step thus also provides a public junction where its results may be
|•||inspected for debugging purposes,|
|•||analyzed for SLAs or regulatory compliance,|
|•||selected and repurposed by streams in other processing pipelines,|
|•||pumped into sink adapters or other streams, or|
|•||subscribed by JDBC client applications.|
The simple example that follows illustrates the basic view/stream/pump mechanisms.
This example uses the BIDS and ASKS streams already defined in the SALES schema included in the First SQLstream s-Server distributed with the SQLstream product. The view matches bids and asks by ticker, shares, and price within a sliding ten-second window by using a windowed join. Such a join is a streaming join over a time-based subset of records ordered by row timestamps.
CREATE OR REPLACE VIEW "MatchBidsAndAsksView"
DESCRIPTION 'match bids and asks for each ticker over a 10-second time window' AS
B.ROWTIME AS "bidTime",
A.ROWTIME AS "askTime",
B."shares" AS "bidShares",
B."price" AS "bidPrice",
A."shares" AS "askShares",
A."price" AS "askPrice"
FROM SALES.BIDS OVER (RANGE INTERVAL '10' SECOND PRECEDING) AS B
JOIN SALES.ASKS AS A ON A."ticker" = B."ticker"
AND A."shares" = B."shares"
AND A."price" = B."price";
After defining the view, you create a related stream to receive the view results and a pump to insert those results into that stream, as follows:
CREATE STREAM "MatchBidsAndAsks"
CREATE PUMP "MatchBidsAndAsksPump" STARTED as
INSERT INTO "MatchBidsAndAsks" SELECT * FROM "MatchBidsAndAsksView";
At this point, any queries created using "MatchBidsAndAsks" will all see the same data stream.
A conventional SQL application prepares and executes a statement with a SELECT... query and iterates through the returned result set until end of fetch is detected, when there are no more rows to return. The application then returns to doing something else.
In the SQLstream context of streaming data and queries that run forever, there is no obvious "end of fetch". Instead, the "get next row" call (i.e., ResultSet.next() for JDBC) blocks within the SQLstream client driver until one of the following two possibilities occurs:
|•||The next row becomes available, which could be a very long wait for some data sources, or|
|•||The statement is deliberately closed by the client application.|
In the latter case, the application is in effect "unsubscribing" from the stream. For more detailed discussion of preventing blocking, see the topic JDBC driver in the Integration Guide.
SQLstream extends the JDBC API to offer millisecond-granularity timeout intervals to make ResultSet polling more practical for an application that wants to remain responsive to other events.