Promoting a Column to Rowtime

<< Click to Display Table of Contents >>

Navigation:  Building Streaming Applications > Tutorial 2: Introduction to Coding with Streaming SQL > Tutorial 2B: Coding Streaming SQL in SQLstream sqlLine >

Promoting a Column to Rowtime

Previous pageReturn to chapter overviewNext page

In working with streaming data, you need to be aware of what time data arrives. Because streams continually update, and may update from multiple sources, time is an important concept in Streaming SQL. Time in streams is monotonic, meaning it always goes forward.

Working with ROWTIME

This monotonically increasing time is tracked as a column value called ROWTIME. By default, ROWTIME is the time a row enters the stream, though you can also configure the system to assign this value to a time generated by the data source.

This will help you produce meaningful analysis about this data. Every streaming row carries a time value called a rowtime, implemented as a column in every row. The rowtime for newly arriving rows cannot be less than the rowtime for previously received rows (though it can be equal to the rowtime of the current row).

Rowtimes can be implicit or explicit.

Implicit rowtimes are established by the "arrival time" of the row: the time that s-Server receives the row. Even though there is no explicit mention of ROWTIME, it is nevertheless part of that row:

INSERT INTO logStream VALUES('test', 'message1');

 

Explicit rowtimes are provided by the source application with INSERT, as in the following examples. In each case, a timestamp from the source application is explicitly assigned to ROWTIME.

INSERT INTO logStream (ROWTIME, source, message)
VALUES(TIMESTAMP '2014-09-30 19:13:00', 'test', 'message1');

 

INSERT INTO logStream (ROWTIME, source, message)
VALUES(LOCALTIMESTAMP, 'test', 'message1');

 

Note: When setting an explicit ROWTIME, TIMESTAMP must be monotonically increasing from the previous TIMESTAMP. In the example above, '2014-09-30 19:13:00' needs to be later than the previous TIMESTAMP.

In either case, the ROWTIME of the arriving row establishes the current time of the stream, known as the stream clock.

Working with VIEWs

In many cases, you will want to set an explicit rowtime, since often your data will have more meaningful timestamps than the time a row entered the column. To set the rowtime explictly, we'll create a VIEW on the foreign stream. VIEWs are saved queries, which you can later access as shorthand for the query. VIEWs function as "macros" for queries. When you write a statement that references a VIEW, the view runs the query as defined in the VIEW.

For example, if you write the statement CREATE VIEW all_emps AS SELECT * FROM emp, you can later query or analyze all_emps as if it were itself a stream or table. The results are the same as they would be if you ran the original code.

VIEWs are more useful in a streaming context than a database context, mostly because of the kinds of complex systems to which streaming SQL lends itself. Such systems often involve multiple data sources that need to be analyzed at several junctures in the pipeline)

Determining current ROWTIME

You can check the current rowtime for a stream with the following query:

SELECT STREAM ROWTIME from "buses"."buses_stream";

 

This query should return times from today:

sbur_sqlline_unpromoted_rowtime

These rowtimes reflect when data entered the stream. We want to change the ROWTIME so that it reflects when data was actually collected.

Promoting data time to ROWTIME

To do so, we'll promote a column with this information to ROWTIME, using the syntax "as ROWTIME". In this case, that column is "reported_at", the column with information on when the bus reported its data. Promoting data time to ROWTIME makes a rowtime explicit. The following code creates a view that promotes the column "reported_at" to ROWTIME.

Enter the following code into the SQLline prompt:

SET SCHEMA '"buses"';

CREATE VIEW "buses_with_rowtime" AS SELECT STREAM

 "reported_at" AS ROWTIME,

--This promotes the "reported_at" column to ROWTIME

 "id",

 "shift_no",

 "trip_no",

 "route_variant_id",

 "waypoint_id",

 "last_known_location_state",

 "lat",

 "lon",

 "speed",

 "bearing",

 "driver_no",

 "prescribed",

 "highway",

 "created_at",

 "updated_at"

 FROM "buses"."buses_stream";

 

Note that the view's query uses the STREAM keyword. You need to use this keyword whenever you query a stream.

In the view, "buses_with_rowtime," ROWTIME is explicitly set to the value of the column "reported_at". In other words, our stream is now timed according to its source.

You can test this by running the following statement:

SELECT STREAM ROWTIME from "buses"."test_with_rowtime";

 

Observe that the dates displayed (right column below) are different then when you originally queried rowtime (left column below). These dates reflect when the buses actually reported data.

Without ROWTIME Promotion


sbur_sqlline_unpromoted_rowtime

sbur_sqlline_promoted_rowtime

Next: Creating a VIEW on the Foreign Stream that Performs a Basic Calculation