Streams and Views

<< Click to Display Table of Contents >>

Navigation:  Understanding Streaming SQL Concepts > SQLstream Overview >

Streams and Views

Previous pageReturn to chapter overviewNext page

A stream is a continually updating data object, with columns of data similar to a database table. In database terms, a stream is a schema object that is a relation but which does not store data like as a finite relation (such as a table in a database). A stream can be written to by multiple writers (with INSERT statements, often contained in pumps) and read from by multiple readers (with SELECT statements). The number of records in a stream can be infinite, as with a log file that is continually read as new data is logged.

Querying a Stream

Queries work slightly differently in a streaming context than in a traditional database context. A relational database prepares and executes a statement with a SELECT... query and iterates through the returned result set until there are no more rows to return. In a streaming context, a SELECT statement essentially runs forever. This means that the "get next row" call ( such as, for JDBC: ResultSet.next() ) blocks within the SQLstream s-Server client driver until one of the following two possibilities occurs:

The next row becomes available, which could be a very long wait for some data sources, or
The statement is deliberately closed by the client application.

In the latter case, the application is in effect "unsubscribing" from the stream.

Note: SQLstream s-Server extends the JDBC API to offer millisecond-granularity timeout intervals to make ResultSet polling more practical for an application that wants to remain responsive to other events.

Streams and Time

Because streams continually update, time is an important concept in Streaming SQL. Time in streams is monotonically increasing, meaning it always goes forward. This monotonically increasing time is tracked as a column value called ROWTIME. ROWTIME is usually the time a row enters the stream, thought you can also configure the system to assign this value to a time generated by the data source.

Pipelines

Streaming SQL applications are often organized into pipelines, with multiple data sources that are merged and analyzed, and multiple writers to visualization systems such as StreamLab, as well as archiving systems, such as Hadoop or RDBMS systems.

Views

You can save queries as views, which you can later access as shorthand for the query. Views function as "macros" for queries. When you write a statement that references a view, the view runs the query as defined in the view.

For example, if you write the statement CREATE VIEW all_emps AS SELECT * FROM emp, you can later query or analyze all_emps as if it were itself a table. The results are the same as they would be if you ran the original code.

Views are more useful in a streaming context than a database context, mostly because of the kinds of complex systems to which streaming SQL lends itself. Such systems often involve multiple data sources that need to be analyzed at several junctures in the pipeline.