Tailing Foreign Tables with SQL/MED

<< Click to Display Table of Contents >>

Navigation:  Integrating SQLstream Blaze with Other Systems > Reading Data into s-Server > Reading Data from RDBMS Sources >

Tailing Foreign Tables with SQL/MED

Previous pageReturn to chapter overviewNext page

In using the SQL/MED adapter to query remote database tables, you will often want to "tail" the remote database table. "Tailing" refers to setting up an open-ended query that returns any newly-added rows to the remote database table. When new rows are added, s-Server automatically adds these to a streaming query. In order to tail a database, you will need to make use of the STREAM keyword. Before discussing this process, this document will briefly cover a more limited use case of querying a remote database table--a static query on the current contents of the table.

Foreign tables are created automatically once you create a server object using the SQL/MED plugin. This process is described in the topic Reading Data from RDBMS Sources, and involves code along the following lines:

CREATE OR REPLACE SCHEMA "ForeignTableData";

SET SCHEMA '"ForeignTableData"';

 

CREATE OR REPLACE SERVER mysql_reader

   FOREIGN DATA WRAPPER SYS_JDBC

   OPTIONS

         (

       DRIVER_CLASS 'com.mysql.jdbc.Driver',

       URL 'jdbc:mysql://localhost:3306/sample',

       USER_NAME 'sqlstream',

       PASSWORD 'sqlstream',

       txInterval '1000'

         );

Note: You can also create foreign tables manually. See the topic Reading Data from RDBMS Sources for more details.

Option for Performing a Static Query on a Remote Database Table

SELECT * FROM <FOREIGN TABLE> is a relational table read. Returns all the current rows and then an end of statement. You do not need to create a special foreign table object in order to execute this query. You can do so using the the SQL-standard three-level qualifier scheme for the names of tables, as described in the topic Reading Data from RDBMS Sources.

Options for Performing a Tailing Query on a Remote Database Table

SELECT STREAM * FROM <FOREIGN TABLE> (note the use of the STREAM keyword) is a private polling table read. This query returns all current rows and rows in the future. There is no need for a highwater mark when using this option.

SELECT STREAM * FROM <FOREIGN STREAM> (note the use of the STREAM keyword and the FOREIGN STREAM object) is a shared/public polling table read. This option requires you to create a foreign stream object for the remote database table, as described below. This query returns all current rows and rows in the future based upon the current state of the highwater mark. A highwater mark is a data point that defines the current row being read by an s-Server query (this is important only when multiple readers are querying the same SQL/MED foreign stream). Initially this query will work the same way as SELECT STREAM * FROM <FOREIGN TABLE> (it returns all rows). However, if another query starts a read on the same foreign stream, then this query will pick up at the same point (the highwater mark) as the other query. If no query is currently being run against the foreign stream, the foreign stream does not discard rows. As a result, querying the foreign stream a second time will initially return only new tuples as the stream "catches up" to the current state of the table. Any new query will update the highwater mark to the current highest rowtime or key.

Writing a Tailing Query Using the STREAM Keyword/Table Object Option

In order to tail a remote database table using this option, you prepare a SELECT query with a lower and upper bound on a monotonically increasing column of the foreign stream.  For example, given a remote table called TEST_TABLE and a monotonically increasing column called ID, you might use a query along the following lines:

SELECT STREAM * FROM TEST_TABLE WHERE ID > -1 AND ID <= (SELECT MAX(ID) FROM

TEST_TABLE) - 2.

 

The id > ? is the lower bound and id <= (select max(id) from TEST_TABLE) - 2 is the upper bound.  The lower bound excludes rows that we have already seen.  The upper bound excludes rows that may not have been committed yet.
The 2 in the upper bound is called the txInterval and gives a fudge factor for the inserters to this table. If there will only be 1 inserter to TEST_TABLE, txInterval can safely be set to 0 (it must always be positive).  txInterval defaults to 2 but should be set to the maximum total number of concurrent inserters into the foreign table (ie the maximum number of insert queries that could simultaneously be executing at once).
Another table property, pollingInterval, determines how long to wait between executing the prepared statement against the remote DB table. The MED/JDBC reader only waits if no rows were found in the last execution pass.
Multiple monotonically increasing columns where clause expressions in the select query are OR'ed together (ie (ID > ? AND ID <= (select max(id) from TEST_TABLE)) OR (ID2 > ? AND ID2 <= (select max(id) from TEST_TABLE))).
You can use this process with both foreign streams and foreign tables in a select stream query.  In foreign stream use cases, a highwater mark will be stored.

Again, this query only works for private polling. If you want to create a tailing query that lets multiple users read from a remote database table, you will need to create a foreign stream object.

Tailing a Remote Database Table with a Foreign Stream Object

When you tail a remote database table with a foreign stream object, you allow multiple readers to query the remote database table. In order to do so, you need to create a foreign stream and make use of a highwater mark.

SQL for Creating Foreign Stream to Tail Foreign Table

The following SQL first creates a schema in which the foreign stream will reside, called ForeignTableData. The next block of SQL creates a server object

CREATE OR REPLACE SCHEMA "ForeignStreamData";

SET SCHEMA '"ForeignStreamData"';

 

CREATE OR REPLACE SERVER mysql_reader

   FOREIGN DATA WRAPPER SYS_JDBC

   OPTIONS

         (

       DRIVER_CLASS 'com.mysql.jdbc.Driver',

       URL 'jdbc:mysql://localhost:3306/sample',

       USER_NAME 'sqlstream',

       PASSWORD 'sqlstream',

       txInterval '1000'

         );

 

CREATE OR REPLACE FOREIGN STREAM "ForeignTableDataStream"

 (

    "id" BIGINT NOT NULL,

    "col1" VARCHAR(10),

    "col2" DOUBLE,

    "col3" INTEGER

  )

 SERVER "mysql_reader"

         (

          TABLE_NAME 'sample_data',

          queryCol 'id',

          SCHEMA_NAME 'public',

           DESCRIPTION 'Foreign stream to deliver rows appended to

                 public.sample_data. Foreign streams are a shared or public polling table

                 read. With multiple readers, s-Server maintains a highwater mark to

                 ensure that all readers read from the same point concurrently.'

         )

 ;

 

Highwater Marks

Highwater marks can be stored in one of three different ways:

1) In a remote table. Highwater marks are stored in a remote DB table. If the highwater mark is a TIMESTAMP column called ROWTIME, then the highwater marks are stored in a table called SQLS_TableReader_TsState with the following columns:

VARCHAR(30) SQLS_TABLE_NAME,

TIMESTAMP SQLS_highTs

 

Other types of highwater marks are stored in a table called SQLS_TableReader_<TABLE_NAME>State where <TABLE_NAME> is the name of the remote database table (not the name of the MED/JDBC (s-Server) foreign stream) and the columns are just the monotonically increasing columns of the remote database table. To use remote highwater tables add HIGHWATER_TYPE 'remote' to the MED/JDBC table properties.

2) In a local table. In this case, you use native s-Server tables with the same rowtypes as the remote highwater mark tables above. To use local highwater tables add HIGHWATER_TYPE 'local' to the MED/JDBC table properties.  This is the default if HIGHWATER_QUERY is not present. See the topic CREATE TABLE in the Streaming SQL Reference Guide for more details on s-Server native tables.

3) In "resilient tables." With resilient tables, a table property called HIGHWATER_QUERY of the MED/JDBC (s-Server) stream is a SELECT statement whose return rowtype is the same as the monotonically increasing columns of the remote table. The purpose of this query is to retrieve the current highwater state (the id or rowtime of the last row of data processed) of the system from its sinks/systems of record.  This SELECT query is executed once when the foreign stream is started to populate the first highwater mark of the stream.

If the HIGHWATER_QUERY table property is present, the HIGHWATER_TYPE table property can't be present and vis-versa.  If neither are present, the default is HIGHWATER_TYPE 'local'

ROWTIME columns are allowed in foreign tables or streams.

Monotonically increasing columns are indicated either using the queryCol table property or by MONOTONIC_<COLUMN_NAME>=true table property.  For example the following are equivalent. queryCol 'ID'

MONOTONIC_ID 'true'

queryCol and the Foreign Stream's High Water Mark

Since table entries from multiple insertion sources can arrive without being in strict order, a newly arriving row can have a queryCol value higher than a prior row. If earlier rows with lower queryCol values were already streamed, a new arriving row with a higher queryCol value is seen as "out of order" and discarded.

You can minimize this data loss by specifying a querytSort parameter. This parameter establishes an interval in which rows with out-of-order queryCol can arrive. The querytSort value you choose enables the streaming of rows whose queryCol values are lower than "high-water mark minus querytSort." As incoming rows establish a new high-water mark, earlier rows outside that new interval can then be streamed.

Querying the Foreign Stream as a Tail

In order to tail a remote database table using this option, you prepare a SELECT query with a lower and upper bound on a monotonically increasing column of the foreign stream.  For example, given a foreign stream called TEST_STREAM and a monotonically increasing column called ID, you might use a query along the following lines:

SELECT STREAM * FROM TEST_STREAM WHERE ID > <HIGHWATER MARK> AND ID <= (SELECT MAX(ID) FROM TEST_TABLE) - 2.

 

The id > <highwater mark> is the lower bound and id <= (select max(id) from TEST_STREAM) - 2 is the upper bound.  The lower bound excludes rows that we have already seen.  The upper bound excludes rows that may not have been committed yet.
The 2 in the upper bound is called the txInterval and gives a fudge factor for the inserters to this table. If there will only be 1 inserter to TEST_TABLE, txInterval can safely be set to 0 (it must always be positive). txInterval defaults to 2 but should be set to the maximum total number of concurrent inserters into the foreign table (ie the maximum number of insert queries that could simultaneously be executing at once).
Another table property, pollingInterval, determines how long to wait between executing the prepared statement against the remote DB table. The MED/JDBC reader only waits if no rows were found in the last execution pass.
Multiple monotonically increasing columns where clause expressions in the select query are OR'ed together (ie (ID > ? AND ID <= (select max(id) from TEST_TABLE)) OR (ID2 > ? AND ID2 <= (select max(id) from TEST_TABLE))).
You can use this process with both foreign streams and foreign tables in a select stream query.  In foreign stream use cases, a highwater mark will be stored.