Tailing Foreign Tables with SQL/MED

<< Click to Display Table of Contents >>

Navigation:  Integrating SQLstream Blaze with Other Systems > Reading Data into s-Server > Reading Data from RDBMS Sources >

Tailing Foreign Tables with SQL/MED

Previous pageReturn to chapter overviewNext page

In using the SQL/MED adapter to query remote database tables, you will often want to "tail" an external database table. "Tailing" refers to setting up an open-ended query that returns any newly-added rows from the external database table. When new rows are added, s-Server automatically adds these to a streaming query. In order to tail a database, you will need to make use of the STREAM keyword. Before discussing this process, this document will briefly cover a more limited use case of querying a remote database table--a static query on the current contents of the table.

Foreign tables are created automatically once you create a server object using the SQL/MED plugin. This process is described in the topic Reading Data from RDBMS Sources, and involves code along the following lines:

CREATE OR REPLACE SCHEMA "ForeignTableData";

SET SCHEMA '"ForeignTableData"';

 

CREATE OR REPLACE SERVER mysql_reader

   FOREIGN DATA WRAPPER SYS_JDBC

   OPTIONS

         (

       DRIVER_CLASS 'com.mysql.jdbc.Driver',

       URL 'jdbc:mysql://localhost:3306/sample',

       USER_NAME 'sqlstream',

       PASSWORD 'sqlstream',

       txInterval '50'

         );

 

Note: You can also create foreign tables manually. See the topic Reading Data from RDBMS Sources for more details.

Option for Performing a Static Query on a Remote Database Table

SELECT * FROM <FOREIGN TABLE> is a relational table read. Returns all the current rows and then an end of statement. You do not need to create a special foreign table object in order to execute this query. You can do so using the the SQL-standard three-level qualifier scheme for the names of tables, as described in the topic Reading Data from RDBMS Sources.

Options for Performing a Tailing Query on a Remote Database Table

SELECT STREAM * FROM <FOREIGN TABLE> (note the use of the STREAM keyword) is a private polling table read. This query returns all current rows and rows in the future. There is no need for a highwater mark when using this option.

SELECT STREAM * FROM <FOREIGN STREAM> (note the use of the STREAM keyword and the FOREIGN STREAM object) is a shared/public polling table read. This option requires you to create a foreign stream object for the remote database table, as described below. This query returns all current rows and rows in the future based upon the current state of the highwater mark. A highwater mark is a data point that defines the current row being read by an s-Server query (this is important only when multiple readers are querying the same SQL/MED foreign stream). Initially this query will work the same way as SELECT STREAM * FROM <FOREIGN TABLE> (it returns all rows). However, if another query starts a read on the same foreign stream, then this query will pick up at the same point (the highwater mark) as the other query. If no query is currently being run against the foreign stream, the foreign stream does not discard rows. As a result, querying the foreign stream a second time will initially return only new tuples as the stream "catches up" to the current state of the table. Any new query will update the highwater mark to the current highest rowtime or key.

Writing a Tailing Query Using the STREAM Keyword/Table Object Option

In order to tail a remote database table using this option, you prepare a SELECT query with a lower and upper bound on a monotonically increasing column of the foreign stream.  For example, given a remote table called TEST_TABLE and a monotonically increasing column called ID, you might use a query along the following lines:

SELECT STREAM * FROM TEST_TABLE WHERE ID > -1 AND ID <= (SELECT MAX(ID) FROM

TEST_TABLE) - 2.

 

The id > ? is the lower bound and id <= (select max(id) from TEST_TABLE) - 2 is the upper bound.  The lower bound excludes rows that we have already seen.  The upper bound excludes rows that may not have been committed yet.

The 2 in the upper bound is called the txInterval and gives a fudge factor for the inserters to this table. If there will only be 1 inserter to TEST_TABLE, txInterval can safely be set to 0 (it must always be positive). txInterval defaults to 2, but you should set it equal to the maximum total number of concurrent inserters into the foreign table (that is, the maximum number of insert queries that could simultaneously be executing at once).

Another table property, pollingInterval, determines how long to wait between executing the prepared statement against the remote DB table. The MED/JDBC reader only waits if no rows were found in the last execution pass.

Multiple monotonically increasing columns where clause expressions in the select query are OR'ed together (ie (ID > ? AND ID <= (select max(id) from TEST_TABLE)) OR (ID2 > ? AND ID2 <= (select max(id) from TEST_TABLE))).

You can use this process with both foreign streams and foreign tables in a select stream query.  In foreign stream use cases, a highwater mark will be stored.

Again, this query only works for private polling. If you want to create a tailing query that lets multiple users read from a remote database table, you will need to create a foreign stream object.

Tailing a Remote Database Table with a Foreign Stream Object

When you tail a remote database table with a foreign stream object, you allow multiple readers to query the remote database table. In order to do so, you need to create a foreign stream and make use of a highwater mark.

You pass highwater marks using an option called "SQLSTREAM_POSITION_KEY" in the OPTIONS clause of FOREIGN STREAM definition.

Highwater marks can be passed dynamically through an OPTIONS_QUERY option on the foreign stream. This lets you change the highwater mark if the foreign stream is being queried by different users. See Using the OPTIONS_QUERY Option below.

SQL for Creating Foreign Stream to Tail Foreign Table

The following SQL first creates a schema in which the foreign stream will reside, called ForeignTableData. The next block of SQL creates a server object

CREATE OR REPLACE SCHEMA "ForeignStreamData";

SET SCHEMA '"ForeignStreamData"';

 

CREATE OR REPLACE SERVER mysql_reader

  FOREIGN DATA WRAPPER SYS_JDBC

  OPTIONS

        (

      DRIVER_CLASS 'com.mysql.jdbc.Driver',

      URL 'jdbc:mysql://localhost:3306/sample',

      USER_NAME 'sqlstream',

      PASSWORD 'sqlstream',

      txInterval '1000'

        );

 

CREATE OR REPLACE FOREIGN STREAM "ForeignTableDataStream"

(

   "id" BIGINT NOT NULL,

   "col1" VARCHAR(10),

   "col2" DOUBLE,

   "col3" INTEGER

 )

SERVER "mysql_reader"

        (

         TABLE_NAME 'sample_data',

         queryCol 'id',

          --highwater mark

         "SQLSTREAM_POSITION_KEY" '1000',

         SCHEMA_NAME 'public',

          DESCRIPTION 'Foreign stream to deliver rows appended to

                public.sample_data. Foreign streams are a shared or public polling table

                read.'

        )

;

 

Using the OPTIONS_QUERY Option for Highwater Marks

When you use the OPTIONS_QUERY, you can pass options to the foreign stream dynamically without the need to change the FOREIGN STREAM statement itself.

In the following example, each user can pass her own watermark to the foreign stream while running the query. When the SELECT query is run on ForeignTableDataStream, s-Server also runs OPTIONS_QUERY to fetch new values for options.

CREATE OR REPLACE SCHEMA "ForeignStreamData";

SET SCHEMA '"ForeignStreamData"';

 

CREATE OR REPLACE SERVER mysql_reader

  FOREIGN DATA WRAPPER SYS_JDBC

  OPTIONS

        (

      DRIVER_CLASS 'com.mysql.jdbc.Driver',

      URL 'jdbc:mysql://localhost:3306/sample',

      USER_NAME 'sqlstream',

      PASSWORD 'sqlstream',

      txInterval '1000'

        );

 

CREATE OR REPLACE FOREIGN STREAM "ForeignTableDataStream"

(

   "id" BIGINT NOT NULL,

   "col1" VARCHAR(10),

   "col2" DOUBLE,

   "col3" INTEGER

 )

SERVER "mysql_reader"

        (

         TABLE_NAME 'sample_data',

         queryCol 'id',

         "OPTIONS_QUERY" 'SELECT SQLSTREAM_POSITION_KEY, USER_NAME, PASSWORD FROM my_config_table WHERE STREAM_NAME = ''ForeignTableDataStream''',

         SCHEMA_NAME 'public',

         DESCRIPTION 'Foreign stream to deliver rows appended to

                public.sample_data.'

        )

;

 

 

Monotonically increasing columns are indicated either using the queryCol table property or by MONOTONIC_<COLUMN_NAME>=true table property.  For example the following are equivalent. queryCol 'ID'

MONOTONIC_ID 'true'

queryCol and the Foreign Stream's High Water Mark

Since table entries from multiple insertion sources can arrive without being in strict order, a newly arriving row can have a queryCol value higher than a prior row. If earlier rows with lower queryCol values were already streamed, a new arriving row with a higher queryCol value is seen as "out of order" and discarded.

You can minimize this data loss by specifying a querytSort parameter. This parameter establishes an interval in which rows with out-of-order queryCol can arrive. The querytSort value you choose enables the streaming of rows whose queryCol values are lower than "high-water mark minus querytSort." As incoming rows establish a new high-water mark, earlier rows outside that new interval can then be streamed.

Querying the Foreign Stream as a Tail

In order to tail a remote database table using this option, you prepare a SELECT query with a lower and upper bound on a monotonically increasing column of the foreign stream.  For example, given a foreign stream called TEST_STREAM and a monotonically increasing column called ID, you might use a query along the following lines:

SELECT STREAM * FROM TEST_STREAM WHERE ID > <HIGHWATER MARK> AND ID <= (SELECT MAX(ID) FROM TEST_TABLE) - 2;

 

The id > <highwater mark> is the lower bound and id <= (select max(id) from TEST_STREAM) - 2 is the upper bound.  The lower bound excludes rows that we have already seen.  The upper bound excludes rows that may not have been committed yet.

The 2 in the upper bound is called the txInterval and allows flexibility (a "fudge factor") for the inserters to this table. txInterval defaults to 2, but should be set to the maximum total number of concurrent inserters into the foreign table (ie the maximum number of insert queries that could simultaneously be executing at once). If you know that there will only be one inserter for TEST_TABLE, you can safely set txInterval to 0 (it cannot be negative).

Another table property, pollingInterval, determines how long to wait between executing the prepared statement against the remote DB table. The MED/JDBC reader only waits if no rows were found in the last execution pass.

Multiple monotonically increasing columns where clause expressions in the select query are OR'ed together. For example: (ID > ? AND ID <= (select max(id) from TEST_TABLE)) OR (ID2 > ? AND ID2 <= (select max(id) from TEST_TABLE)).

You can use this process with both foreign streams and foreign tables in a SELECT STREAM query. In foreign stream use cases, a highwater mark will be stored.