JOIN clause

<< Click to Display Table of Contents >>

Navigation:  Streaming SQL Reference Guide > SELECT statement  >

JOIN clause

Previous pageReturn to chapter overviewNext page

s-Server supports the following JOIN syntax for the stream or table reference part of the SELECT statement. To see where the JOIN clause fits, see the Select chart in the topic SELECT in this guide.

Syntax Chart for the JOIN Clause

A join combines two relations according to some condition. The relation resulting from a join has the columns of the left and right inputs to the join.

select_06_join_ref_w_inner_join_w_query_stmt

Note: In the diagram above, stream_or_table_reference can itself be a joined table. This allows for multi-way joins. See the section Multi-way Joins below for more details.

Join types

There are five types of joins:

INNER JOIN (or just JOIN)

Returns all pairs of rows from left and right for which the join condition evaluates to TRUE.

LEFT OUTER JOIN (or just LEFT JOIN)

As INNER JOIN, but rows from the left are kept even if they do not match any rows on the right; NULL values are generated on the right.

RIGHT OUTER JOIN (or just RIGHT JOIN)

As INNER JOIN, but rows from the right are kept even if they do not match any rows on the left; NULL values are generated on the left for these rows.

FULL OUTER JOIN (or just FULL JOIN)

As INNER JOIN, but rows from both sides are kept even if they do not match any rows on the other side; NULL values are generated on the other side for these rows.

CROSS JOIN

Returns the cartesian product of the inputs: every row from the left is paired with every row from the right. A cross join never has an ON or USING condition.

See also the streaming joins, which include stream-to-stream joins and stream-to-table joins.

The NATURAL keyword  is actually a condition. It is described with ON and USING in the subtopic below called Join Conditions.

Examples of the five different joins appear below. All of these examples use the following data:

ename deptno

===== ======

Fred      10

Bill      20

Jayne     10

Martin    40

 

 

deptno dname

====== ===========

   10 Sales

   20 Marketing

   30 Engineering

 

Inner Join

SELECT *

FROM Emp AS e

 JOIN Dept AS d

 ON emp.deptno = dept.deptno

ORDER BY e.ename;

 

ename deptno deptno dname

===== ====== ====== ===========

Bill      20     20 Marketing

Fred      10     10 Sales

Jayne     10     10 Sales

 

Note the omissions: Martin, whose department 40 does not exist in Dept, and Department 30, which has no employees.

Left Outer Join

SELECT *

FROM Emp AS e

 LEFT JOIN Dept AS d

 ON emp.deptno = dept.deptno

ORDER BY e.ename;

ename deptno deptno dname

===== ====== ====== ===========

Bill      20     20 Marketing

Fred      10     10 Sales

Jayne     10     10 Sales

Martin    40   NULL NULL

 

Right Outer Join

SELECT *

FROM Emp AS e

 RIGHT JOIN Dept AS d

 ON emp.deptno = dept.deptno

ORDER BY e.ename;

 

ename deptno deptno dname

===== ====== ====== ===========

Bill      20     20 Marketing

Fred      10     10 Sales

Jayne     10     10 Sales

NULL    NULL     30 Engineering

 

Full Outer Join

SELECT *

FROM Emp AS e

 FULL JOIN Dept AS d

 ON emp.deptno = dept.deptno

ORDER BY e.ename;

 

ename deptno deptno dname

===== ====== ====== ===========

Bill      20     20 Marketing

Fred      10     10 Sales

Jayne     10     10 Sales

Martin    40   NULL NULL

NULL    NULL     30 Engineering

 

Cross Join

SELECT *

FROM Emp AS e

 CROSS JOIN Dept AS d

ORDER BY e.ename, d.dname;

ename deptno deptno dname

===== ====== ====== ===========

Bill      20     10 Sales

Bill      20     20 Marketing

Bill      20     30 Engineering

Fred      10     10 Sales

Fred      10     20 Marketing

Fred      10     30 Engineering

Jayne     10     10 Sales

Jayne     10     20 Marketing

Jayne     10     30 Engineering

Martin    40     10 Sales

Martin    40     12 Marketing

Martin    40     30 Engineering

 

Note that, as always for a CROSS JOIN, the size of the result is the product of the number of rows: four Emp rows multiplied by three Dept rows gives 12 result rows.

Join Conditions

All types of join except CROSS JOIN accept a join condition.

There are three ways to specify a join condition:

The ON condition evaluates a Boolean condition. It is the most general and powerful way to specify a join condition.
USING (column {, column }...) matches columns from left and right. For each named column, left and right must both have a column of that name. r1 JOIN r2 USING (c1, c2) is equivalent to r1 JOIN r2 ON r1.c1 = r1.c1 AND r1.c2 = r2.c2.
Inserting the NATURAL keyword before INNER, LEFT, RIGHT or FULL JOIN creates a condition that matches each pair of columns that have the same name on left and right side of the join.

A WHERE clause (WHERE condition) can achieve a similar effect to ON, except that it filters the rows after they have been emitted from the join. For an inner join, WHERE is equivalent to ON, but for an outer join, the partially NULL rows are only generated correctly if the condition is evaluated for each pair of candidate rows, and a WHERE clause cannot do that. For more details, see the topic WHERE clause in this guide.

The following inner join examples are equivalent:

Using ON:

  SELECT *

  FROM Emp AS e

    JOIN Dept AS d

    ON emp.deptno = dept.deptno

  ORDER BY e.ename;

 

Using USING

  SELECT *

  FROM Emp AS e

    JOIN Dept AS d

    USING (deptno)

  ORDER BY e.ename;

 

Using NATURAL

Here, deptno is the only column in common between Emp and Dept.

  SELECT *

  FROM Emp AS e

    NATURAL JOIN Dept AS d

  ORDER BY e.ename;

 

Using CROSS JOIN and WHERE

  SELECT *

  FROM Emp AS e

    CROSS JOIN Dept AS d

  WHERE emp.deptno = dept.deptno

  ORDER BY e.ename;

 

Using a FROM clause with tables separated by commas, and a WHERE condition

  SELECT *

  FROM Emp AS e,

    Dept AS d

  WHERE emp.deptno = dept.deptno

  ORDER BY e.ename;

 

Join limitations

SQLstream does not support left, right or full outer join operations applied to relations.

Streaming JOINs

JOIN can be used in streaming queries. Streaming joins work just like regular table joins, but need to work with the particular properties of streams by implementing rolling windows and rowtimes:

Rolling windows

A window defined on a stream is a rolling (sliding) window. As the current time progresses, the window excludes some rows while adding others. As a result, rows output by a join with a sliding window are generated incrementally. It is important to note that an output row is only produced once by a match on a given pair of columns from the left and right input streams. In other words, an output row already produced by a prior match will not be produced anew by a subsequent identical match.

Output Rowtimes

All output rows are produced in order of non-descending rowtime. (It is valid to have multiple output rows with the same rowtime.) As a rule, the rowtime of a given output row is the rowtime at the point it was possible to calculate the output row. In other words:

For an inner join, the rowtime of an output row is the later of the rowtimes of the two input rows. This is also true for an outer join in which matching input rows are found.

For outer joins in which a match is not found, the rowtime of an output row is the later of the following two times:

1.the rowtime of the input row for which a match was not found
2.the later bound of the window of the other input stream at the point any possible match passed out of the window.

All streaming joins are implicitly windowed joins between streams. If no you do not specify an explicit window, s-Server uses the window specification CURRENT ROW.

Stream-to-Stream Joins

With stream-to-stream joins, at least one stream must be restricted to a time window by the use of an OVER clause.

The OVER clause defines a window of rows that are to be considered for joining at a given time.

The window can be time-based or row-based:

A time-based window uses the RANGE keyword, and defines the window as the set of rows whose ROWTIME column falls within a particular time interval of the query's current time.
A row-based window uses the ROWS keyword, and defines the window as a given count of rows before or after the row with the current timestamp.

The following is an example of a stream-table join without an OVER clause.

SELECT STREAM ROWTIME, o.orderId, o.ROWTIME AS orderTime

FROM Shipments AS s

 JOIN Orders OVER (RANGE INTERVAL '1' HOUR PRECEDING) AS o

 ON o.orderId = s.orderId;

 

We have not specified an OVER clause for the Shipments stream, so at any moment in time we are considering matching the current row of the Shipments stream against rows from the Orders stream in the hour preceding.

Orders and Shipments inputs produce the following output:

Orders

 

Shipments

 

rowtime

orderId

rowtime

orderId

10:00

100

 

 

10:10

101

 

 

10:20

102

 

 

10:25

103

 

 

 

 

10:30

101

10:40

104

 

 

 

 

10:45

100

 

 

10:55

103

 

 

11:05

103

 

 

11:30

104

 

The output rows are as follows:

rowtime

orderId

orderTime

10:30

101

10:10

10:45

100

10:00

10:55

103

10:25

11:05

103

10:25

11:30

104

10:40

 

Note that output rows have the timestamp of the Shipments.ROWTIME column, and are sorted in that order. Order 100, 101 and 104 are each matched by a shipment within the window, and order 103 is matched by two shipments. But order 102 is omitted, because its shipment is not made within its one hour time window 10:20-11:20.

The window specification may also be the name of a window defined in the WINDOW clause. However, windows specified by name have the same limitation as windows specified inline: see the subtopic entitled Allowed and Disallowed Window Specifications of the WINDOW clause topic.

Additional window limitations specific to streaming joins are as follows:

PARTITION BY must not be present.
ORDER BY, if present, must sort by the ROWTIME column of one of the inputs.

Rowtime generation

The timestamp of the generated row is the earliest time that rows necessary to make the match are in both sides' windows. For example, in the previous example, the output row (10:45, 100, 10:00) could only be made when the left window is 10:00-10:00 and the right window is 10:00-11:00. A minute earlier, the windows would have been 09:59-09:59 and 09:59-10:59, in which case the necessary row would have been in the right hand window but not the left.

Rowtime bounds

Rowtime bounds received on the left and right side of the join help the join to make progress. In the above example, the (10:30, 101, 10:10) output record can be emitted only when the order 100's window has expired. That expiration was only evident when the (11:05, 103) row arrived in the Shipments stream, taking the time past 11:00. If the process writing the Shipments stream had sent an 11:01 rowtime bound, the output record could have been emitted 4 minutes sooner.

Limitations to Stream-Stream Join

The join must be an equi-join (a join using an equality operator), not a general theta join.

Stream-Table Joins

You can use the JOIN clause to enrich the content of a stream by treating one of the stream's columns as a foreign key to a table, and finding table data that matches each row from the stream. In effect, this means computing the relational join of the stream with the table.

There are currently two ways to perform a stream-to-table join:

Using a stream-table join in SQL, described here.
Using the TableLookup UDX, as described in the topic Table Lookup UDX in the Integration Guide. We recommend only using this option if the stream-table join described here is insufficient. Most notably, the Table Lookup UDX lets you control how table data is fetched and cached. You can also subclass the UDX for special semantics. While performance is good, you need to call this as a UDX. Also, the s-Server query engine is unable to push projections and filters into the UDX.

Stream-Table Join

With stream-table joins, s-Server allows the ON condition and the USING condition as described above, and allows both INNER JOIN and OUTER JOIN. Either input can be a native table, a foreign table, a native stream, a foreign stream, a view, or a subquery.

The query engine detects when an input is like a stream (such as a stream or streaming view) and when it is like a table (such as a native table or table view). The join of two table-like inputs is a normal join. This works as a hash-join or an index-scan. The join of two stream-like inputs is described above.

When one input is stream-like and the other is table-like, the engine implements the query as a static stream-table join.

With a stream-table join, s-Server computes the stream by capturing the table or table view at the start of the query, and loading its data into a lookup-structure. Streaming input is read row by row; for each row, s-Server identifies matches in the table data, using the condition you have defined for the join,It emits matching rows based on this condition. For an outer join, if there are no matches, s-Server emits nulls as the match.

Limitations to Stream-Table Join

The table or table view input must fit into memory. However, since the query planner can push filters and projections through the join operator, this input can be much smaller than the whole table.

You cannot refresh the loading table data, except by executing the query again.

An outer join must be one-sided. It cannot detect table rows that never matched a stream row.

The join must be an equi-join (a join using an equality operator), not a general theta join.

Note: Joins using a tailing table stream have been defined but not implemented. Tailing foreign tables as a stream is defined in the topic Tailing Foreign Tables with SQL/MED in the Integration Guide. A future feature will let you join a stream to one of these streams.

Examples of Static Stream/Table Join

The following example uses a stream and a table that match on a key value called animal_id.:

The stream is a simple input stream of observed beast events:

create stream beast_events (

   row_id integer not null,

   sensor integer,

   zone integer,

   animal_id int,

   weight integer

);

 

The table lists animal attributes:

create table animal_data (

   animal_id int not null primary key,

   boolean edible;

   boolean poisonous;

   name varchar(32) not null.

   species varchar(32) not null,

   genus varchar(32) not null

);      

 

Here are examples of SQL to join this stream to this table, both valid and invalid.

Simple join:

select stream * from beast_events e join animal_data a using (animal_id);

 

Valid left outer join:

select stream * from beast_events e left join animal_data a using (animal_id);

 

Invalid theta join:

select stream * from beast_events e left join animal_data a on (e.animal_id > a.animal_id);

 

Valid equi-join with extra filters:

select stream * from beast_events e join animal_data a

   on (e.animal_id = a.animal_id and a.edible and not a.poisonous);

 

Multi-way JOINs

To do a three way join, you use a joined table-reference as the table-reference in a JOIN statement. Here, stream/table 1 (b1) relates to stream/table 2 (asks) and stream/table 2 relates to stream/table 3 (b2), on the column "ticker."

select stream * from bids over (range interval '1' hour preceding) as b1

      join asks over (range interval '2' second preceding)

          on b1."ticker" = asks."ticker"

      join bids over (range interval '3' minute preceding) as b2

          on b2."ticker" = asks."ticker";