Managing Late Rows

Streaming operators depend on rows arriving in order, as tracked by a special column called ROWTIME. Each such operator tracks the latest rowtime it has seen, known as the "highwater mark" or current streamtime. Rows that arrive with a rowtime earlier than this highwater mark are called "late" because their timestamps are out of order. Late rows are discarded: they are not processed from an input stream, nor are they put into an output stream. This means, for example, that if you are archiving rows into an RDBMS, late rows will not be written.
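
The highwater-mark rule described above can be illustrated outside the product with a minimal Python sketch. This is not s-Server code: the function name drop_late_rows and the use of plain integers as rowtimes are assumptions made for illustration only.

```python
def drop_late_rows(rows):
    """Split (rowtime, payload) pairs into accepted rows and late rows.

    Hypothetical model of the rule described above: a row is late when its
    rowtime is strictly earlier than the highest rowtime seen so far.
    """
    highwater = None          # latest rowtime seen so far
    accepted, late = [], []
    for rowtime, payload in rows:
        if highwater is not None and rowtime < highwater:
            late.append((rowtime, payload))      # out of order: discarded
        else:
            highwater = rowtime                  # advance the highwater mark
            accepted.append((rowtime, payload))
    return accepted, late


# Rowtimes are integers (for example, seconds) purely for readability.
rows = [(0, "a"), (5, "b"), (3, "c"), (5, "d"), (7, "e")]
accepted, late = drop_late_rows(rows)
print(accepted)
print(late)
```

In this model a row whose rowtime equals the highwater mark is not late; only rows with a strictly earlier rowtime are set aside.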


You can avoid the problem of late rows by T-sorting your data using the ORDER BY clause of the SELECT statement. This clause implements a timesort XO that uses a sliding time-based window of incoming rows to reorder those rows by ROWTIME. See the topic T-sorting Stream Input in the Streaming SQL Reference Guide.
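
To show the idea behind a time-based sort window, here is a minimal Python sketch. It is a conceptual model only, not the timesort XO itself: the function name tsort, the integer rowtimes, and the release rule (a row is released once a row at least `window` time units newer has arrived) are all assumptions.

```python
import heapq


def tsort(rows, window):
    """Reorder (rowtime, payload) pairs that arrive slightly out of order.

    Rows are buffered in a min-heap and released in rowtime order once the
    newest rowtime seen is at least `window` ahead of them. A row older than
    the last released row can no longer be reordered and is returned as late.
    """
    heap, ordered, late = [], [], []
    newest = None      # largest rowtime seen so far
    released = None    # rowtime of the last row released downstream
    for seq, (rowtime, payload) in enumerate(rows):
        if released is not None and rowtime < released:
            late.append((rowtime, payload))
            continue
        # seq breaks ties so payloads are never compared by the heap
        heapq.heappush(heap, (rowtime, seq, payload))
        newest = rowtime if newest is None else max(newest, rowtime)
        while heap and heap[0][0] <= newest - window:
            released, _, p = heapq.heappop(heap)
            ordered.append((released, p))
    while heap:        # end of input: flush whatever is still buffered
        t, _, p = heapq.heappop(heap)
        ordered.append((t, p))
    return ordered, late


rows = [(0, "a"), (5, "b"), (3, "c"), (12, "d"), (1, "e"), (20, "f")]
ordered, late = tsort(rows, window=5)
print(ordered)
print(late)
```

The sketch also shows the trade-off of any such window: a wider window tolerates more disorder but delays every row by up to the window length, and rows that arrive later than the window allows are still late.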

Late rows are logged to both the trace log and the error stream. Because logging every individual late row could quickly fill the trace log, late rows are logged in powers of ten: the first late row, the tenth, the hundredth, and so on. You will not see any late rows logged between, for example, late row 10,000 and late row 100,000.
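
The effect of this throttling can be sketched in a few lines of Python. The function name should_log is hypothetical, and the sketch assumes strict powers of ten (1, 10, 100, 1,000, and so on); the server's actual logging code may differ.

```python
def should_log(late_count):
    """Return True when late_count is a power of ten (1, 10, 100, ...)."""
    if late_count < 1:
        return False
    while late_count % 10 == 0:   # strip trailing zeros
        late_count //= 10
    return late_count == 1


# Which of the first 100,000 late rows would produce a log entry?
logged = [n for n in range(1, 100_001) if should_log(n)]
print(logged)
```

Under this assumption, 100,000 late rows produce only six log entries, which is why raising the tracer level is needed to see every late row.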

If you need more granular information on late rows, you can change the tracer level for com.sqlstream.aspen.native.xo.laterow to FINEST. This can fill your trace log quickly, but it gives you information on every late row.

You can change the default tracing level by editing the file, usually located at /var/log/sqlstream/. To view more logging, open one of the properties files in a text editor and uncomment (remove the "#") the following line: