Using Telemetry to Monitor Performance

<< Click to Display Table of Contents >>

Navigation:  Administering SQLstream Blaze >

Using Telemetry to Monitor Performance

Previous pageReturn to chapter overviewNext page

Telemetry provides information on the structure, status, and progress of data flowing in s-Server, as a table (snapshot view) or a stream (continually updated data):

sys_boot.mgmt.getStreamGraphInfo. Contains a row to describe each stream-graph. Stream-graphs represent one SQL statement running in s-Server, usually either a SELECT or INSERT. The stream version is getStreamGraphInfoForever
sys_boot.mgmt.getStreamOperatorInfo. Contains a row to describe each stream-node. Stream-nodes represent one step in the running implementation of a SQL statement. The stream version is getStreamOperatorInfoForever

Note: These functions are part of s-Server's general management functions. See the topic s-Server Management Functions in the Streaming SQL Reference Guide for more details.

These allow you to see the following:

the overall structure of your data pipelines, in terms of both the SQL operating on these and the nodes processing your SQL.
how your data is flowing and where it is actually transformed, in order to troubleshoot your SQL.
The progress and throughput of data moved through pipelines, so that, for example, you can identify performance-slowing bottlenecks.

Stream-graphs, Nodes, and s-Server

s-Server consist of two layers:

1.A query engine that maintains a collection of SQL definitions and translates a SQL statement into a specified data-flow.
2.A data engine that executes and coordinates these data-flows.

The specification of a data-flow is called a physical query plan. The actual data flow is called a stream graph, which is a chain of stream nodes, each of which is a small software machine that applies some relational operation to its input data stream(s) to produce is output data stream(s). Nexus nodes are defined by a CREATE STREAM statement. They are freestanding nodes to which other stream graphs can connect and disconnect, in order to receive or deliver data.

nodes_stream_graph

In the most common case, a node has 1 input and 1 output, and so the graph is a linear chain of operations. But data flows can merge and split, and the graph of a new query can connect to the graphs of existing queries as they run.

Note: A node does not read data directly from its upstream neighbor. Instead, the upstream neighbor puts its output rows into a block of memory called a buffer, which it shares with the downstream neighbor. The upstream node writes rows to the buffer, and the downstream node reads rows from the buffer.

Because there are more stream nodes than CPU cores, all the nodes cannot actually run simultaneously. Instead, a software component called the scheduler manages the stream nodes. The scheduler runs nodes as needed in order to push data down the pipeline.

The scheduler decides when to execute a node based on its status, which consists of 1) its state (ready/sleeping/blocked/finished), 2) the availability of data for input or room for output, and 3) the rowtime of the next input row. The scheduler also keeps track of statistics for each execution of each node, measuring the time spent, the amount of data input and output, and so on.

Using Telemetry

To use telemetry, you run a SQL statement against the virtual tables or streams described below. To run SQL, you can use sqlLine, a command-line process that executes SQL code against SQLstream s-Server. You can access sqlLine through the desktop folder that installs with s-Server, or in $SQLSTREAM_HOME.

Note: $SQLSTREAM_HOME refers to the installation directory for s-Server, such as /opt/sqlstream/5.0.XXX/s-Server.

This SQL fetches information about all the steam graphs that belong to session 2. The 2nd argument 0 means take a new snapshot.

SELECT * FROM TABLE(SYS_BOOT.MGMT.getStreamGraphInfo(2, 0);

 

This SQL fetches information for all the stream operators (nodes) of all the stream graphs that belong to session 2. Again, we request a new snapshot by giving a staleness parameter of 0 seconds. We demand a snapshot no older than 0 seconds, that is, a brand new snapshot.

SELECT * FROM STREAM(SYS_BOOT.MGMT.getStreamOperatorInfoForever(2, 10);

 

This SQL fetches information about all the steam graphs that belong to session 2, requesting new snapshots at ten second intervals:

SELECT * FROM STREAM(SYS_BOOT.MGMT.getStreamGraphInfo(2, 10);

 

This SQL fetches information for all the stream operators (nodes) of all the stream graphs that belong to session 2, requesting new snapshots at ten second intervals:

SELECT * FROM STREAM(SYS_BOOT.MGMT.getStreamOperatorInfoForever(2, 10);

 

You can also query individual columns, as described later on this page. A full list of columns for each virtual table appears at the end of this page.

Argument

Definition

SESSION

Sessions refer to a single connection to s-Server. You can designate a session ID, or 0 to mean all sessions.

Note: To determine sessions:

1.Open the desktop folder that installs with s-Server.
2.Double-click the sqlLine icon.
3.In the window that opens, enter !run $SQLSTREAM_HOME/support/sql/showSessions.sql, where $SQLSTREAM_HOME is the location of your s-Server installation.

RECENT

In seconds, how recent a snapshot is required. 0 will force a new snapshot, 10 means use a extant snapshot no more than 10 secs old. You can use RECENT to get a pair of results that match the same snapshot.

Note: It's a good idea to add 0.5 sec to RECENT for the second query. This allows room for the 2nd query to be processed slightly later than the 1st.

Results

Queries return columns as described below.

Stream Graph Virtual Table

A query on this virtual table returns rows for each stream-graph. These generally correspond to one SQL statement (DML or query). Stream-graph data offers you an overview of how a stream-graph is processing data, including information on how much data it has processed since inception (in bytes and rows) how quickly it is processing data, and how many nodes it is using to process data.

Columns for these rows fall into four groups: identification, description, status, and statistics. Identification and description are stable identifiers and characteristics for the stream, including numerical identifiers for the stream graph, its session number, and so on. Status and statistics offer information on what the stream graph is doing now, how long it has been doing it, how much data it has processed, how quickly it is doing so, and so on. The former are useful for determining information about a stream graph, and the latter are useful for understanding how the stream graph is performing.

Identification columns consist of graph_id and statement_id. These are both unique numerical identifiers.

Description columns include session_id, source_sql, is_global_nexus, is_auto_close, num_nodes, and num_data_buffers. These tell you what the stream looks like before any processing occurs.

Status columns include sched_state, num_live_nodes, and num_data_buffers. These columns tell you how the scheduler is handling the stream, as well as how many active nodes the stream is engaging.

Statistics columns include net_execution_time, net_schedule_time, net_input_bytes, net_input_rows, net_input_rate, net_input_row_rate, net_output_bytes, net_output_rows, net_output_rate, and net_output_row_rate.

A full list of columns and descriptions appears at the end of this page.

You can query individual columns as either tables or streams. For example, the following statement produces a streaming (regularly updated) result for the columns sched_state, num_live_nodes, net_input_row_rate, and net_output_row_rate.

SELECT sched_state,num_live_nodes,net_input_row_rate,net_output_row_rate FROM STREAM(SYS_BOOT.MGMT.GETSTREAMGRAPHINFO(2, 10);

 

Note: Columns do not require quotation marks.

 

Stream Node Virtual Table

A query on this virtual table returns rows for each stream node (operator). Stream node data offers you an overview of how a particular node in a stream graph is processing data, including information on how much data it has processed since inception (in bytes and rows), how quickly it is processing data, and how much data it has output since inception.

As with the stream graph virtual table, the columns fall mainly into four groups: identification, description, status, and statistics. Identification and description are stable identifiers and characteristics for the node, including numerical identifiers for the node and stream_graph, and so on. Status and statistics offer information on what the stream node is doing now, how long it has been doing it, how much data it has processed, how quickly it is doing so, and so on. The former are useful for determining information about a node, and the latter are useful for understanding how the stream node is performing.

Identification columns

node_id is a varchar, though logically the value is a pair of integers. It consists of a dotted pair of integers, such as 1.0 or 2.3. The first number represents the stream graph, and the second the stream graph node. 1.0 means node #0 in stream graph #1,  and 4.3 means node #3 in stream graph #4.

graph_id is the same as the first (graph) part of node_id.  It is provided as a column to serve as a foreign key.

Descriptive columns

name_in_query_plan is for advanced debugging. It is a generated name that matches the logical and physical query plans as reported in the s-Server trace log. The overall structure of the data flow graph is indicated by listing the node_id of the input neighbor and the output neighbor.

Note: The s-Serer query makes a unique identifier for each stream node. An example is FennelValuesRel.#23:489.

Status fields

These fields are helpful in answering questions like "why is my pipeline stuck?" or "why won't the scheduler do what I want it to do next?" Important columns include the following:

sched_state. Whether node is blocked (needs more input / needs room to output), suspended, finished, or ready to run. The values of sched_state are:
oR*  running
oR  ready to run
oN   not runnable, because a neighbor is running
oB  blocked for data (underflow or overflow)
oT  suspended externally,
oE  finished (has reached end of input stream)
oC  stream graph is closed  (but could be executed again)
oZ  stream graph was removed (SQL statement is closed).
last_exec_result. Indicates why the node is blocked. UND means underflow, and OVR means overflow.
num_busy_neighbors. Data nodes cannot write to an output node that is currently reading its input and vice versa. (This is because data engines uses single-buffers.) When num_busy_neighbors > 0, the data node cannot run.
input_rowtime_clock. The rowtime field of the next available input row. This could be seen in an input buffer or as an input rowtime bound.
output_rowtime_clock. The rowtime field of the latest row output by this node. The clock rowtimes show how far the data has progressed down the pipeline.
oThis show latency, overall and between nodes. This is important for diagnosing when a pipeline is stuck because a node is waiting for an earlier row from an empty input buffer.
oInput and output are measured in bytes and bytes/sec. Input and output are also measured in rows and rows/sec when possible. Most stream operators process row by row, and so can count rows. But a few process a whole buffer at a time, and since row-length can vary, we don't pay the cost of converting, and sometimes report null for the stats in terms of rows.

A full list of columns and descriptions appears at the end of this page.

You can query individual columns as either tables or streams. For example, the following statement produces a streaming (regularly updated) result for the columns sched_state, num_busy_neighbors, net_input_bytes, net_input_rate, and net_output_rate.

SELECT sched_state,num_busy_neighbors,net_input_bytes,net_input_rate,net_output_rate FROM STREAM(SYS_BOOT.MGMT.GETSTREAMOPERATORINFO(2, 10);

 

Column Descriptions

The following tables list all columns, types, and definitions for the telemetry virtual tables.

Stream Graph Virtual Table Column Types and Definitions

Column Name

Type

Definition

measured_at

timestamp not null

Time of query snapshot.

graph_id

int not null

Numerical identifier for stream graph. Numbered from 1 as graphs are added to scheduler. The same column appears in the Stream Node Virtual Table.

statement_id

int

fk in Statements table. 0 if not from a SQL statement

session_id

int

fk in Sessions table (a Session = a JDBC Connection)

source_sql

varchar(2048)

SQL for statement.

sched_state

char(1)

State of graph in scheduler. Either B (blocked).O(pen), C(losed), R(unnable), E(nd of Stream)

close_mode

char(4)

When the stream graph was closed, and how it was closed.

is_global_nexus

boolean

Implements a named stream. These are defined by a CREATE STREAM statement. They are freestanding nodes to which other stream graphs can connect and disconnect, in order to receive or deliver data.

is_auto_close

boolean

Whether or not the stream will close automatically at state E

num_nodes

int not null

Number of nodes in the stream graph. These may be shared with other stream graphs, as in the case of named streams.

num_live_nodes

int not null

Number of nodes with that are currently running.

num_data_buffer

int not null

Number of data buffers between nodes.

Note: Two adjacent nodes share a data buffer; the upstream node writes rows to the buffer, and the downstream node reads rows from the buffer.

total_execution_time

double

Time spent executing this stream graph in seconds.

total_opening_time

double

Time spent setting up this stream graph in seconds.

Note: When a user sends a SQL query to the server, the query engine sends a query plan to the data engine layer. The data engine then "sets up" the query, by performing a number of steps, including allocating and initializing software objects that do the data operations, allocating data buffers to stand between the operators, adding to internal control & bookkeeping data structures, and connecting the graph to one or more global nexuses.

total_closing_time

double

Time spent tearing down this graph in seconds.

Note: When the user ends the query, the data engine stops executing the stream graph and does the set-up in reverse.

net_input_bytes

bigint

In bytes, the overall input into the whole graph.

net_input_rows

bigint

In rows, the overall input into the whole graph.

net_input_rate

double

Data input flow to the stream graph in bytes read since opened.

net_input_row_rate

double

Data input flow to the stream graph in rows read since opened.

net_output_bytes

bigint

In bytes, the overall output from the whole graph.

net_output_rows

bigint

In rows, the overall output from the whole graph.

net_output_rate

double

Data output flow from the stream graph in bytes written since opened.

net_output_row_rate

double

Data output flow from the stream graph in rows written since opened.

when_opened

timestamp

System time when stream graph was opened.

when_started

timestamp

System time of first execution of stream graph.

when_finished

timestamp

System time of latest execution of stream graph.

when_closed

timestamp

System time when stream graph was closed.

Stream Node Virtual Table Column Types and Definitions

Column name

Type

Definition

measured_at

timestamp not null

Time of query snapshot.

node_id

varchar(8) not null

Node identifier, such as 4.6 for the 6th node in 4th stream graph

graph_id

int not null

graph identifier (fk to stream_graphs table)

num_inputs

int not null

Count of input neighbors.

input_nodes

varchar(64) not null

node_id numerical identifier of input neighbors.

num_outputs

int not null

Count of output neighbors.

output_nodes

varchar(64) not null

node_id numerical identifier of output neighbors.

sched_state

char(2)

State of node in scheduler. Either C(losed), E(ended), R* (running), R(unnable), or B (blocked).

last_exec_result

char(3)

Latest code returned by ExecStream::execute() eg UND(erflow), OVR(overflow)

num_busy_neighbors

int

When positive, indicates that this node is blocked because some neighboring nodes are busy.

input_rowtime_clock

timestamp

Rowtime of the latest input row read by this node.

output_rowtime_clock

timestamp

Rowtime of the latest output row written by this node.

execution_count

bigint

Number of times this node has been executed by the scheduler.

started_at

timestamp

System time of first execution.

latest_at

timestamp

System time of latest execution.

net_execution_time

double

Total time node has executed in seconds.

net_schedule_time

double

Total overhead scheduling for operator in seconds.

net_input_bytes

bigint

Data input flow to the node in bytes read since opened.

net_input_rows

bigint

Data input flow to the node in rows read since opened.

net_input_rate

double

Data input flow to the node in bytes/sec since opened.

net_input_row_rate

double

Data input flow to the node in rows/sec since opened.

net_output_bytes

bigint

Data output flow from the node in bytes written since opened.

net_output_rows

bigint

Data output flow from the node in rows written since opened.

net_output_rate

double

Data output flow from the node in bytes/sec since opened.

net_output_row_rate

double

Data output flow from the node in rows/sec since opened.