Monitoring s-Server and Components Overview


You can use a number of techniques to monitor s-Server and its components. The topic s-Server Features for Managing and Monitoring describes system views and system procedures related to monitoring and management. This topic provides an overview of monitoring s-Server and its components.

You can use Apache Kafka to automate the distribution of streaming workloads. See Automatic Distribution of Streaming Workloads with Apache Kafka in the topic Automatic Distribution of Streaming Workloads Across Federated s-Server Instances in the Integration Guide.

You may also find it useful to incorporate generic system monitoring tools, including:

SNMP, for verifying system presence and monitoring processing load, disk, and memory usage.
JMX, for remote monitoring of suitably enabled Java VMs, such as Kafka brokers.

For information on these tools, please see your operating system's documentation.

To ensure that your system remains operational, you should monitor all servers in some way (see Monitoring the Monitors below for information on managing monitoring systems themselves).

Monitoring SQLstream Products

Services/Processes

Process Memory

Log Files

Monitoring webAgent

s-Server Self Monitoring

Open Ports

Services / Processes

SQLstream service PIDs are recorded in pid files under /var/run/sqlstream.

For each service to be monitored, verify that:

1. The pid file exists (if not, the service has been cleanly shut down)
2. The process corresponding to the pid is running (if not, the service has died unexpectedly)

s-Server itself has two pid files: s-serverd.pid for the core and jvm.pid for the Java VM.

Here is a simple bash fragment to check process status and report UP, BROKEN, or DOWN for each service:

  for s in $SERVICES
  do
      pid=0
      message=

      pidfile=/var/run/sqlstream/$s.pid

      if [ "$s" = "snmpd" ]
      then
          # this is not owned by sqlstream, so use simpler method
          sudo service snmpd status > /dev/null
          if [ $? -eq 0 ]
          then
              status=UP
              message="service running normally"
          else
              status=DOWN
              message="service not running"
          fi
      elif [ -e $pidfile ]
      then
          pid=`cat $pidfile`
          pgrep --pidfile $pidfile > /dev/null
          if [ $? -ne 0 ]
          then
              # there is no matching process
              status=BROKEN
              message="process not running"

              # should we restart the service?
              # restart $s
          else
              # all good - process running
              status=UP
              message="service running normally"
          fi

          if [ -z "$pid" ]
          then
              pid=0
          fi
      else
          status=DOWN
          message="service not running"
      fi

      # send the service status in all cases, as we are using it to drive alerts as well as status monitor
      echo "$server,$s,$status,$pid,$message" >> $serviceStatusFile
      statustext="$statustext - $s:$status"
  done
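
The fragment assumes that SERVICES, server, serviceStatusFile, and statustext have been set by the caller. A usage sketch might initialise them as follows; the service names and the status file path are illustrative only, not defaults:

  # Illustrative setup for the fragment above; adjust names and paths to your installation
  server=`hostname`
  SERVICES="s-serverd jvm webagentd snmpd"            # services with pid files under /var/run/sqlstream (plus snmpd)
  serviceStatusFile=/tmp/sqlstream/service_status.csv
  statustext=""
  mkdir -p /tmp/sqlstream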

 

Process Memory

We recommend using your operating system's utilities and/or jstat to monitor heap size for the various SQLstream processes.

Here is a simple bash function to check heap size for a given JVM. Parameter 1 is the name of the service (such as jvm or webagentd). The PID and heap stats are returned in jspid and jstats (the latter contains two values, current and max heap size).

function getJavaStats() {
    if [ -f /var/run/sqlstream/$1.pid ]
    then
        # process is apparently running
        # TODO check pid is not broken
        jspid=`cat /var/run/sqlstream/$1.pid`
        # jstat -gc reports sizes in KB; sum the utilization columns (current) and the capacity columns (max)
        jstats=`jstat -gc $jspid | tail -1 | awk '{ print ($3 + $4 + $6 + $8 + $10), ",", ($1 + $2 + $5 + $7 + $9) }' | tr -d ' '`
    else
        jspid=0
        jstats="0,0"
    fi
}

 

The heap size required will vary substantially depending on the nature of the processing pipelines installed. The important thing is to establish a normal pattern, then identify exceptional conditions when they occur.
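
For example, a minimal sketch of turning those statistics into an alert might look like the following. The 80% threshold, the checkHeap function name, and the reuse of serviceStatusFile from the earlier fragment are all illustrative assumptions, not fixed values:

  # Illustrative only: flag a JVM whose current heap use exceeds 80% of capacity
  function checkHeap() {
      getJavaStats $1
      # jstats is "<used>,<capacity>" in KB; use awk because jstat prints decimal values
      alert=`echo $jstats | awk -F, '{ if ($2 > 0 && $1 / $2 > 0.8) print "HIGH" }'`
      if [ -n "$alert" ]
      then
          echo "$server,$1,HEAP,$jspid,heap usage above 80% of capacity" >> $serviceStatusFile
      fi
  }

  checkHeap jvm
  checkHeap webagentd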

Log files

The default locations for trace/log files are:

Service      Trace/Log Directory                                      Log Filename
s-serverd    /var/log/sqlstream (a link to $SQLSTREAM_HOME/trace)     Trace.log.<n>
webagentd    $SQLSTREAM_HOME/../../clienttools/webagent/trace         Trace.log.<n>

Note: $SQLSTREAM_HOME refers to the installation directory for s-Server, such as /opt/sqlstream/5.0.XXX/s-Server.
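
As an illustration, a log check might simply count error entries in the newest s-Server trace file. The grep pattern and the "newest file" heuristic below are assumptions to adapt to your own trace configuration:

  # Illustrative only: count error lines in the most recent s-Server trace file
  latest=`ls -t /var/log/sqlstream/Trace.log.* 2>/dev/null | head -1`
  if [ -n "$latest" ]
  then
      errors=`grep -c -E "ERROR|SEVERE" "$latest"`
      if [ "$errors" -gt 0 ]
      then
          echo "$server,s-serverd,TRACE,0,$errors error lines in $latest" >> $serviceStatusFile
      fi
  fi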

Monitoring webAgent

If you are using webAgent, you can monitor it using its /status API, which is available on port 5580 on whatever host is running webAgent.

If you launch webAgent manually (that is, if you are not running it as a service), you need to enable the -a option when you start webAgent in order to make the API available.

If there is no response at all, the agent may be down, or it may be up but currently unable to connect to s-Server.

Here is a very simple bash function to check the webagent status (returned as waStatus):

function checkWebAgentStatus () {
    WAPORT=5580

    rm -f /tmp/sqlstream/wastatus.json /tmp/sqlstream/wastatus.log 2>/dev/null

    if [ -f /var/run/sqlstream/webagentd.pid ]
    then
        # apparently running
        # TODO - check service is not broken
        #wget localhost:$WAPORT/status -O /tmp/sqlstream/wastatus.json -o /tmp/sqlstream/wastatus.log >/dev/null 2>&1
        # seems curl is more likely to be available
        curl --silent -o /tmp/sqlstream/wastatus.json localhost:$WAPORT/status >/dev/null 2>&1
        if [ $? -ne 0 ]
        then
            # there was some problem with the fetch
            waStatus="UNREACHABLE"
        else
            # we got a status back
            grep "\"message\":\"OK\"" /tmp/sqlstream/wastatus.json >/dev/null 2>&1
            if [ $? -eq 0 ]
            then
                # status was apparently OK
                waStatus="UP"
            else
                waStatus="BROKEN"
                # save a timestamped copy of the file ($tstamp is set by the caller)
                mv /tmp/sqlstream/wastatus.json /tmp/status/wastatus.$tstamp
            fi
        fi
    else
        waStatus="DOWN"
    fi
}
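
As a usage sketch, the function might be called from the same loop as the service checks, with the result appended to the status file (tstamp, server, serviceStatusFile, and statustext follow the earlier fragment):

  # Illustrative only: record the webAgent status alongside the other service checks
  tstamp=`date +%Y%m%d%H%M%S`
  checkWebAgentStatus
  echo "$server,webagentd,$waStatus,0,webagent /status check" >> $serviceStatusFile
  statustext="$statustext - webagentd:$waStatus"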

 

Self Monitoring of SQLstream s-Server

SQLstream s-Server 5.1 includes extensive self-monitoring capabilities that can be integrated into a monitoring framework. See the Managing and Monitoring SQLstream s-Server and Using Telemetry to Monitor Performance sections of the Administration Guide for more information.

Open Ports

See Configuring Ports for SQLstream Blaze for a list of (default) ports used by SQLstream Blaze.

Note: Some of these port allocations can be modified either during the installation process, or when starting services / processes. For more information, see the documentation or contact SQLstream Support.
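
As an illustration, a simple check that the expected ports have listeners might look like the following; the port list is a placeholder (5580 is the webAgent API port mentioned above) and should be taken from Configuring Ports for SQLstream Blaze for your installation:

  # Illustrative only: verify that each expected port has a TCP listener
  PORTS="5580"                    # replace with the ports your installation actually uses
  for p in $PORTS
  do
      if ss -lnt | grep -q ":$p "
      then
          echo "$server,port-$p,UP,0,listener present on port $p" >> $serviceStatusFile
      else
          echo "$server,port-$p,DOWN,0,no listener on port $p" >> $serviceStatusFile
      fi
  done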

Application-Specific Monitoring

Checking that data is being processed

A very simple way of telling that data is being processed is to arrange for some sort of summary data to be delivered to a file every minute (or at some other suitable frequency).

As well as giving application-level statistics (such as counting input rates) this provides immediate assurance that the end-to-end process is running.

1. Use the ECDA adapter for writing to a file (see Writing to the File System). For FILENAME_DATE_FORMAT use a partial format, for example 'HH'. This ensures that the file is rotated each hour and that a maximum of 24 hours' worth of data is retained.
2. Periodically monitor the directory for the latest file, and raise an exception if it is more than 2 aggregation periods old (see the sketch below).
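
A minimal sketch of the second step, assuming a 60-second aggregation period and a summary directory of /tmp/sqlstream/summary (both placeholders):

  # Illustrative only: alert if the newest summary file is older than 2 aggregation periods
  SUMMARY_DIR=/tmp/sqlstream/summary      # wherever the ECDA file writer puts the summaries
  PERIOD=60                               # aggregation period in seconds
  MAX_AGE_MIN=$(( 2 * PERIOD / 60 ))
  latest=`ls -t $SUMMARY_DIR 2>/dev/null | head -1`
  if [ -z "$latest" ] || [ `find $SUMMARY_DIR/$latest -mmin +$MAX_AGE_MIN | wc -l` -gt 0 ]
  then
      echo "$server,pipeline,STALLED,0,no summary output for more than 2 aggregation periods" >> $serviceStatusFile
  fi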

Of course in a complex system there may be many inputs and outputs; each of these could be monitored by tapping off the processing pipeline in this way.

Using SQLstream as a Monitoring Solution

It is of course possible to build a monitoring system using SQLstream itself. Data flows can be aggregated and analysed; changes to data flow patterns can be detected and alerted on. Long term storage of the data into a database is simple.

Monitoring Options

File-based monitoring on the individual SQLstream servers may be inconvenient unless a standardised monitoring agent is already present on the application nodes (for example, Logstash as a companion to Kibana). As an alternative, consider pushing (or using agents to pull) the periodic messages directly to a central monitoring service.

Wherever possible the system being monitored should be unaware of the monitoring framework. The monitoring framework should be able to scale easily from monitoring a simple, single application node to monitoring one or more clusters.

Push vs Pull

One of the first decisions to be taken is whether the application should push monitoring data to the monitor, or the monitor should poll data from the application.

Push Mechanisms Available

Network sockets: the lowest common denominator and very simple to implement, but there is no protection against loss of data, and the flow is point-to-point (so there is no failover).
Kafka: provides a “lingua franca” transport mechanism that includes store and forward, replay, and replication. The SQLstream application service does not need to know where the monitor is running; it only needs a list of (at least some of) the Kafka brokers.
Other MQs: as for Kafka.

A big advantage of the push approach is that the monitor does not have to track which application nodes to monitor; it can passively accept data from any running system (as long as all messages are identified by the originating server).
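
As an illustration, the simplest socket-based push from the service-check fragment above could be as small as the following; the monitor host and port are placeholders, /dev/tcp is a bash-only feature, and a real deployment would need error handling for an unreachable monitor:

  # Illustrative only: push one status line per service to a central monitor over a TCP socket
  MONITOR_HOST=monitor.example.com        # placeholder
  MONITOR_PORT=9999                       # placeholder
  echo "$server,$s,$status,$pid,$message" > /dev/tcp/$MONITOR_HOST/$MONITOR_PORT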

Pull Mechanisms Available

SQLstream JDBC (SELECT STREAM … FROM …)
SQLstream ECDA Agents
SQLstream Federation

The monitor can pull data (without polling) by setting up streaming queries against the monitored application nodes. These can pull data into a SQLstream processing pipeline for further monitoring, or ECDA agents can hand the data off into a local flat file on the monitor server, assuming that it will then be loaded and processed by the monitoring software.

The drawback with pull-based approaches is that the monitor needs to set up queries to all application nodes (SQLstream s-servers). This can substantially increase the complexity of the monitor. The monitor of course also needs to know exactly which application nodes are expected to be running at any time.

If the SQLstream monitor needs to access the SQLstream application service directly, they both need to use the same version of the SQLstream JDBC driver. If you want to be able to upgrade the monitor independently of the application, you should consciously break this dependency; instead use message transport mechanisms that are not SQLstream version dependent, such as Kafka or network sockets.

Conclusion

Generally, a push-based approach is likely to be the easiest to build and deploy. If Kafka is in wide use, consider benefiting from its store-and-forward and replay capabilities.

Location of monitor analysis processing

Analytical processing can take place either in the application node or at the central monitor. The choice will depend on many factors:

Analytical capability of the monitor compared to the SQLstream application.
Processing loads at the application and monitor servers.
Preference for encapsulation of application-specific functionality (the monitor does not, and should not, need detailed knowledge of the application).
Whether the necessary metrics can be calculated at the source. Some application-specific final stage of aggregation is always required at the monitor, but where possible keep it application-neutral.
  - For example, use a single stream for counting transaction volumes for all transactions across all application nodes; new transactions can then be added to the application without modifying the monitor.
  - This also allows higher-level aggregations (for example, count of total volume as well as volume by transaction).

Monitoring the Monitors

There is always a risk that monitoring systems and subsystems can themselves fail.

To mitigate that, we recommend that you always include some basic stand-alone monitoring or alerting on every application and monitor node. This may be as simple as some bash scripts and email alerters, or it could use common network monitoring tools such as Nagios. These tools should be completely independent of the main application components (such as Kafka and SQLstream).
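
For example, a stand-alone alerter on each node might be no more than a cron entry that emails when an independent check script fails; the schedule, script path, mail command, and address below are placeholders:

  # Illustrative only: crontab entry running an independent check every 5 minutes
  */5 * * * * /usr/local/bin/check_sqlstream.sh || echo "SQLstream check failed on $(hostname)" | mail -s "SQLstream monitor alert" ops@example.com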