Log File Adapter


Note: This topic describes a legacy adapter. See the topics Reading from Other Sources and Writing to Other Destinations for current solutions.

Selection and analysis of log file records is a major tool in monitoring ongoing data-generating activities. Such log file analysis is useful for fraud detection and usage monitoring.

SQLstream provides two mechanisms for processing log files:

Logs that are accessible directly from the server that hosts SQLstream can be input to streams directly by using the SQLstream Log File Adapter, as described in this topic.
Logs that are generated from other servers, termed remote servers, can be input to streams by using the SQLstream Log File Agent.

The Log File Adapter lets you process log files on the fly, query them as streams, and analyze those streams using SQL.

Overview

The Log File Adapter is a streaming adapter that creates relational streams from specified computer system log files. Similar to the UNIX/Linux command "tail," the Adapter opens a specified log file and periodically polls it to discover and process new data. You use adapter options to identify the log files that you wish to track.

server options: valid only for a CREATE SERVER command defining a specific instance of a log file server.
stream options: valid only for a CREATE FOREIGN STREAM command defining a specific instance of the log file adapter.

Log file data is initially presented as a single, variable-width text column. You can use one of s-Server's parsers to convert the "raw" log data into rows and columns for later SQL manipulation. These parsers work with common log file formats, as described below under the parsing functions and Schema with Views.

The Logfile Adapter (com.sqlstream.plugin.logfile.LogStreamControlPlugin) is a standard component and is included in the SQLstream installation package. No special steps are needed before using the adapter in server or stream definitions.

The Logfile Adapter never considers a logfile "closed" until a new file appears. This is because log files often have multiple loggers, any of which may append another line to the log file at any time.

Limitations

The Log File Adapter currently does not handle log files that present data with multiple lines per log file entry. For example, XML-based log file formats are not supported by this adapter. Log files that are stored in a compressed format are also unsupported.

Using the Log File Adapter

To use the Log File Adapter, you need to understand the following:

the commands that define the Log File Adapter in the SQLstream catalog and make it available to the SQLstream s-Server;
the commands that define a stream that uses the Log File Adapter; and
the options those commands use: server options and stream options.

It would also be useful to understand the following related topics:

the use of views to enable more complex representations of log file data, and
the special parsing functions that enable log entry access and manipulation (a sketch follows this list):
VARIABLE_COLUMN_LOG_PARSE ----- enables processing of each arbitrary-length input row based on user-supplied definitions for delimiters, escape, and quote characters.
FIXED_COLUMN_LOG_PARSE ----- enables processing of fixed-width fields and automatically converting them to specified SQL types.
W3C_LOG_PARSE ----- supports access to logs generated by W3C-compliant applications such as the Apache web server.
SYS_LOG_PARSE ----- enables processing of entries commonly found in UNIX/Linux system logs: entries that start with a timestamp and are followed by a free-form text field.
REGEX_LOG_PARSE ----- operates like SYS_LOG_PARSE except that it takes an extra parameter: the regular expression to use for parsing each line into columns. This function corresponds to the REGEX parser option in CREATE FOREIGN STREAM.
FAST_REGEX_LOG_PARSE ----- a faster version of REGEX_LOG_PARSE that uses lazy semantics with no backtracking. (In regular-expression parsing parlance, "lazy" means "don't parse more than you need to at each step"; "greedy" means "parse as much as you can at each step.")
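As a first taste of how these parsers are engaged in practice, the following sketch selects the VARIABLE parser through the PARSER stream option, described under Stream options below. The log path is hypothetical, and logfile_server is the server defined later in this topic.

CREATE FOREIGN STREAM schema1.csv_stream

   SERVER logfile_server

   OPTIONS (log_path '/var/log/app/events.csv',  -- hypothetical CSV-style log

            encoding 'UTF-8',

            sleep_interval '10000',

            max_unchanged_stats '10',

            parser 'VARIABLE',                   -- apply VARIABLE_COLUMN_LOG_PARSE

            "PARSER_COLUMN COUNT" '3',           -- three VARCHAR(1024) columns

            parser_delimiters ',');              -- split each line on commas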

Examples appear throughout this document. References and links to additional examples appear at the end.

Relevant Commands

The CREATE FOREIGN DATA WRAPPER statement defines the Log File Adapter in the SQLstream catalog and makes the adapter available to the SQLstream s-Server. The Log File Adapter is implemented by the Java class com.sqlstream.plugin.logfile.LogStreamControlPlugin, as specified in the LIBRARY option.

The CREATE SERVER statement informs the SQLstream s-Server that data will be supplied by the Log File Adapter using an external data source. Server options, valid only for the CREATE SERVER command, define a specific instance of a log file server.

The CREATE FOREIGN STREAM statement is by far the most important of the three statements. The stream options describe the operational parameters of the data source. They are specified to the SQLstream s-Server in the OPTIONS clause of the CREATE FOREIGN STREAM statement, which passes them to the adapter.

Note that all parser-related options are prefixed with the string 'PARSER_'. For example, PARSER_COLUMNS, "PARSER_COLUMN COUNT", PARSER_DELIMITERS, "PARSER_TERMINATION STRING", "PARSER_ESCAPE CHAR", "PARSER_TRIM WHITESPACE", and PARSER_QUOTES are the names used in an OPTIONS clause for the parameters to the VARIABLE_COLUMN_LOG_PARSE function. When that function is specified in an OPTIONS clause, it is identified as PARSER 'VARIABLE', as shown in example 1, example 2, and example 3.

Note: The four options shown above in double-quotes must be enclosed in double-quotes wherever they are used; without the double-quotes, the option has no effect.

Schema Definitions

Creating a stream to follow a log file consists of defining the stream using DDL statements. The basic schema for any log file stream is a single column with one row per line in the log file. By default, the name of the column is MESSAGE with type VARCHAR(1024).
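If you prefer to state the default layout explicitly, a sketch like the following should be equivalent. This assumes that s-Server accepts an explicit column list on a log file foreign stream; the log path is hypothetical, and logfile_server is defined in the example below.

CREATE FOREIGN STREAM schema1.raw_stream (

      "MESSAGE" VARCHAR(1024)  -- the default single-column layout, stated explicitly

   )

   SERVER logfile_server

   OPTIONS (log_path '/var/log/messages',

            encoding 'UTF-8',

            sleep_interval '10000',

            max_unchanged_stats '10');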

The following sample data set will be used in the examples and explanations presented in the following sections, unless otherwise specified. The sample data represents the log from a mail application:

Mar 7 06:30:03 server1 sendmail[29353]: i27EU2725212: from=<sender@source.com>, size=881,

         class=0, nrcpts=1, msgid=<20040307142957.17974@foo.source.com>, proto=SMTP,

         daemon=MTA, relay=foo.source.com [192.168.1.1]

Mar 7 06:30:03 server1 sendmail[29354]: i27EU2725212: to=<recipient@destination.com>,

         delay=00:00:01, xdelay=00:00:00, mailer=local, pri=30537, dsn=2.5.0, stat=Sent

Mar 7 06:31:49 server1 ipop3d[29360]: pop3 service init from 10.0.0.1

Mar 7 06:31:49 server1 ipop3d[29359]: Login user=recipient host=bar [10.0.0.1] nmsgs=0/0

Mar 7 06:31:49 server1 ipop3d[29359]: Logout user=recipient host=bar [10.0.0.1] nmsgs=0

         ndele=0

 

The following sample DDL is an example of the basic framework to define a stream using the log file adapter. For details on the statements used in this example, see the topics CREATE FOREIGN DATA WRAPPER, CREATE SERVER, and CREATE FOREIGN STREAM in the SQLstream Streaming SQL Reference Guide.

CREATE SCHEMA schema1;

-- Install the log file adapter

CREATE FOREIGN DATA WRAPPER logfile_wrapper

   LIBRARY 'class com.sqlstream.plugin.logfile.LogStreamControlPlugin'

   LANGUAGE JAVA;

-- Define the external server

CREATE SERVER logfile_server

   FOREIGN DATA WRAPPER logfile_wrapper;

-- Define a stream

CREATE FOREIGN STREAM schema1.mail_stream

   SERVER logfile_server

   OPTIONS (log_path '/path/to/logfile',

            encoding 'UTF-8',

            sleep_interval '10000',

            max_unchanged_stats '10');

 

The following command could produce the output shown in the table below.

SELECT STREAM * FROM schema1.mail_stream;

 

MESSAGE

Mar 7 06:30:03 server1 sendmail[29353]: i27EU2725212: from=<sender@source.com>,

        size=881, class=0, nrcpts=1,relay=foo.source.com [192.168.1.1]

Mar 7 06:30:03 server1 sendmail[29354]: i27EU2725212: to=<recipient@destination.com>,

        delay=00:00:01, xdelay=00:00:00, mailer=local

Mar 7 06:31:49 server1 ipop3d[29360]: pop3 service init from 10.0.0.1

Mar 7 06:31:49 server1 ipop3d[29359]: Login user=recipient host=bar [10.0.0.1]

        nmsgs=0/0

Mar 7 06:31:49 server1 ipop3d[29359]: Logout user=recipient host=bar [10.0.0.1]

        nmsgs=0 ndele=0

Server options

Server options are valid only for the CREATE SERVER DDL that defines a specific instance of a log file server. See the Server Options Table.

Note: All foreign servers can now take JndiParams as a parameter, naming a JndiParams property file located in the $SQLSTREAM_HOME/plugin/jndi directory. Any property in that file overrides the corresponding option in the server options. Servers whose streams use the server options as defaults can also have their properties come from the JndiParams property file: properties of the form <stream name>.property override the corresponding stream option. If the stream has specified the option JndiPrefix, that prefix is used instead of <stream name>. $SQLSTREAM_HOME refers to the installation directory for s-Server, such as /opt/sqlstream/5.0.XXX/s-Server.
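For illustration, here is a hedged sketch of such a property file. The file name and the exact spelling of the property keys are assumptions; only the <stream name>.property convention comes from the note above.

# Hypothetical file: $SQLSTREAM_HOME/plugin/jndi/logfile.properties
# Server-level property: overrides the ROOT server option
ROOT=/var/log
# Stream-level property: overrides SLEEP_INTERVAL for the stream mail_stream
mail_stream.SLEEP_INTERVAL=5000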

Log File Adapter Server Options

Name: ROOT
Type: string
Description: Optional absolute path describing a directory. If specified, this option prevents the creation of any FOREIGN STREAM (associated with this server) whose LOG_PATH option specifies a log file not contained within the given path. Furthermore, a FOREIGN STREAM associated with this server may specify its LOG_PATH relative to the server's root.

ROOT examples

Server's ROOT    Foreign Stream's LOG_PATH     Legal?
/var/log         /var/log/maillog              Yes
/var/log         http/access_log               Yes
/var/log         ../log/maillog                Yes
/var/log         /usr/local/log/foolog         No
/var/log         ../../usr/local/log/foolog    No
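The following minimal sketch shows ROOT in use, assuming (per SQL/MED convention) that server options are supplied in an OPTIONS clause on CREATE SERVER; the server and stream names are illustrative. The relative LOG_PATH resolves against the server's root:

CREATE SERVER logdir_server

   FOREIGN DATA WRAPPER logfile_wrapper

   OPTIONS (root '/var/log');

-- Relative LOG_PATH: resolves to /var/log/http/access_log

CREATE FOREIGN STREAM schema1.access_stream

   SERVER logdir_server

   OPTIONS (log_path 'http/access_log',

            encoding 'UTF-8',

            sleep_interval '10000',

            max_unchanged_stats '10');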

Stream options

Stream options are valid only for the CREATE FOREIGN STREAM DDL that defines a specific instance of the log file adapter. The table Log File Adapter Stream Options shows the options for foreign streams. The JndiParams note in the Server options section above applies to stream options as well.


Log File Adapter Stream Options

Option names are case insensitive. Possible values and defaults are noted with each option.

LOG_PATH
Description: Path to the log file location.
Values: No default.

FILE_PATTERN
Description: Java regular expression used to identify log files. This file pattern is used if LOG_PATH turns out to be a directory. See the section Pattern-Based File Name below.
Values: Java regular expression. No default.

ENCODING
Description: The log file's character encoding.
Values: string. Default UTF-8.

SLEEP_INTERVAL
Description: Sleep interval between tail reads, in milliseconds.
Values: Default 1000.

ROWTIME_GROUPING
Description: Number of consecutive lines read that get the same system ROWTIME.
Values: Default 5.

FILENAME_COLUMN
Description: Optional. Name of the column into which the log file adapter or agent should put the current log file's filename.
Values: No default.

MAX_UNCHANGED_STATS
Description: Number of sleep intervals without a change in the log file before tail checks for rotation.
Values: Default 5.

PARSER
Description: Type of parser to be used to read in data.
Values: VARIABLE, FIXED, W3C, REGEX, FASTREGEX, SYSLOG.

"PARSER_COLUMN COUNT"
Description: Enumerates the number of columns to parse from input. The type of each column is set to VARCHAR(1024). Use "PARSER_COLUMN COUNT" or PARSER_COLUMNS, not both. Note: The double-quotes enclosing PARSER_COLUMN COUNT are required.
Values: integer.

PARSER_COLUMNS
Description: Declares column names and types for parsed data. Use PARSER_COLUMNS or "PARSER_COLUMN COUNT", not both.
Values: [column_name] type [type name], e.g., column1 type varchar(20).

PARSER_DELIMITERS
Description: Defines the characters used to delimit the columns in your data. Any character in the list will be used as a delimiter. For example, after specifying PARSER_DELIMITERS ',;:', every comma, semicolon, and colon in a log record starts a new field. Note: Either PARSER_DELIMITERS or "PARSER_TERMINATION STRING" must be present.
Values: any characters. Default , (comma).

"PARSER_TERMINATION STRING"
Description: The entire string is treated as the delimiting sequence. Note: Either "PARSER_TERMINATION STRING" or PARSER_DELIMITERS must be present.
Values: No default.

"PARSER_ESCAPE CHAR"
Description: Defines the character used to indicate that a delimiter should be ignored. Only used when a delimiter is defined. Note: The double-quotes enclosing PARSER_ESCAPE CHAR are required.
Values: Optional. No default.

PARSER_QUOTES
Description: Defines the enclosing characters that mark text to be treated as a single column of data. If PARSER_QUOTES defines a single character, that character is used as both the start and end quote: it marks the start and later the end of the text-as-column. If PARSER_QUOTES defines two characters, the first is the start quote and the second is the end quote. For example, PARSER_QUOTES "[]" defines [ as the start of the text-as-column and ] as the end.
Values: Default " (double quote).

"PARSER_TRIM WHITESPACE"
Description: If 'true', leading and trailing whitespace adjacent to a delimiter is ignored, unless enclosed in quotes. If 'false' (the default), leading and trailing whitespace is kept. Note: The double-quotes enclosing PARSER_TRIM WHITESPACE are required.
Values: true, false.

PREPROCESS_COMMAND
Description: Shell script and parameters to be invoked before processing each logfile. Called with one added parameter: the full path of the logfile being processed. The script should write, as its last line to stdout, the path to the processed logfile. The script can also write to stderr lines starting with CONFIG, INFO, FINE, etc.; such lines are written to the trace log controlled by com.sqlstream.plugin.logfile.
Values: No default.

POSTPROCESS_COMMAND
Description: Shell script and parameters to be invoked after processing each logfile. Called with two added parameters: the first is the full path of the processed logfile (as returned by PREPROCESS_COMMAND), and the second is the full path of the original logfile. The script should write, as its last line to stdout, the path to the processed logfile. The script can also write to stderr lines starting with CONFIG, INFO, FINE, etc.; such lines are written to the trace log controlled by com.sqlstream.plugin.logfile.
Values: No default.
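One use for PREPROCESS_COMMAND is to work around the compressed-log limitation noted earlier: a script can decompress each rotated file and hand back the path of the uncompressed copy. The following is a hedged sketch; the script path and its behavior are hypothetical, and only the parameter and stdout conventions come from the table above.

CREATE FOREIGN STREAM schema1.archived_stream

   SERVER logfile_server

   OPTIONS (log_path '/var/log/archive',

            file_pattern 'app\.log\.\d+\.gz',

            encoding 'UTF-8',

            sleep_interval '10000',

            max_unchanged_stats '10',

            -- hypothetical script: gunzips the file passed as its parameter and

            -- prints the uncompressed file's path as its last line to stdout

            preprocess_command '/usr/local/bin/gunzip_log.sh');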

 

Rotating Log Support

The Log File Adapter supports log file rotation. Using the SLEEP_INTERVAL and MAX_UNCHANGED_STATS options of the CREATE FOREIGN STREAM statement, the adapter detects that a log file has stopped changing, attempts to re-open the file, and continues to service the new log file without closing the stream. For example, with SLEEP_INTERVAL '10000' and MAX_UNCHANGED_STATS '10', the adapter checks for rotation after the file has been unchanged for roughly 100 seconds.

You can specify log files in either of two ways: by using a fixed file name reused with each rotation, or by specifying a Java regular expression as the name. See http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html for more detail.

Fixed Base Name

This mode services logs that use a fixed base name. The log files are renamed as they are rotated out, which is standard practice for most system logs and for many major software packages. An example from the Linux OS is /var/log/syslog.

In this mode, the log file name is specified as part of LOG_PATH, and FILE_PATTERN must not be present. When the log file adapter detects that the current file has not changed for the number of sleep intervals specified by MAX_UNCHANGED_STATS, it attempts to re-open the file with the fixed name, and the tailing function continues with the new file.

-- Define a stream that takes input from a fixed named log file

-- In this case the system log -> syslog

CREATE FOREIGN STREAM schema1.syslog_stream

   SERVER logfile_server

   OPTIONS (log_path '/var/log/syslog',

            encoding 'UTF-8',

            sleep_interval '10000',

            max_unchanged_stats '10');

 

Note: Foreign streams, like all streams, need to be defined within a schema. If you have not previously created and set a schema, the above code block will not work until you do.

Pattern-Based File Name

This mode is used when an existing log is to be rotated out and a new file created with a name different from the previous file. For example, log rotation occurs every night at midnight and a new log based on the date is opened. The current file name might be something like app_log_2009-04-01. At midnight, a new file is created named app_log_2009-04-02. To handle this mode of operation, the FILE_PATTERN option is used. When the FILE_PATTERN option is present, the log file adapter does two things.

1. The string associated with FILE_PATTERN is treated as a Java regular expression. A list of files is gathered from the directory specified by LOG_PATH, and the regular expression is used to search the file list for candidate files.
2. From the resulting candidate pool, the adapter selects as the new log file the file with the latest time stamp that is also later than the current log file's.

What happens when LOG_PATH specifies a specific file and FILE_PATTERN is present? In this case, the file specified by LOG_PATH is the first file opened for processing. When the file is detected as being dormant, the FILE_PATTERN process is used to identify any new candidates for tailing.

-- Define a stream that takes input from a log with a changing name

-- In this case, the adapter will monitor the /var/log directory for files

-- that are prefixed with "userlog" combined with the current date ->

-- userlog.2009.01.29

CREATE FOREIGN STREAM schema1.user_stream

   SERVER logfile_server

   OPTIONS (log_path '/var/log',

file_pattern 'userlog\.\d{4}\.\d{2}\.\d{2}',

            encoding 'UTF-8',

            sleep_interval '10000',

            max_unchanged_stats '10');

Schema with Views

More complex representations of log file data can be achieved by applying a VIEW on top of the basic schema.

A very basic view on the example stream schema1.mail_stream could be defined as:

  CREATE VIEW schema1.mailview AS

    SELECT STREAM SUBSTRING(message FROM 1 FOR 14) AS ENTRY_TIME,

           SUBSTRING(message FROM 16) AS ENTRY

    FROM schema1.mail_stream;

 

The view would return the following entries for the example data:

 ENTRY_TIME       ENTRY

 Mar  7 06:30:03  server1 sendmail[29353]: i27EU2725212: from=<sender@source.com>, size=881, class=0, nrcpts=1, msgid=<20040307142957.17974@foo.source.com>, proto=SMTP, daemon=MTA, relay=foo.source.com [192.168.1.1]

 Mar  7 06:30:03  server1 sendmail[29354]: i27EU2725212: to=<recipient@destination.com>, delay=00:00:01, xdelay=00:00:00, mailer=local, pri=30537, dsn=2.5.0, stat=Sent

 Mar  7 06:31:49  server1 ipop3d[29360]: pop3 service init from 10.0.0.1

 Mar  7 06:31:49  server1 ipop3d[29359]: Login user=recipient host=bar [10.0.0.1] nmsgs=0/0

 Mar  7 06:31:49  server1 ipop3d[29359]: Logout user=recipient host=bar [10.0.0.1] nmsgs=0 ndele=0

 

A more complicated view would convert the log file's date and time data into the SQL TIMESTAMP type. You might also want to determine the boundaries of the server name, which is a variable-width field, and provide it as a separate column.

 

CREATE VIEW schema1.mailview AS

  SELECT STREAM

     SYS_BOOT.MGMT.CHAR_TO_TIMESTAMP('MMM dd HH:mm:ss', message) AS ENTRY_TIME,

     SUBSTRING(message FROM 16 FOR

               (POSITION(' ' IN SUBSTRING(message FROM 16)) - 1)) AS SERVER_NAME,

     SUBSTRING(message FROM

               (16 + POSITION(' ' IN SUBSTRING(message FROM 16)))) AS ENTRY

  FROM schema1.mail_stream;

 

This view would return the following entries for the example data:

ENTRY_TIME       SERVER_NAME   ENTRY

Mar  7 06:30:03  server1       sendmail[29353]: i27EU2725212: from=<sender@source.com>, size=881, class=0, nrcpts=1, msgid=<20040307142957.17974@foo.source.com>, proto=SMTP, daemon=MTA, relay=foo.source.com [192.168.1.1]

Mar  7 06:30:03  server1       sendmail[29354]: i27EU2725212: to=<recipient@destination.com>, delay=00:00:01, xdelay=00:00:00, mailer=local, pri=30537, dsn=2.5.0, stat=Sent

Mar  7 06:31:49  server1       ipop3d[29360]: pop3 service init from 10.0.0.1

Mar  7 06:31:49  server1       ipop3d[29359]: Login user=recipient host=bar [10.0.0.1] nmsgs=0/0

As you can see, using SQL to directly parse log file entries is awkward when variable-width fields appear in the log file. In addition to being tricky to write, these SQL statements do not lend themselves to high-performance processing.

To solve this problem, SQLstream provides several functions to parse log file entries at high speed. SQLstream's distributed optimization is able to move the implementation of these functions into the log file adapter itself, allowing for high speed processing of log file entries.
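For example, because the sample mail log is in standard UNIX syslog format, the SYSLOG parser could replace the SUBSTRING/POSITION view above. Here is a minimal sketch; the column layout produced by the SYSLOG parser is determined by the parser itself and is not documented in this topic.

CREATE FOREIGN STREAM schema1.mail_parsed

   SERVER logfile_server

   OPTIONS (log_path '/path/to/logfile',

            encoding 'UTF-8',

            sleep_interval '10000',

            max_unchanged_stats '10',

            -- SYSLOG parser: a timestamp followed by a free-form text field,

            -- matching the sample mail log's format

            parser 'SYSLOG');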