Parsing W3C Data

The W3C option lets you parse logs generated by W3C-compliant applications. Log file entries are described using the format specifiers defined in the Apache mod_log_config documentation. The W3C parser uses the W3C log-parsing function described in the topic W3C_LOG_PARSE in the Streaming SQL Reference Guide, and that function can also be called anywhere in your SQL code. The W3C parser for the Extensible Common Data Framework lets you parse W3C log data as it arrives in s-Server, which may be desirable for performance or other reasons.
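To illustrate how format specifiers describe a log entry, here is a minimal Python sketch that turns an Apache LogFormat string into a regular expression and splits one line into named fields. It is not SQLstream's parser: it covers only a hypothetical subset of mod_log_config specifiers (%h, %l, %u, %t, %r, %>s, %b), and the field names and helper functions are illustrative.

```python
import re

# Hypothetical subset of Apache mod_log_config specifiers mapped to regex
# fragments; real parsers support many more (e.g. %{Referer}i).
SPECIFIER_PATTERNS = {
    "%h": r"(?P<remote_host>\S+)",
    "%l": r"(?P<ident>\S+)",
    "%u": r"(?P<remote_user>\S+)",
    "%t": r"\[(?P<time>[^\]]+)\]",
    "%r": '(?P<request>[^"]*)',
    "%>s": r"(?P<status>\d{3})",
    "%b": r"(?P<bytes>\d+|-)",
}


def format_to_regex(log_format):
    """Translate an Apache LogFormat string into a compiled regex."""
    # Try longer specifiers first so a longer token is never shadowed.
    tokens = sorted(SPECIFIER_PATTERNS, key=len, reverse=True)
    token_re = re.compile("|".join(re.escape(t) for t in tokens))
    parts, pos = [], 0
    for m in token_re.finditer(log_format):
        parts.append(re.escape(log_format[pos:m.start()]))  # literal text
        parts.append(SPECIFIER_PATTERNS[m.group()])          # field pattern
        pos = m.end()
    parts.append(re.escape(log_format[pos:]))
    return re.compile("".join(parts))


def parse_line(log_format, line):
    """Return a dict of fields, or None if the line does not fit the format."""
    match = format_to_regex(log_format).fullmatch(line)
    return match.groupdict() if match else None


COMMON = '%h %l %u %t "%r" %>s %b'
line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_line(COMMON, line))
```

With the Common Log Format string and the sample line from the Apache documentation, parse_line returns fields such as status '200' and bytes '2326'; a line that does not fit the format returns None.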

To use the Extensible Common Data Adapter with W3C files, set parser to W3C, then pass in groups of filters that map to columns. The W3C parser takes one additional property, PARSER_FORMAT, which accepts the values specified in the table below.
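The supported PARSER_FORMAT values are the ones in the table referenced above. Purely as an assumption for illustration, the Python sketch below treats COMMON and COMBINED as Apache's stock nicknames from the default httpd.conf, and treats any other value as a literal specifier string. It then lists the specifiers in order, on the further assumption that each one feeds one declared column. NAMED_FORMATS, resolve_format, and specifiers are hypothetical names, not s-Server APIs.

```python
import re

# Apache's stock nicknames from the default httpd.conf. Mapping the
# PARSER_FORMAT values 'COMMON' / 'COMBINED' onto them is an assumption.
NAMED_FORMATS = {
    "COMMON": '%h %l %u %t "%r" %>s %b',
    "COMBINED": '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"',
}

# Matches specifiers such as %h, %>s, or %{Referer}i.
SPECIFIER_RE = re.compile(r"%[<>]?(?:\{[^}]*\})?[a-zA-Z]")


def resolve_format(parser_format):
    """Return the specifier string for a named format, else the value itself."""
    return NAMED_FORMATS.get(parser_format.upper(), parser_format)


def specifiers(parser_format):
    """List specifiers in order (hypothetically, one per declared column)."""
    return SPECIFIER_RE.findall(resolve_format(parser_format))


print(specifiers("COMMON"))
print(specifiers("%h %>s"))
```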

Column names cannot be assigned dynamically with W3C files. You need to declare them as part of the foreign stream or table.

Note: SQLstream handles Apache log format specifiers without alteration, so a script could copy them from an Apache httpd.conf file directly into log.sqlstream.xml.
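As a minimal sketch of such a script, the hypothetical Python helper below reads LogFormat directives from httpd.conf text and returns each nickname with its format string. In httpd.conf the quoted format escapes literal quotes as \", so the helper undoes that one escape; other escapes such as \t are left as-is. Writing the result into log.sqlstream.xml is not shown, because that file's layout is not described here.

```python
import re

# Matches: LogFormat "<format>" <nickname>, where <format> may contain \" escapes.
LOGFORMAT_RE = re.compile(r'^\s*LogFormat\s+"((?:[^"\\]|\\.)*)"\s+(\S+)\s*$')


def read_log_formats(conf_text):
    """Map each LogFormat nickname in httpd.conf text to its format string."""
    formats = {}
    for line in conf_text.splitlines():
        m = LOGFORMAT_RE.match(line)
        if m:
            # Only the \" escape is undone; the specifiers are copied unchanged.
            formats[m.group(2)] = m.group(1).replace('\\"', '"')
    return formats


conf = r'''
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
'''
print(read_log_formats(conf))
```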

Sample Foreign Stream to Parse W3C Files

The following example parses the columns recNo, ts, accountNumber, loginSuccessful, sourceIP, destIP, and customerId from log files in /usr/local/share/sqlstream/datalinks/accesslog/log_data.

Note: Information on file location, file name pattern and character encoding can also be set as server options.

CREATE OR REPLACE FOREIGN STREAM "W3CLogStream" -- stream name is illustrative
("recNo" INTEGER,
"ts" TIMESTAMP,
"accountNumber" INTEGER,
"loginSuccessful" BOOLEAN,
"sourceIP" VARCHAR(32),
"destIP" VARCHAR(32),
"customerId" INTEGER)
SERVER "FileReaderServer"
OPTIONS
(DIRECTORY '/usr/local/share/sqlstream/datalinks/accesslog/log_data',
            --assume a link to the real location of the data
            filename_pattern 'log.\d{4}(-\d\d){2}', -- e.g. log.2011-09-04
            encoding 'UTF-8',
            parser 'W3C',
            parser_format 'COMMON')
DESCRIPTION 'CommonAdapter Foreign Stream';
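To check which file names the filename_pattern option would pick up, the regex can be tried out in Python. This sketch assumes the whole file name must match (Python's fullmatch); how s-Server applies the pattern is not stated here, so verify before relying on it. It also shows that the unescaped dot in 'log.\d{4}(-\d\d){2}' matches any character, and compares it with a stricter variant, STRICT, which is an illustrative addition rather than part of the example.

```python
import re

# Pattern copied from the filename_pattern option above.
PATTERN = r"log.\d{4}(-\d\d){2}"
# Illustrative stricter variant with the dot escaped.
STRICT = r"log\.\d{4}(-\d\d){2}"


def matching_files(pattern, names):
    """Return the names that the pattern matches in full (assumed semantics)."""
    compiled = re.compile(pattern)
    return [n for n in names if compiled.fullmatch(n)]


names = ["log.2011-09-04", "log.2011-09-04.gz", "logX2011-09-04", "log.2011-9-4"]
print(matching_files(PATTERN, names))  # ['log.2011-09-04', 'logX2011-09-04']
print(matching_files(STRICT, names))   # ['log.2011-09-04']
```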


Sample Properties Implementing ECD Agent to Parse W3C Files

To parse W3C files with the ECD Agent, configure the options above using the ECD Agent property file with properties similar to the following:

ROWTYPE=RECORDTYPE(VARCHAR(2040) id, VARCHAR(2040) reported_at, VARCHAR(2040) shift_no, VARCHAR(2040) trip_no, VARCHAR(2040) route_variant_id)

DIRECTORY=/tmp
FILENAME_PATTERN=transactions\.log
PARSER=W3C
PARSER_FORMAT=COMMON
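The ROWTYPE property declares every column up front, as type-then-name pairs inside RECORDTYPE(...). Purely as an illustration, a hypothetical Python helper that lists those pairs (for example, to compare them with the fields your log format produces) might look like this. It is not part of the ECD Agent, and it only splits on commas that sit outside parentheses.

```python
import re

ROWTYPE = ("RECORDTYPE(VARCHAR(2040) id, VARCHAR(2040) reported_at, "
           "VARCHAR(2040) shift_no, VARCHAR(2040) trip_no, "
           "VARCHAR(2040) route_variant_id)")


def rowtype_columns(rowtype):
    """Split 'RECORDTYPE(TYPE name, ...)' into (name, type) pairs."""
    # Raises AttributeError if the string is not a RECORDTYPE(...) declaration.
    inner = re.fullmatch(r"\s*RECORDTYPE\((.*)\)\s*", rowtype, re.S).group(1)
    columns = []
    # Split on commas not followed by a ')' before any '(' (i.e. not in DECIMAL(10, 2)).
    for field in re.split(r",\s*(?![^()]*\))", inner):
        sql_type, name = field.strip().rsplit(None, 1)
        columns.append((name, sql_type))
    return columns


print(rowtype_columns(ROWTYPE))
```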





