Parsing W3C Data


The W3C option lets you parse logs generated by W3C-compliant applications. Log file entries are described using the format specifiers defined in the Apache mod_log_config documentation. The W3C parser is built on the W3C_LOG_PARSE function, described in the topic W3C_LOG_PARSE in the Streaming SQL Reference Guide. You can call that function anywhere in your code. The W3C parser for the Extensible Common Data Framework instead parses W3C log data as it arrives in s-Server, which may be desirable for performance or other reasons.

To use the Extensible Common Data Adapter with W3C files, you set parser to W3C, then pass in groups of filters that will map to columns. The W3C parser takes one additional property, PARSER_FORMAT, which takes the values specified in the table below.

Column names cannot be dynamically assigned with W3C files. You need to declare them as part of the foreign stream or table.

Note: SQLstream handles Apache log format specifiers without alteration, so a script can copy them directly from an Apache httpd.conf file into log.sqlstream.xml.

Sample Foreign Stream to Parse W3C Files

The following example parses columns called recNo, ts, accountNumber, loginSuccessful, sourceIP, destIP, and customerId from files in /usr/local/share/sqlstream/datalinks/accesslog/log_data.

Note: The file location, file name pattern, and character encoding can also be set as server options.

 

CREATE OR REPLACE FOREIGN STREAM "W3C_LOG_PARSE"

("recNo" INTEGER,

"ts" TIMESTAMP NOT NULL,

"accountNumber" INTEGER,

"loginSuccessful" BOOLEAN,

"sourceIP" VARCHAR(32),

"destIP" VARCHAR(32),

"customerId" INTEGER)

SERVER "FileReaderServer"

OPTIONS

(DIRECTORY '/usr/local/share/sqlstream/datalinks/accesslog/log_data',

            --assume a link to the real location of the data

            filename_pattern 'log.\d{4}(-\d\d){2}', -- e.g. log.2011-09-04

            encoding 'UTF-8',

            parser 'W3C',

            parser_format 'COMMON'

            )

   DESCRIPTION 'CommonAdapter Foreign Stream';
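
Once the foreign stream is defined, its parsed rows can be read like any other stream. The query below is a minimal sketch, assuming the stream above is visible in the current schema; the choice of columns and the filter on failed logins are illustrative only and not part of the original example.

```sql
-- Read parsed rows from the foreign stream defined above.
-- Assumption: the stream is visible in the current schema.
-- The WHERE clause is an illustrative filter, not a required part of the setup.
SELECT STREAM "ts", "accountNumber", "sourceIP", "destIP"
FROM "W3C_LOG_PARSE"
WHERE "loginSuccessful" = FALSE;
```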

 

Sample Properties Implementing ECD Agent to Parse W3C Files

To parse W3C files with the ECD Agent, configure the options above using the ECD Agent property file with properties similar to the following:

ROWTYPE=RECORDTYPE(VARCHAR(2040) id, VARCHAR(2040) reported_at, VARCHAR(2040) shift_no, VARCHAR(2040) trip_no, VARCHAR(2040) route_variant_id)

DIRECTORY=/tmp

FILENAME_PATTERN=log.\d{4}(-\d\d){2}

PARSER=W3C

CHARACTER_ENCODING=UTF-8

SKIP_HEADER=TRUE

SEPARATOR=u\000A

PARSER_FORMAT=COMMON