Reading from Other Sources

<< Click to Display Table of Contents >>

Navigation:  Integrating Blaze with Other Systems > Reading Data into s-Server >

Reading from Other Sources

Previous pageReturn to chapter overviewNext page

The Extensible Common Data parser reads from a variety of sources and parses data from these sources into rows in an s-Server stream or table. Currently supported sources are:

Reading from the File System
Reading from Sockets
Reading from AMQP\
Reading over HTTP
Reading over Web Sockets
Reading from IBM MQ
Reading from Amazon Kinesis
Reading from Web Feeds

s-Server parses data from these sources into rows of columns in streams or tables. You can read data from these streams or tables by running a SELECT statement against them.

To parse files from these sources using an adapter, you need to set up both a server for the source and set up a foreign stream for the source. One server object can be referenced by multiple foreign streams. To parse files with the Extensible Common Data agent, you launch the agent from its directory at the command line, using the --input flag. See the topic Reading Files in Remote Locations for more information on the Extensible Common Data Agent.

File Formats

s-Server can parse a number of file formats from these sources:

Topic

Parameter Name

Reading CSV

CSV

Reading XML

XML

Reading JSON

JSON

Reading Key Pair Values

KV

Reading ProtoBuf

PROTOBUF

Parsing Variable Column Data

VCLP

Parsing Fixed Column Data

FCLP

Parsing Files with FastRegex Row Filter

FastRegex

Parsing W3C Data

WSC

Note: You can also input data in larger chunks and parse it later using the Parser UDX. This UDX calls the parsers listed above in a function. For more information on using functions, see the topic Transforming Data in s-Server in this guide.

Handing Unparsable Rows

Occasionally, s-Server will encounter rows that it cannot parse. By default, in these cases s-Server fills in default values for columns which don't parse and then logs an informational message to its trace log. (See the topic Using Trace Files in Administrator Guide for more details on the trace log.)

For CSV and JSON only, you can enable a stream option called DISCARD_ROW_ON_PARSE_ERROR. The option causes s-Server to throw away rows with parsing errors to log the surrounding payload to the global error stream. (See the topic Global Error Stream in Streaming SQL Reference Guide for more information.)

To enable this option, add the following line to an ECD foreign stream definition that uses CSV or JSON:

discard_row_on_parse_error 'true',

Steps for Reading Files with the ECD Adapter

1.Designate a transport medium as a server option.
2.Designate foreign stream or table options, including:
a.The location of the file.
b.Options for how the file will be parsed, such as file pattern, separator character, and format-specific options.
3.Run a SELECT against the stream.

General Code Example

The example below demonstrates how to parse a CSV file over the file system. To parse other files over other transport systems, you change server and stream options appropriately.

Creating a Server Object

First, you need to create a server object which references the data wrapper ECDA as well as the type of medium.

CREATE OR REPLACE SERVER "FileReaderServer" TYPE 'FILE'

FOREIGN DATA WRAPPER ECDA;

 

Creating a Foreign Stream

Next, you create a foreign stream object. This object contains connection information for the file, such as directory, filename pattern, filename date format, filename prefix and suffix, character encoding, whether or not to write a header, character separator.

CREATE OR REPLACE SCHEMA "FileSource";

SET SCHEMA '"FileSource"';

 

CREATE OR REPLACE FOREIGN STREAM "FileReaderStream"

("recNo" INTEGER,

"ts" TIMESTAMP NOT NULL,

"accountNumber" INTEGER,

"loginSuccessful" BOOLEAN,

"sourceIP" VARCHAR(32),

"destIP" VARCHAR(32),

"customerId" INTEGER,)

--Columns for the new stream

SERVER "FileReaderServer"

OPTIONS

(parser 'CSV',

character_encoding 'UTF-8',

separator ',',

skip_header 'false',

directory '/tmp',

filename_pattern 'buses\.log');

 

Steps for Parsing Files with the ECD Agent

1.Create a stream (and schema, if necessary) in s-Server into which the agent will input (this is known as the destination stream).
2.Create a properties file options for the agent, such as example.properties. You use this file to configure the agent. This file needs to include information such as the input file's directory, parser, filename pattern, and other identifying information, as well as options for the destination stream, such as schema_name, table_name, and rowtype. See the topic Extensible Common Data Agent Overview for more details. The code below represents a sample property file that reads from a CSV formatted file called example.log in a directory called /var/tmp, and inputs into a destination stream called BIDS in the schema SALES on the default s-Server defined for this agent.

# Location, name, and type of file

DIRECTORY=/var/tmp

FILENAME_PATTERN=example.log

#Parser for file. Options are CSV, JSON, XML, and others.

PARSER=CSV

# Schema, name, and parameter signature of destination stream

SCHEMA_NAME=SALES

STREAM_NAME=BIDS

ROWTYPE=RECORDTYPE(VARCHAR(2040) id, VARCHAR(2040) reported_at, VARCHAR(2040) shift_no, VARCHAR(2040) trip_no, VARCHAR(2040) route_variant_id)

 

3.Determine the input/output format you will use. The ECD agent defaults to using the file system, but you can also configure it to read over a network socket, an amqp feed, to and from a kafka server, to and from an mqseries server, or to and from a hadoop file system. A list of options for the --io property is available at the topic Extensible Common Data Agent Overview.
4.Launch the agent with the --input property enabled and a --props property that references the properties file you just created:

./commondataagent.sh --input --io file --props example.properties

 

If you have configured everything correctly, the agent should begin parsing data from the file location and feeding this data into your destination stream.

Note: The location of this directory depends on where you installed the client tools suite. In a default s-Server installation, it is above the $SQLSTREAM_HOME directory, and will be something like /opt/sqlstream/5.0.XXXX/clienttools/EcdaAgent.

Special Columns Generated by Parsers

Some input adapters can have "special" columns which are part of the foreign stream definition but not part of the underlying data format. For example, a column called PARSE_POSITION is produced for all file formats. If you want to include this column in your stream, you need to define it as a BIGINT. This column will not appear in the data format but will be in the result set from selecting the foreign stream.  The Extensible Common Data Adapter will supply the value for this column instead of parsing its value from the underlying data. This is used to supply meta information about what is being parsed and where it came from.

The Extensible Common Data Adapter Agent generates four special row columns when parsing files which provide information on how the row was derived from the file. You can declare any of these columns to make them part of a foreign stream or table.

Special Column

Meaning

PARSE_POSITION

Position in the file where this row originated.

PARSE_LIINE_NUMBER

Line number in file where this row originated.

PARSE_ERROR

Usually null. Otherwise provides an error message explaining what wrong when parsing this row

SOURCE_FILE

The file this row originated.