Writing to Other Destinations


Drawing on data in s-Server, the Extensible Common Data file writer writes a variety of file types to designated locations.

Writing to the File System
Writing from Sockets
Writing to Kafka
Writing to AMQP
Writing to Amazon Kinesis
Writing to IBM MQ
Writing to Mail Servers
Writing to Hadoop

To write to files with the Extensible Common Data Adapter, you set up an adapter instance through SQL. To write to files with the Extensible Common Data Agent, you launch the agent from its directory at the command line, using the --output flag.

All adapter or agent implementations involve configuring options. For adapters, you configure and launch the adapter in SQL, using either server or foreign stream/table options. For agents, you configure such options using a properties file and launch the agent at the command line. Many of the options for the ECD adapter and agent are common to all I/O systems. The CREATE FOREIGN STREAM topic in the Streaming SQL Reference Guide has a complete list of options for the ECD adapter.
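For the file writer, for example, the transport is designated when you create the server in SQL. A minimal sketch follows (the server name "FileWriterServer" is a placeholder):

CREATE OR REPLACE SERVER "FileWriterServer" TYPE 'FILE'
FOREIGN DATA WRAPPER ECDA;

Foreign streams created against this server then designate stream-level options, such as DIRECTORY and FORMATTER, in their OPTIONS clauses, as shown in the sections below.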

Options for Constructing the Output File Name

The following options are valid for all formats.

Option                  Meaning
FILENAME_PREFIX         Prefix to be added to the file name.
FILENAME_DATE_FORMAT    Date format, if any, to be included in the file name.
                        (Version 5.3 removes this option and substitutes the
                        <columnName>_FORMAT option described below.)
FILENAME_SUFFIX         Suffix to be added to the file name.
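For example, with FILENAME_PREFIX 'MYPREFIX-', FILENAME_DATE_FORMAT 'YYYY_MM_DD_HH_MM_SS', and FILENAME_SUFFIX '.csv', output files might be named along the lines of MYPREFIX-2020_01_15_10_30_00.csv (the exact name shown here is illustrative, based on the prefix + date + suffix pattern these options describe).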

 


Version 5.3 Options

Option               Meaning
FILENAME_PREFIX      Prefix to be added to the file name.
<columnName>_FORMAT  If specified, TIMESTAMP/DATE/TIME values are formatted using the
                     specification described at
                     https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html.
                     A brief summary of the specification follows:

                     H   Two-digit hour for the 24-hour clock (use a leading zero when needed).
                     I   Two-digit hour for the 12-hour clock (use a leading zero when needed).
                     M   Two-digit minute within the hour (use a leading zero when needed).
                     S   Two-digit second within the minute (use a leading zero when needed;
                         "60" is used for leap seconds).
                     L   Three-digit millisecond within the second (use leading zeroes when needed).
                     N   Nine-digit nanosecond within the second (use leading zeroes when needed).
                     s   Seconds since 1 January 1970 00:00:00 UTC; that is,
                         Long.MIN_VALUE/1000 to Long.MAX_VALUE/1000.
                     Q   Milliseconds since 1 January 1970 00:00:00 UTC; that is,
                         Long.MIN_VALUE to Long.MAX_VALUE.

                     Note: If <columnName>_FORMAT is not specified, s-Server uses the default
                     format for the data type. The formatter provides optimum performance for
                     default formats, so if you know that your column already matches the
                     default formatting used by java.util.Formatter (see link above), it is
                     best to leave <columnName>_FORMAT unspecified; the column will still be
                     formatted.

                     s-Server ignores <columnName>_FORMAT for column types other than
                     TIMESTAMP/DATE/TIME.

FILENAME_SUFFIX      Suffix to be added to the file name.
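For example, to format a TIMESTAMP column named "reported_at" on output, you might declare the foreign stream as follows. This is a minimal sketch: the stream and server names are placeholders, and the format pattern is illustrative (see the java.util.Formatter link above for the exact specification):

CREATE OR REPLACE FOREIGN STREAM "FileWriterStream" (
    "reported_at" TIMESTAMP)
SERVER "FileWriterServer"
OPTIONS (
    "FORMATTER" 'CSV',
    "DIRECTORY" '/var/tmp',
    "FILENAME_PREFIX" 'MYPREFIX-',
    "FILENAME_SUFFIX" '.csv',
    -- Illustrative pattern for the "reported_at" column:
    "reported_at_FORMAT" 'yyyy-MM-dd HH:mm:ss');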

Options for Rotating Output Files

You can rotate output files by specifying either a maximum number of bytes or a maximum amount of time since the last file was written.

Option                   Meaning
MAX_BYTES_PER_FILE       Maximum number of bytes written before a new file is started.
MAX_TIME_DELTA_PER_FILE  Time in milliseconds before a new file is started, e.g. 60000
                         (one minute).
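For example, either of the following lines in a foreign stream's OPTIONS clause rotates files at roughly 1 MB or once per minute, respectively (the values are illustrative):

"MAX_BYTES_PER_FILE" '1048576'
"MAX_TIME_DELTA_PER_FILE" '60000'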

Writing Files with the Extensible Common Data Adapter

To write files with the Extensible Common Data Adapter, you need to take the following steps:

1.Designate a transport medium as a server option: the file system, a network socket, Kafka, AMQP, IBM WebSphere MQ, a mail server, or Hadoop.
2.Designate foreign stream or table options, including:
a.A location for the file.
b.Options for how the file will be written, such as prefixes, suffixes, and format-specific options.
c.Options for how the file will rotate.
3.Run a pump containing an INSERT statement against the stream. (This is when the file is actually written.)

Note: Because of the nature of streaming data, you will need to set up a pump in order to move rows continually from an s-Server stream to another stream, file, Kafka topic, RDBMS table, or other location. Pumps are INSERT macros that continually pass data from one point in a streaming pipeline to another. A model for setting up a pump is provided below. See the topic CREATE PUMP in the s-Server Streaming SQL Reference Guide for more details.
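The following model pulls these steps together. It is a minimal sketch, assuming a source stream "SALES"."BIDS" with the two columns shown; the server, stream, and pump names and all option values are placeholders:

CREATE OR REPLACE SERVER "FileWriterServer" TYPE 'FILE'
FOREIGN DATA WRAPPER ECDA;

CREATE OR REPLACE FOREIGN STREAM "SALES"."FileWriterStream" (
    "id" VARCHAR(2040),
    "reported_at" VARCHAR(2040))
SERVER "FileWriterServer"
OPTIONS (
    "FORMATTER" 'CSV',
    "DIRECTORY" '/var/tmp',
    "FILENAME_PREFIX" 'MYPREFIX-',
    "FILENAME_SUFFIX" '.csv',
    "CHARACTER_ENCODING" 'UTF-8',
    -- Rotate once an hour:
    "MAX_TIME_DELTA_PER_FILE" '3600000');

-- The pump continually moves rows from the source stream into the
-- file writer; the file is actually written once the pump starts.
CREATE OR REPLACE PUMP "SALES"."writerPump" STOPPED AS
INSERT INTO "SALES"."FileWriterStream"
SELECT STREAM "id", "reported_at" FROM "SALES"."BIDS";

ALTER PUMP "SALES"."writerPump" START;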

Writing Files with the Extensible Common Data Agent

1.Identify a stream in s-Server from which the agent will draw data to write files.
2.Navigate to the clienttools/EcdaAgent directory.

Note: The location of this directory depends on where you installed the client tools suite. In a default s-Server installation, it is above the $SQLSTREAM_HOME directory, and will be something like /opt/sqlstream/5.0.XXXX/clienttools/EcdaAgent.

3.Create a properties file for the agent, such as example.properties, and save it. This file needs to include information such as the directory, formatter, filename date format, filename prefix, and filename suffix, as well as options for the origin stream, such as SCHEMA_NAME, TABLE_NAME, and ROWTYPE. The code below represents a sample properties file that writes to a CSV-formatted file with the prefix "MYPREFIX-" in a directory called /var/tmp, and draws from a stream called BIDS in the schema SALES on the default s-Server defined for this agent.

# Location, date format, prefix, suffix

DIRECTORY=/var/tmp

FORMATTER=CSV

FILENAME_DATE_FORMAT=YYYY_MM_DD_HH_MM_SS

FILENAME_PREFIX=MYPREFIX-

FILENAME_SUFFIX=.TXT

CHARACTER_ENCODING=UTF-8

FORMATTER_INCLUDE_ROWTIME=FALSE

# Rotation settings = 1 hour

MAX_TIME_DELTA_PER_FILE=3600000

# Schema, name, and parameter signature of origin stream

SCHEMA_NAME=SALES

TABLE_NAME=BIDS

ROWTYPE=RECORDTYPE(VARCHAR(2040) id, VARCHAR(2040) reported_at, VARCHAR(2040) shift_no, VARCHAR(2040) trip_no, VARCHAR(2040) route_variant_id)

 

4.Save the properties file in the /clienttools/EcdaAgent directory.
5.Determine the output format you will use. The ECD agent defaults to using the file system, but you can also configure it to write files over a network socket, or to an AMQP feed, a Kafka server, an MQSeries server, or a Hadoop file system. A list of options for the --io property is available at the topic Extensible Common Data Agent Overview.
6.Launch the agent with the --output flag enabled and a --props argument that references the properties file you just created:

./commondataagent.sh --output --io file --props example.properties

 

If you have configured everything correctly, the agent should begin writing data to the file location, rotating files according to the settings for MAX_BYTES_PER_FILE or MAX_TIME_DELTA_PER_FILE.

Formatting CSV

When writing CSV files, the Extensible Common Data Adapter converts streaming rows (tuples) into character-separated output, based on options supplied through the OPTIONS section of the CREATE FOREIGN STREAM statement.

Note: For performance reasons, most formatting should be done in native SQL and passed into ECDA, not pushed into ECDA formatters.

To write a CSV file, you need to give the Extensible Common Data Adapter the following information:

A destination directory for the file.
A formatter of CSV.
A character encoding type for the file.
How often the file will rotate, in terms of bytes or time.

The following options are specific to CSV output:

Option                          Definition
WRITE_HEADER                    Whether to write the column names into a header row.
                                True or false.
CUSTOM_FORMATTER_<column_name>  Allows overriding an individual column's formatting.
                                Specifies the fully qualified name of a Java class that
                                implements com.sqlstream.aspen.namespace.common.TypeFormatter<T>,
                                where T is the Java type that matches the SQL type of the
                                column, such as java.lang.String for VARCHAR/CHAR or
                                java.lang.Double for DOUBLE.
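A minimal sketch combining these CSV options follows; the stream and server names are placeholders, and com.example.MoneyFormatter stands in for a hypothetical user-written class implementing com.sqlstream.aspen.namespace.common.TypeFormatter<java.lang.Double>:

CREATE OR REPLACE FOREIGN STREAM "CsvWriterStream" (
    "id" VARCHAR(2040),
    "amount" DOUBLE)
SERVER "FileWriterServer"
OPTIONS (
    "FORMATTER" 'CSV',
    "DIRECTORY" '/var/tmp',
    "CHARACTER_ENCODING" 'UTF-8',
    -- Write column names as the first row:
    "WRITE_HEADER" 'true',
    -- Hypothetical custom formatter for the "amount" column:
    "CUSTOM_FORMATTER_amount" 'com.example.MoneyFormatter',
    "MAX_TIME_DELTA_PER_FILE" '60000');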