FileWriter Adapter

<< Click to Display Table of Contents >>

Navigation:  Integrating Blaze with Other Systems > Appendix A: Legacy Adapters and Agents >

FileWriter Adapter

Previous pageReturn to chapter overviewNext page

Note: This topic describes a legacy adapter. See the topics Reading from Other Sources and Writing to Other Destinations for current solutions.

The FileWriter Adapter converts input rows into character-separated output based on columnNames and options supplied through the options sections of the CREATE FOREIGN STREAM statement. It operates as a "sink" adapter, converting streaming tuples into an XML or character-separated file. Among its useful applications are the following:

Convert persistent streaming results to an XML, comma-delimited (CSV), pipe-delimited, or tab-delimited file.
Integrate streaming data with scripted applications (like PHP or Ruby) that are good at file handling and XML parsing but less good at interacting with SQLstream's JDBC driver.

FileWriter Features

The FileWriter Adapter can do the following:

Rotate output files based on configurable options (rotation mode, filename pattern, etc.).
Create output formats with fields delimited by commas, pipes, or tabs.
Create arbitrary XML output formats based on a Foreign Stream declaration.
Encapsulate non-XML-conformant data values within CDATA sections
Gracefully handle error conditions, such as read-only volume, permissions errors, or volume out of space, with trace log entries.

Filewriter Options

The FileWriter Adapter implements a standard SQL/MED "server" (called a "foreign server"). To establish the controls and parameters (as listed below) for the output you want, you specify options in the Foreign Stream declarations of the FileWriter server.

The order of the columns in the RowType of incoming tuples specified by the Foreign Stream also specifies the order and names of the tags (element or attribute) used in the output XML. If ROWTIME_COLUMN from Tab(sqlmdopts) is set to true, then the first column output will be ROWTIME.

The following list describes the FileWriter Adapter controls and parameters:
 

The path and filename of the output XML or character-delimited file. Some portion of the path may be inherited from the server declaration.
oThe filename supports a base name, e.g., "output", and a pattern that includes the ROWTIME, e.g., "output_[ROWTIME]" for log rotation.
SimpleDateFormat strings, inheritable from the server declaration, to use in converting ROWTIME to text
ofor filenames.
ofor tuple values.
Rotation modes, such as
oFile-size-based, e.g., rotate whenever current file more than 200K in size,
oSystem-time-based, e.g., rotate every hour or day, or
oROWTIME-based, e.g., rotate whenever the ROWTIME changes
Output type and data mode: either XML or character-delimited:
oIf output type is XML, then data mode is either "element" or "attribute".
oOR
oIf output type is Character-delimited,

then data mode is any one of "csv", "comma", "tab", or "pipe".

The name of the enclosing tag for each tuple's elements. This name may be inherited from the server declaration.
Whether or not the ROWTIME should be included in the output.

SQL/MED options

Option (case insensitive)

Description

Possible values

DATA_MODE

XML files only: Specifies whether to place column data in element attributes or as element data.

Default: element

attribute

(case insensitive)

DOCUMENT_TAG

XML files only: XML tag used for parent of row elements.

Default: Filewriter

[any string] - see example code below

ENTRY_TAG

XML files only: XML tag used for each row element.

Entry

[any string] – see example code below

FILENAME_ DATEFORMAT

Java time format string for date conversion used during file rotation.

yyyy-MM-dd_HH:mm:ss

Uses java SimpleDateFormat

FILENAME_BASE

Target name for output and basename for file rotation
Filewriter appends an underscore to the specified name.

No default

FILENAME_ROTATED

Pattern specifying how to rename file at rotation time.

[FNB] = filename_base described above,

[RT] = filename_dateformat described above.

[FNB]_[RT]

 
Example:
my-[FNB]_rotated-at_[RT]

FILEPATH

Directory where resultant files are located

No default

HEADER_ROW

Non-XML files only:
Include a header line in a character-separated value file.

true

 
Default: false

MAX_IDLE_TIME

Optional timeout in milliseconds to force file rotation:
 
If no data has been written during the specified time, then a flush to the existing or current output file is forced and a new file begins, using the specifications provided in the filename options.

[Integer value]

 

(If no time unit is specified, milliseconds is used.)

MAX_TIME_BETWEEN_FLUSH

Specifies number of seconds after which a flush is done (formerly after every row)

Default:  1-second

(If no time unit is specified, milliseconds is used.)

OUTPUT_ENCODING

Character set to use for file output, uses Canonical Name for java.nio API

Default: UTF-8

See Oracle's encoding documentation.

OUTPUT_TYPE

Type of file to generate

Default: xml

csv

tab

pipe

ROTATION_FILE_ROWS

Specifies maximum number of rows per file, after which a new rotation file is created

No default; any value can be used.

ROTATION_FILE_SIZE

Approximate file size to trigger ROTATION for FILE_SIZE rotation option (above);

 

Actual file sizes may be larger by the amount that can be written in the time specified by MAX_TIME_BETWEEN_FLUSH.

Default: 10m

k - kilobyte

m - megabyte

g - gigbyte

t - terabyte

no unit indicates bytes

Case insensitive

ROTATION_ROW_TIME

 

 

Time unit to trigger file rotation for rowtime rotation option, using  ROWTIME changes, not wall-clock/system time.

 

Also, if a rowtime bound is received that indicates that any next row would cause rotation to happen, then rotation is performed without actually waiting to receive such a row.

 

(Mutually exclusive with ROTATION_SYS_TIME;  you can use one or the other, but not both.)
 
 

 

Time units can be as shown in either Note A or B.

 

See Note C for examples of usage.

 

 

 

 

 

ROTATION_SYS_TIME

 

 

Time unit to trigger file rotation for system rotation (SYS_TIME) option, using wall-clock time, not ROWTIME changes.

 

(Mutually exclusive with ROTATION_ROW_TIME;  you can use one or the other, but not both.)

 

Time units can be as shown in either Note A or B.

 

See Note C for examples of usage.

 

 

ROWTIME_COLUMN

True/false flag indicates if ROWTIME should be included in row element

Default:  false

true

ROWTIME_FORMAT

Java date format string for ROWTIME element.

HH:mm:ss.SSS

(See Oracle's Simple Date Format description.)

Note A:  1h - hours, s - seconds, m - minutes, h - hours, d - days, w - weeks

Specifying no time unit indicates seconds s

Note B: MINUTE, HOUR, DAY, WEEK, MONTH

Note C:  

(1)  When a rotation time t less than one day is specified,  then every day at midnight, midnight + t, midnight + 2t, etc.,  there will be a rotation as long as at least one row was output during that period.

(2)  When the rotation time t specified is an integral multiple N of 1 day, rotation will always happen on midnight ever Nth day starting at Jan 1, 1900.
For example, Jan 1, 1900 happens to be a Monday - so if a rotation time of t=7d is specified, rotation will always happen Sunday/Monday, 12:00 in the evening/morning.

(3)  When the rotation time t specified is greater than 1d, but not an integral multiple of 1d, rotation will happen on regular intervals of size t starting at Jan 1,1900, with no adjustments for daylight savings etc.

Set-up Actions and Examples

The sequence of actions for setting up and using this plugin are described in the sections that follow:

Registering the plugin
Defining the server

Examples:

XML-related code and results
Character-delimited code and results

Registering the plugin (Foreign Data Wrapper)

The FileWriter Adapter is registered by default during the SQLstream installation process. It is present under the Plugins folder as FileWriter.jar. Use CREATE OR REPLACE FOREIGN DATA WRAPPER to install the FileWriter adapter, as in the following code.

CREATE OR REPLACE FOREIGN DATA WRAPPER "FileWriter"

   LIBRARY 'class com.sqlstream.plugin.filewriter.FileWriterControlPlugin'

   LANGUAGE java

   DESCRIPTION 'plugin to write rows to flat file';

Defining the Server

Define one Server for each set of files that use the same options, using the CREATE OR REPLACE SERVER command.

While each distinct file that is being written will have its own distinct foreign stream, distinct foreign servers are needed only when the server options must be different. In this example, the Filewriter is writing data out to a csv file, so the output_type is 'csv'.

CREATE OR REPLACE SERVER "FileWriterServer"

   FOREIGN DATA WRAPPER "FileWriter"

   OPTIONS (

       filepath '/usr/local/share/sqlstream/datalinks/output',

       output_type 'csv',

       output_encoding 'UTF-8')

   DESCRIPTION 'Server for CSV file writer';

 

The sections below illustrate code and results for XML and character-delimited examples.

In the examples that follow, these overrides apply:

The output_type 'xml' overrides the output_type defined above in the FileWriterServer.
The filepath '/var/log' overrides the filepath defined above in the FileWriterServer.

XML-related code and results

The sections below illustrate code and results for XML and character-delimited examples. Note that the output_type 'xml' below overrides the output_type defined in the FileWriterServer and the filepath '/var/log', overrides the filepath defined in the FileWriterServer.

XML-related code:

  CREATE FOREIGN STREAM "HourlyPageCounts" (

       "ip" VARCHAR(15),

       "cnt" INTEGER,

       "pageType" SMALLINT

       )

   SERVER "FileWriterServer"

   OPTIONS (

       filepath '/var/log',

       filename_base 'pageTypeHourlyCounts',

       filename_rotated 'pageTypeHourlyCounts_[RT]',

       filename_dateformat 'yyyy-MM-dd_HH:mm:ss',

       rotation 'rowtime',

       max_idle_time '20000',

       entry_tag 'HourlyCount',

       document_tag 'PageCounts',

       rowtime_column 'true',

       rowtime_format 'HH:mm:ss.SSS',

       data_mode 'element',

       output_type 'xml'

       )

   DESCRIPTION 'Hourly Page Counts';

XML results

With data_mode element, as above, the code above would produce a base file, /var/log/pageTypeHourlyCounts, and rotated files like /var/log/pageTypeHourlyCounts_2009-05-11_13:00:00. The rotated file would change whenever the input ROWTIME changed or when 20 seconds passed without a new input row. (Presumably the input rows would be pumped from a streaming GROUP BY producing periodic aggregates every hour.) The output_type specifies the form of the delivered output. Each input tuple would be formatted on output to produce the following XML fragment:

   <HourlyCount>

     <ROWTIME>13:00:00.000</ROWTIME>

     <ip>10.12.13.14</ip>

     <cnt>45</cnt>

     <pageType>2</pageType>

   </HourlyCount>

 

Tuple entries are enclosed within an optional document_tag, which for the element data_mode results in an XML doc like the following:

   <PageCounts>

     <HourlyCount>

       <ROWTIME>13:00:00.000</ROWTIME>

       <ip>10.12.13.14</ip>

       <cnt>45</cnt>

       <pageType>2</pageType>

     </HourlyCount>

     <HourlyCount>

       <ROWTIME>13:00:00.000</ROWTIME>

       <ip>10.12.13.14</ip>

       <cnt>91</cnt>

       <pageType>3</pageType>

     </HourlyCount>

     ...

     <HourlyCount>

       <ROWTIME>13:00:00.000</ROWTIME>

       <ip>10.12.13.16</ip>

       <cnt>23</cnt>

       <pageType>2</pageType>

     </HourlyCount>

   </PageCounts>

 

If data_mode attribute were specified instead, each input tuple would create output formatted like the following fragment:

   <HourlyCount ROWTIME="13:00:00.000" ip="10.12.13.14" cnt="45" pageType="2" />

 

For the attribute data_mode, the resulting XML doc looks like the following:

   <PageCounts>

     <HourlyCount ROWTIME="13:00:00.000" ip="10.12.13.14" cnt="45" pageType="2" />

     <HourlyCount ROWTIME="13:00:00.000" ip="10.12.13.14" cnt="91" pageType="3" />

     ...

     <HourlyCount ROWTIME="13:00:00.000" ip="10.12.13.16" cnt="23" pageType="2" />

   </PageCounts>

Character-delimited code and results

Character-delimited code:

  CREATE FOREIGN STREAM "HourlyPageCounts" (

       "ip" VARCHAR(15),

       "cnt" INTEGER,

       "pageType" SMALLINT

       )

   SERVER "FileWriterServer"

   OPTIONS (

       filepath '/var/log',

       filename_base 'pageTypeHourlyCounts',

       filename_rotated 'pageTypeHourlyCounts_[RT]',

       filename_dateformat 'yyyy-MM-dd_HH:mm:ss',

       rotation 'rowtime',

       max_idle_time '20000',

       entry_tag 'HourlyCount',

       document_tag 'PageCounts',

       rowtime_column 'true',

       rowtime_format 'HH:mm:ss.SSS',

       output_type 'csv'               -- changed from XML example code

       )

   DESCRIPTION 'Hourly Page Counts';

 

Regarding the above comment "-- changed from XML example code," see the output_type specified in the first example under XML-related code and results.

With output_type csv, as above, the code would produce a base file, /var/log/pageTypeHourlyCounts, and rotated files like /var/log/pageTypeHourlyCounts_2009-05-11_13:00:00. The rotated file would change whenever the input ROWTIME changed or when 20 seconds passed without a new input row. (Presumably the input rows would be pumped from a streaming GROUP BY producing periodic aggregates every hour.) The output_type specifies the form of the delivered output. Each input tuple would be formatted on output to produce the following comma-delimited fragment:

   13:00:00.000,10.12.13.14,45,2

 

If output_type pipe were specified instead of csv, each input tuple would create output formatted like the following fragment:

   13:00:00.000|10.12.13.14|45|2