Reading from the File System


You can read files from any accessible file system location using the Extensible Common Data Adapter (ECDA) or ECD Agent. Because the adapter and agent use inotify to detect changes to a file, both work only in conventional Linux directories. See the note at the end of this topic.

All adapter or agent implementations involve configuring options. For adapters, you configure and launch the adapter in SQL, using either server or foreign stream/table options. For agents, you configure such options using a properties file and launch the agent at the command line. Many of the options for the ECD adapter and agent are common to all I/O systems. The CREATE FOREIGN STREAM topic in the Streaming SQL Reference Guide has a complete list of options for the ECD adapter.

To read from the file system, you need to create a server object which references the data wrapper ECDA and is of type 'FILE'.

CREATE OR REPLACE SERVER "FileReaderServer" TYPE 'FILE'

FOREIGN DATA WRAPPER ECDA;

 

Note: ECD Adapter server definitions need to reference the ECD foreign data wrapper. You can do so with the syntax FOREIGN DATA WRAPPER ECDA.

Unlike server objects, all foreign streams need to be created in a schema. The following code first creates a schema in which to run the rest of the sample code, then creates a foreign stream named "FileReaderStream".

CREATE OR REPLACE SCHEMA "FileSource";

SET SCHEMA '"FileSource"';

 

CREATE OR REPLACE FOREIGN STREAM "FileReaderStream"

("recNo" INTEGER,

"ts" TIMESTAMP,

"accountNumber" INTEGER,

"loginSuccessful" BOOLEAN,

"sourceIP" VARCHAR(32),

"destIP" VARCHAR(32),

"customerId" INTEGER)

--Columns for the new stream

SERVER "FileReaderServer"

OPTIONS

(directory 'myDirectory',

--directory for the file

parser 'CSV',

filename_pattern 'myRecord\.csv',

--regex for filename pattern to look for

character_encoding 'UTF-8',

skip_header 'true');
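
Once the foreign stream exists, you can check that rows are arriving by querying it. The following is a minimal sketch, assuming the schema and stream names defined above and that a file matching the filename pattern is present in the configured directory. The WHERE clause in the second query is only an illustration and is not part of the original sample.

```sql
-- Continuously returns rows as the adapter parses them from the file.
SELECT STREAM * FROM "FileSource"."FileReaderStream";

-- Illustrative filter (an assumption for this sketch): failed logins only.
SELECT STREAM "ts", "accountNumber", "sourceIP"
FROM "FileSource"."FileReaderStream"
WHERE "loginSuccessful" = false;
```

Because a stream is unbounded, these queries keep running until you cancel them.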

 

Foreign Stream Options for the File System

Option: Description

DIRECTORY: Directory in which file resides or to which you are writing.

FILENAME_PATTERN: Input only. Java regular expression defining which files to read. See https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html for more information on Java regular expressions.

STATIC_FILES: Defaults to false. When you set this to true, you indicate that no files will be added or changed after the file reader is started. File reader will exit after the last file is read. This lets you use the file reader as a foreign table, which is finite (as opposed to a foreign stream, which is infinite, and handles files that are continually written to).
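
As an illustration of STATIC_FILES, the sketch below defines a foreign table over the same server and then queries it as a finite relation. It assumes that the schema "FileSource" and the server "FileReaderServer" from the earlier sample exist, that the option is written static_files 'true' in the same style as skip_header 'true', and that the directory contents do not change while the query runs. The table name "FileReaderTable" is hypothetical. Check the option list in the CREATE FOREIGN STREAM topic before relying on this form.

```sql
-- Hypothetical table name; columns mirror the foreign stream sample.
CREATE OR REPLACE FOREIGN TABLE "FileSource"."FileReaderTable"
("recNo" INTEGER,
"ts" TIMESTAMP,
"accountNumber" INTEGER,
"loginSuccessful" BOOLEAN,
"sourceIP" VARCHAR(32),
"destIP" VARCHAR(32),
"customerId" INTEGER)
SERVER "FileReaderServer"
OPTIONS
(directory 'myDirectory',
parser 'CSV',
--Java regex; the backslash escapes the dot so it matches a literal "."
filename_pattern 'myRecord\.csv',
character_encoding 'UTF-8',
skip_header 'true',
--assumed spelling of the STATIC_FILES option: no files change after start
static_files 'true');

-- A finite query: it returns once the last matching file has been read.
SELECT * FROM "FileSource"."FileReaderTable";
```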

Using the ECD Agent to Read from the File System

You can read data from remote locations using the Extensible Common Data Agent. See Reading Files in Remote Locations for more details.

For the file system, the ECD agent takes options similar to those defined for the adapter above, but you specify them in a properties file rather than in SQL. A properties file configured to read files might look like the following:

DIRECTORY=/var/tmp

PARSER=CSV

FILENAME_PATTERN=myRecord\.csv

SKIP_HEADER=TRUE

CHARACTER_ENCODING=UTF-8

SCHEMA_NAME="FileSource"

TABLE_NAME="FileReaderStream"

ROWTYPE=RECORDTYPE(VARCHAR(2040) id, VARCHAR(2040) reported_at, VARCHAR(2040) shift_no, VARCHAR(2040) trip_no, VARCHAR(2040) route_variant_id)

 

Input Format

The code samples above use CSV as the parser. To use other parsers, see the Parser Types for Reading topic in this guide.

Note: The ECD file reader currently depends on inotify for efficiency. However, inotify is not supported on certain file systems, most notably NFS.

If you use the ECD file reader to tail a file on an NFS file system, it will read up to the current high watermark in the file, but will NOT see any new data that is subsequently appended to the file.

To work around this, you will need to transfer files to a directory in the Linux file system on the same server as the ECD agent or adapter using rsync --append (this implies --partial and --inplace).