Connecting to a Data Source

<< Click to Display Table of Contents >>

Navigation:  Building Streaming Applications > Tutorial 2: Introduction to Coding with Streaming SQL > Tutorial 2B: Coding Streaming SQL in SQLstream sqlLine >

Connecting to a Data Source

Previous pageReturn to chapter overviewNext page

The first step in nearly all s-Server data analysis is to get data flowing into s-Server. As in previous tutorials, we're going to use a simulated log file located in the /tmp directory of the Linux machine running s-Server. This is the file written to by the Buses demo.

Start the Buses Demo

Open the SQLstream desktop folder and double-click the Stream Bus Data icon.

sbur_tutorial1_2

Once you click the application's icon, an information window opens.

sbur_information_bus_log

Keep this window open for the remainder of the tutorial. As long as this window is open, sample data from buses in the Sydney area flows into a file located at /tmp/buses.log at 50 rows per second. This simulates a log file being continually updated.

This file features data in the following categories:

Logical column in file

Category

Definition

1

id

Identification number for the bus.

2

reported_at

Time location was reported.

3

shift_no

Shift number for the bus's driver.

4

trip_no

Trip number for the bus.

5

route_variant_id

ID number for bus route.

6

waypoint_id

ID number for bus waypoint.

7

last_known_location_state

Location state.

8

lat

Latitude of location.

9

lon

Longitude of location.

10

speed

Reported speed of bus.

11

bearing

Navigational bearing for bus.

12

driver_no

Driver identification for number.

13

prescribed

The direction on the motorway, i.e. into Sydney or out of Sydney.

14

highway

Highway number, if available.

15

created_at

Timestamp for when this bus started reporting locations.

16

updated_at

Timestamp for when this bus last reported a location.

In the following exercise, you'll use SQL to set up a data source and analyze it.

Opening sqlLine

First, you'll need to open sqlLine.

To start sqlLine:

1.Open the desktop folder that installs with s-Server.

2.Double-click the sqlLine icon.

A terminal window opens. You can enter SQL at the command prompt.

sbur_sqlline_initial_launch

There are two main ways to run SQL in sqlLine:

1.By creating a SQL file and using the !run command. This command lets you access SQL scripts on your computer.

2.By directly typing SQL into the sqlLine console.

In the following tutorial, you will type SQL directly into the sqlLine console.

Create a SERVER Object

In the following steps, we're going to create a SERVER object. A server object is stored in the s-Server repository, meaning that once you create it, it will be available for reuse. Server objects contain all the information necessary to connect to a data source, which could be an external database,  a Kakfa or AMQP topic, a log file, a network feed, or any source for column-row pairs.

All SERVER objects use a data wrapper. Data wrappers provide access from within SQLstream s-Server to an external system. In this case, we'll use a built-in data wrapper called ECDA. This data wrapper uses SQLstream's Extensible Common Data Adapter. For more information on this adapter, see the topic Connecting to Other Sources in the Integration Guide.

In the OPTIONS clause for the server, you define specific information on how s-Server interacts with the data source. Options appear in the table below.

Option

Description

FORMAT_TYPE

File format to be processed by adapter. Possible values are CSV, XML, JSON, Custom

CHARACTER_ENCODING

Default: UTF-8

Any Java Supported Encoding. Use encoding implemented by java.nio, see:

http://download.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html?toc=0

SEPARATOR

Defines the characters used to delimit the columns in your data. Any character in the list will be used as a delimiter, including a tab. For example, after specifying SEPARATOR ',;:', every comma, semicolon, and colon in a log record starts a new column.

DIRECTORY

Directory in which file resides or to which you are writing.

FILENAME_PATTERN

Regular expression defining which files to read.

In this case, the SERVER object options contain all the information necessary to access a log file. Some of the important options include the following:

format_type 'CSV' tells s-Server to parse the log file as a comma-separated values file.

separator ',' tells s-Server to look for commas when separating the file into columns.

directory '/tmp' tells s-Server to look for the file in a directory called "/tmp" on the current machine.

filename_pattern 'buses\.log' tells s-Server to look for a file that matches the regular expression "buses\.log'". Here you  want "buses.log" exactly. In Java regular expressions, a period has a special meaning:"all characters." As a result, you need to preface the period with a backslash, which tells s-Server that you mean a period, and not all characters.

To create a SERVER object, enter the following code into the SQLline prompt:

 

CREATE OR REPLACE SERVER "BusFileReaderServer"

FOREIGN DATA WRAPPER ECDAWRAPPER

OPTIONS (classname 'com.sqlstream.aspen.namespace.common.FileSetColumnSet',

       parser 'CSV',

       character_encoding 'UTF-8',

       separator ',',

       skip_header 'false',

       directory '/tmp',

       filename_pattern 'buses\.log'

);

 

 

Press Enter.

You should receive a message along the following lines:

No rows affected (0.553 seconds)

 

The SERVER object is now available for use in s-Server.

sbur_sqlline_create_server

Next: Creating a Foreign Stream to Connect with a Log File