Adding a Streaming Data Source


The Streaming Data Source option lets you select a source that you know is streaming. StreamLab will automatically determine the format for the source.

To add a Streaming Data source:

1. On the Sources page, drag a Streaming Data source from the left column into the center area.
2. Click the new Streaming Data source.
3. Select File System, Socket, AMQP, Kafka, or Kinesis.
4. Enter connection information for the input source. For example, to access a File System source, you need to enter the directory and a filename pattern for the file.

By default, StreamLab uses the project schema for the new source. If you wish to use a different schema, click the dropdown menu to the right of Schema.Stream. You can also choose a different name for the stream by clicking the dropdown menu that reads "data_1".

5. Click the Discover Format button. This feature examines the input to determine its format. Currently, the Discovery parser can identify CSV, XML, and JSON. StreamLab can also work with ProtoBuf files, but you need to add these as their own source.

The Discover Format dialog box opens. You can set how many bytes the Discover Format feature reads (Sample Bytes) and a timeout for the feature. See Troubleshooting Discovery below. In most cases, the defaults should suffice.


6. Click Start. The Discover Format feature runs. The left section of the dialog box should display a format: CSV, JSON, or XML.
7. Click Accept.
8. Select the indicated format under Format. You can also choose the Line format, which lets you access files line by line.
9. Next, fill in the list of columns and their SQL types. You can use the Clipboard to copy column names and types from another form.
10. Test the source by clicking the Sample 5 Rows from Source button.
11. Click the Go Up arrow to exit the Edit Source page.
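
This page does not describe how the Discovery parser decides on a format. To give a rough feel for what format detection over a byte sample involves, here is a minimal Python sketch. It is not StreamLab's implementation: the function name guess_format and the heuristics it uses are assumptions made purely for illustration, and it does not try to cope with every truncated or mixed input.

```python
import csv
import json
import xml.etree.ElementTree as ET


def guess_format(sample: bytes) -> str:
    """Return 'JSON', 'XML', 'CSV', or 'UNKNOWN' for a sample of input bytes.

    Hypothetical helper for illustration only; not part of StreamLab.
    """
    text = sample.decode("utf-8", errors="replace").strip()
    if not text:
        return "UNKNOWN"

    # JSON: try the first line (one object per line is common in streams),
    # then the whole sample (pretty-printed JSON spans several lines).
    if text.startswith(("{", "[")):
        for candidate in (text.splitlines()[0], text):
            try:
                json.loads(candidate)
                return "JSON"
            except json.JSONDecodeError:
                continue

    # XML: feed the sample to an incremental parser so that a sample cut off
    # mid-document still counts, as long as at least one element has started.
    if text.startswith("<"):
        parser = ET.XMLPullParser(events=("start",))
        try:
            parser.feed(text)
            if any(True for _ in parser.read_events()):
                return "XML"
        except ET.ParseError:
            pass

    # CSV: every non-empty line must split into the same number of fields,
    # and there must be more than one field per line.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    field_counts = {len(row) for row in csv.reader(lines)}
    if len(field_counts) == 1 and field_counts.pop() > 1:
        return "CSV"

    return "UNKNOWN"
```

For example, guess_format(b"id,name\n1,a\n2,b\n") classifies the sample as CSV, while a single line of free text falls through to UNKNOWN. In StreamLab, the Line format is the option for input of that kind.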


Troubleshooting Discovery

The Sample Bytes field determines how many bytes Discovery reads before analyzing the input. This number can greatly affect Discovery's performance. If you set it too high and your data is arriving slowly, you won't see any response from Discovery until it has read that many bytes. If you set it lower than the size of a single record in your input, Discovery will have difficulty determining the format. A good rule of thumb is to set Sample Bytes to at least 5X the size of a record in your input, so that Discovery sees multiple records and can make a better guess as to the data types of the columns it finds. For example, if each record is 80 bytes, the rule gives a minimum of 400 bytes, and a setting of 4096 covers about 50 records.
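
The rule of thumb above is simple arithmetic, and it can be handy to compute it from a few records captured from your source. The following Python sketch does that. The function names suggest_sample_bytes and records_covered are hypothetical helpers, not StreamLab features, and the multiplier of 5 is simply the rule-of-thumb value from this page.

```python
from typing import Sequence


def suggest_sample_bytes(sample_records: Sequence[bytes], multiplier: int = 5) -> int:
    """Suggest a Sample Bytes value: multiplier times the largest record seen.

    Hypothetical helper; uses the largest record so that the sample still
    holds several records when record sizes vary.
    """
    if not sample_records:
        raise ValueError("need at least one sample record")
    largest = max(len(record) for record in sample_records)
    return largest * multiplier


def records_covered(sample_bytes: int, record_size: int) -> int:
    """Return how many complete records of record_size fit in sample_bytes."""
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    return sample_bytes // record_size


# Two captured records of 80 and 72 bytes: the larger one drives the result.
print(suggest_sample_bytes([b"x" * 80, b"y" * 72]))
# Number of complete 80-byte records that fit in a 4096-byte sample.
print(records_covered(4096, 80))
```

If your source is slow, weigh the suggested value against the timeout in the Discover Format dialog box, since Discovery waits for the sample before it responds.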