Reading Avro

<< Click to Display Table of Contents >>

Navigation:  Integrating SQLstream Blaze with Other Systems > Reading Data into s-Server > Reading from Other Sources  > Parser Types for Reading >

Reading Avro

Previous pageReturn to chapter overviewNext page

5_2_indicator Version 5.2 Feature

s-Server parses Apache Avro data similarly to the way it parses JSON data. (Avro schemas are defined with JSON.)

All columns are parsed out of hierarchical Avro records using a specified "PATH" for corresponding Avro fields in the Avro schema. This is the schema stored in all Avro payloads, which is used to parse each message. (Some sources, such as Kafka or AMQP, may break the Avro payload up into one or more serialized Avro records. s-Server handles this by allowing you to indicate that Avro schema may be separate from a given batch of Avro data.)

s-Server parses Avro messages into rows in a stream that you define. See the subtopic How s-Server Uses Paths to Parse JSON in Reading JSON for more details.

s-Server matches data types according to its Ingestion Rules for Avro.

Note: You can also input data in larger chunks and parse it later using the Parser UDX. This UDX calls the parsers listed above in a function. For more information on using functions, see the topic Transforming Data in s-Server in this guide.

Avro processing works like this:

1.s-Server receives a message from a data source such as the file system, kafka, or AMQP. Each message may contain one or more Avro records.

2.As with all data sources, s-Server reads the Avro records into a foreign stream. Each Avro record may get parsed into multiple rows of the stream.

3.s-Server parses these Avro records using an Avro path that you define for each column of the foreign stream. An Avro path describes the location of a field in an Avro record. Avro Paths are analogous to XPath expressions for XML files.

To implement the JSON parser, you create a foreign stream for a source, as described in the topics under Reading from Other Sources. Under stream or column options, you need to set the following stream options:

PARSER  This needs to be AVRO.

AVRO_SCHEMA_FILE This option can either be an HTTP URL to fetch the schema or it can be a path to a file on server host machine or VM.

SCHEMA_HEADER is a required option to indicate if the Avro schema is embedded in the Avro data. This option needs to be set to false for data sources like Kafka or AMQP, where each message can be one or more serialized Avro records without a schema.

ROW_PATH This is the Avro path for the row to be found. The Avro parser uses a row path to find Avro objects containing a row, then the path for each column is used to find the value to be extracted.

<COLUMN_NAME>_PATH Path for each column in the ECDA column set. This defaults to $..<COLUMN_NAME>. Here, a column named 'FOO' would have an option named FOO_PATH that defaulted to $..FOO which would return the first property named FOO under the Avro object found by the ROW_PATH.

5_2_indicator In s-Server 5.2, messages (typically in message queues such as Kafka) may be tagged with AVRO schema fingerprints. Those fingerprints will be matched with that for the AVRO_SCHEMA_FILE. Messages will be skipped if fingerprints don't match that of the AVRO_SCHEMA_FILE.

Example

CREATE OR REPLACE SERVER "AVROREADERSERVER" TYPE 'FILE'

FOREIGN DATA WRAPPER ECDA;

 

CREATE SCHEMA "AVROSCHEMA";

SET SCHEMA '"AVROSCHEMA"';

 

CREATE FOREIGN STREAM "AVROSTREAM" (

 slotname   varchar(32),

 slotname2  varchar(32)

)

SERVER "AVROREADERSERVER"

OPTIONS (

 PARSER 'AVRO',

 AVRO_SCHEMA_FILE '/schemas/2.1/fs.avsc',

 SCHEMA_HEADER 'false',

 ROW_PATH '$',

 SLOTNAME_PATH '$.slots[0:].slotName',

 SLOTNAME2_PATH '$.slots2[0:].slotName'

);

 

 

Ingestion Rules for Avro

Data type mappings between Avro types and SQL types.

Note: Arrays can be flattened. An array is flattened if any column path refers to field(s) of array elements using the <arrayname>[0:] prefix in the path. Only one array field with a ROW_PATH prefix can be flattened in the schema.

All other data types can be parsed as stringified VARCHAR values.

Avro source value

s-Server Data Type

All numeric types

All numeric SQL types

string/utf8

CHAR/VARCHAR

bytes/fixed

BINARY/VARBINARY or DECIMAL

Boolean

boolean

Long

TIMESTAMP (millisFromEpoch)

Int

TIME (millisFromMidnight)

Int

DATE (daysFromEpoch)