Writing to Hadoop/HDFS
The Hadoop ECDA adapter automatically uploads files to a designated Hadoop file system location.
When you write rows to a foreign stream that uses this adapter, the adapter formats the rows into a file and uploads it to the location designated by the HDFS_OUTPUT_DIR property.
This file can be formatted as CSV, XML, JSON, or BSON. See Writing to Files with the ECD Adapter and Agent for more details. These files do not require further formatting in order to be integrated with Hadoop. (Once in the Hadoop file system, these files will be accessed by second-level systems such as Hive. See https://cwiki.apache.org/confluence/display/Hive/Home for more details on Hive.)
Configuring the ECD Adapter for HDFS
To write data, you first define a server object with connection information, including the directory and information on file rotation. Once you define this server object, you can write to the file system by referencing it. See the topic CREATE SERVER in the s-Server Streaming SQL Reference Guide for more details.
For adapters, you configure and launch the adapter in SQL, using either server or foreign stream/table options. For agents, you configure such options using a properties file and launch the agent at the command line. Many of the options for the ECD adapter and agent are common to all I/O systems. The CREATE FOREIGN STREAM topic in the Streaming SQL Reference Guide has a complete list of options for the ECD adapter.
Note: Because of the nature of streaming data, you need to set up a pump in order to move rows continually from an s-Server stream to another stream, file, Kafka topic, RDBMS table, or other location. Pumps are INSERT macros that continually pass data from one point in a streaming pipeline to another. A model for setting up a pump is provided below. See the topic CREATE PUMP in the s-Server Streaming SQL Reference Guide for more details.
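As a sketch, a pump feeding the HDFS sink stream defined in this topic might look like the following. The source stream name "hadoop_test"."bus_source" is a hypothetical placeholder; substitute the stream your pipeline actually produces.

```sql
-- Hypothetical pump: "bus_source" is an assumed source stream name;
-- "hadoop_test"."bus_hdfs_sink" is the HDFS sink stream defined below.
CREATE OR REPLACE PUMP "hadoop_test"."hdfs_pump" STOPPED AS
INSERT INTO "hadoop_test"."bus_hdfs_sink"
SELECT STREAM * FROM "hadoop_test"."bus_source";

-- Pumps are created stopped; start them once the whole pipeline is defined.
ALTER PUMP "hadoop_test"."hdfs_pump" START;
```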
To write data to HDFS, you first create a server object that references the data wrapper ECDA and has type 'hdfs':
CREATE OR REPLACE SERVER "ECDAWriterServer_1" TYPE 'hdfs'
FOREIGN DATA WRAPPER ECDA;
Note: ECD Adapter server definitions need to reference the ECD foreign data wrapper. You can do so with the syntax FOREIGN DATA WRAPPER ECDA.
Unlike server objects, all foreign streams need to be created in a schema. The following code first creates a schema in which to run the rest of the sample code below, then creates a foreign stream named "hadoop_test"."bus_hdfs_sink".
CREATE SCHEMA "hadoop_test";

CREATE OR REPLACE FOREIGN STREAM "hadoop_test"."bus_hdfs_sink" (
    "reported_at" TIMESTAMP NOT NULL  -- remaining columns elided in this excerpt
)
SERVER "ECDAWriterServer_1";
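Putting the pieces together, a fuller sink definition might look like the sketch below. The extra columns and the FORMATTER value are illustrative assumptions; HDFS_OUTPUT_DIR and the sample URL come from this topic, and the authoritative option list is in the CREATE FOREIGN STREAM topic of the Streaming SQL Reference Guide.

```sql
CREATE OR REPLACE FOREIGN STREAM "hadoop_test"."bus_hdfs_sink" (
    "reported_at" TIMESTAMP NOT NULL,  -- columns below are illustrative
    "bus_id" INTEGER,
    "speed" DOUBLE
)
SERVER "ECDAWriterServer_1"
OPTIONS (
    "FORMATTER" 'CSV',  -- files can be formatted as CSV, XML, JSON, or BSON
    "HDFS_OUTPUT_DIR" 'hdfs://storm-s3.disruptivetech.com:8020/user/sqlstream/'
);
```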
Options for HDFS
You configure adapter options through foreign streams/tables. You configure agent options through the ECD agent property file.
- User name for HDFS.
- Address for the name node of HDFS, such as hdfs://storm-s3.disruptivetech.com:8020/user/sqlstream/
Sample Property File for ECD Agent
Note: Before using the ECD agent, you need to create a source stream for it. In the example below, you would need to create the foreign stream "HADOOPWRITERSTREAM" first.
ROWTYPE=RecordType(INTEGER COL1, INTEGER COL2, VARCHAR(256) COL3, VARCHAR(256) COL4)
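A fuller property file might look like the sketch below. The ROWTYPE line comes from this topic; the remaining keys and their values (schema/stream names and formatting options) are assumptions to be verified against the ECD agent documentation before use.

```properties
# Row type of the foreign stream the agent writes to (from this topic)
ROWTYPE=RecordType(INTEGER COL1, INTEGER COL2, VARCHAR(256) COL3, VARCHAR(256) COL4)
# Assumed keys -- verify names and values against the ECD agent documentation:
SCHEMA_NAME=HADOOPWRITER
TABLE_NAME=HADOOPWRITERSTREAM
FORMATTER=CSV
HDFS_OUTPUT_DIR=hdfs://storm-s3.disruptivetech.com:8020/user/sqlstream/
```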
To invoke the agent, navigate to the directory $SQLSTREAM_HOME/../clienttools/EcdaAgent/ and enter the following at the command line:
$ ./commondataagent.sh --output --props sample.test.properties --io hdfs