XML Parse UDX

<< Click to Display Table of Contents >>

Navigation:  Integrating Blaze with Other Systems > Transforming Data in s-Server >

XML Parse UDX

Previous pageReturn to chapter overviewNext page

The XMLParser UDX is a User Defined Transform (UDX) that uses an XPATH (the XML Path Language) definition to transform the contents of a single column into distinct columns for each attribute. The basic function is to match XML attributes to SQL column names, extract the data values, and place each attribute as an individual column value.

The inputs to the XMLParse UDX are an input stream supplied as a CURSOR argument, the name of a column to be parsed by the XMLParse, and the XPATH definition of the XML.

The XMLParse output is a new stream containing the individual attribute values as column values.

XMLParse.jar UDX file

The XMLParse UDX is implemented in a Java JAR file located in the $SQLSTREAM_HOME/plugin directory. As with other jar files, it is loaded with the CREATE JAR statement.

Note: $SQLSTREAM_HOME refers to the installation directory for s-Server, such as /opt/sqlstream/5.0.XXX.

 CREATE SCHEMA "test";

 SET SCHEMA '"test"';

 CREATE JAR "test"."XmlParse"

     LIBRARY 'file:plugin/XmlParse.jar'

     OPTIONS(0);

 

Alternately, you can create a link to the XMLParse UDX in the autocp directory.

cd $SQLSTREAM_HOME/plugin/autocp

ln -s ../XmlParse.jar

 

Creating a SQL Function to use the XMLParse UDX

The XMLParse input parameters specify the following information:

The set of input rows,
The column name of the column containing the XML data,
The file name of the XPATH configuration file. The location of this file is

 $SQLSTREAM_HOME/plugin/jndi/xmlparse/<configuration file name>.properties

 

The Table returned by the XMLParse UDX is a set of columns. Each column in the input stream named in the list of output Table columns receives the corresponding value passed through to the output. The column name of the column containing the XML data is parsed according to the XPATH configuration and any attributes corresponding to XML tags appear in the output for each column named in the list of output columns. XML tags that have no matching column names in the list of output Table columns are silently dropped. If the XML data has an array of repeating values, each repeating value will correspond to one UDX output row.

 CREATE FUNCTION "test"."ParseXml"

           (c CURSOR,

            parseColName VARCHAR(20),

            confName VARCHAR(20))

     RETURNS TABLE

     (

                 -- Input values passed to the output

                  "location" VARCHAR(30),

                  "city" VARCHAR(32),

                  "region" VARCHAR(32),

                 -- Output columns parsed from the XML

                  id INTEGER,

                  name VARCHAR(60),

                  title VARCHAR(60),

                  years VARCHAR(60)

   )

   LANGUAGE JAVA

   PARAMETER STYLE SYSTEM DEFINED JAVA

   NO SQL

   EXTERNAL NAME

     '"XmlParse":com.sqlstream.plugin.xmlparse.XmlParse.parseXml';

 

Calling the Function

The following code calls the UDX.

 CREATE STREAM "InputStream" (

 "location" VARCHAR(30),

 "city" VARCHAR(32),

 "region" VARCHAR(32),

 id int,

      XML_DATA VARCHAR(32000));

 SELECT STREAM *

     FROM TABLE("test"."ParseXml"(

         -- First parameter is input stream

            CURSOR(SELECT STREAM * FROM "InputStream"),

         -- Second parameter is the name of the column in the input data that

            contains the XML data

            'XML_DATA',

         -- Third parameter is the name of the XPath definition file.

            'xpath_defn'));

 

Example Properties File and Input

The content of the Properties file passed as the third parameter of the function above is written in XPath. XPath is used for selecting parts of an XML document and is defined by the World Wide Web Consortium, W3C.

An example of the content of the properties file which could be used for the above code would be:

 NAME.xpath=/testxml/row/name

 TITLE.xpath=@title

 YEARS.xpath=@years

 repeatingElement.xpath=/testxml/row/jobfunctions/function

Some Sample Input

 SET SCHEMA '"test"';

 INSERT INTO "InputStream" ("location", "city", "region", id, xml_data)

   VALUES ('US', 'Oakland', 'California', 12345,

    '<testxml><row><name>joe</name><jobfunctions><function title="truck driver" years="1"/>

     <function title="clerk" years="5"/></jobfunctions></row></testxml>');

 

Troubleshooting

Set the appropriate trace level in /var/log/sqlstream/Trace.propertiesTrace.properties for:

com.sqlstream.plugin.xmlparse.XmlParse.parseXml.level=<trace_level>

<trace_level>

Meaning/Usage

WARNING

Error Conditions which disturb successful processing

INFO

unused

FINE

Logs the mapping input columns and output columns

FINER

Logs the mapping between XML attributes and output columns

FINEST

unused

Constraints

The maximum size of the XML document is constrained by the maximum row and column size of 65536.