Transforming Data in s-Server


The main way to transform data in s-Server is through SQL; see the SQL Reference Guide for more detail. For more sophisticated transformations, you can define custom User Defined Transformations (UDXes) and User Defined Functions (UDFs). This section discusses how to create them.

UDXes operate in the middle of processing a stream. They use the JDBC API to exchange data with SQLstream s-Server while stream processing is underway. You call a UDX with one or more streaming arguments, and the UDX returns a streaming result. To use that resulting stream, downstream operations or listeners must SELECT from it. (Code sections appear in the topic Example Code for Implementing Plugins. For a full UDX code example, see the topic Table Lookup UDX in this guide.) UDXes are similar to SQL functions (normal UDFs). The difference is that UDXes return streaming results, whereas UDFs return scalar results.
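In outline, a UDX is a Java method that reads input rows from a java.sql.ResultSet and emits output rows through a java.sql.PreparedStatement. The following is a minimal sketch, not s-Server's exact SDK signature (see the SDK topic for that): it upper-cases a single VARCHAR column, with the per-row logic split into its own method so it can be tested independently of the server.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UppercaseUdx {
    // Pure per-row transformation, kept separate so it is easy to unit-test.
    public static String transform(String value) {
        return value == null ? null : value.toUpperCase();
    }

    // UDX entry point (sketch): the input cursor arrives as a ResultSet,
    // and each executeUpdate() on the inserter emits one output row.
    public static void execute(ResultSet inputRows, PreparedStatement resultInserter)
            throws SQLException {
        while (inputRows.next()) {
            resultInserter.setString(1, transform(inputRows.getString(1)));
            resultInserter.executeUpdate();
        }
    }
}
```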

You can download the SQLstream Software Development Kit API as a Javadoc here.

You can use both UDXes and UDFs to create custom windowed aggregation functions that aggregate values on a sliding window of rows over time. See Blending SQL and Custom Operators for more details.
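For illustration, the kind of state a custom sliding-window aggregation maintains, a running total over only those rows whose timestamps still fall inside the window, can be sketched in plain Java. This is illustrative bookkeeping only, not the SDK's aggregation API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of sliding-window state: a sum over all values observed
// within the last windowMillis milliseconds.
public class SlidingWindowSum {
    private final long windowMillis;
    private final Deque<long[]> entries = new ArrayDeque<>(); // {timestamp, value}
    private long sum = 0;

    public SlidingWindowSum(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Record a value observed at the given time and return the windowed sum.
    public long add(long timestampMillis, long value) {
        entries.addLast(new long[]{timestampMillis, value});
        sum += value;
        // Evict entries that have slid out of the window.
        while (!entries.isEmpty()
                && entries.peekFirst()[0] <= timestampMillis - windowMillis) {
            sum -= entries.removeFirst()[1];
        }
        return sum;
    }
}
```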

You can use s-Server with machine learning systems such as SystemML and DataRobot. See Using s-Server for Machine Learning with Integrated Apache SystemML and Building a UDX with DataRobot.

Kalman Filter UDX. Allows you to run Kalman filters on streams of sensor data. A Kalman filter is a recursive estimation technique that produces refined estimates of a system's true state from noisy sensor measurements.

Linear Interpolation UDX and Quadratic Interpolation UDX. These allow you to reconstruct missing rows.

Matched Filter UDX. Allows you to evaluate a template against a signal, giving a correlation coefficient for how close the match was at any point within the signal.

Parser UDX. Lets you call s-Server's predefined parsers in a function.

AesEncryptDecrypt UDX. Performs Advanced Encryption Standard (AES) encryption and decryption compatible with the MySQL AES_ENCRYPT and AES_DECRYPT functions.

Group Rank UDX. Sorts groups of rows. (GroupRank.jar)

Table Lookup UDX. Looks up entries in an external database that match one or more stream columns (a stream-table join). (TableLookup.jar)

XML Parse UDX. Parses XML column values into name=value pairs using XPath and streams the values in named columns. (XmlParse.jar)

URI Parse UDX. Parses a decorated URI into name=value parameter pairs and streams the values in named columns. (UriParse.jar)

GeoIPFunctions UDF. Lets you determine the country, region, city, latitude, and longitude associated with IP addresses worldwide, along with the postal code and metro code for US IP addresses. It does so using the MaxMind GeoIP City database. See http://www.maxmind.com/en/city for more details.
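For reference, MySQL's AES_ENCRYPT defaults to AES-128 in ECB mode, with an arbitrary-length key XOR-folded into 16 bytes. A plain-Java roundtrip in that style can be sketched as below; this is illustrative of the MySQL-compatible scheme only, since the AesEncryptDecrypt UDX itself is provided by s-Server.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class MysqlCompatibleAes {
    // MySQL's default AES_ENCRYPT folds the key into 16 bytes by XOR,
    // then applies AES-128 in ECB mode with PKCS#7-style padding.
    static SecretKeySpec foldKey(String key) {
        byte[] folded = new byte[16];
        byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < raw.length; i++) {
            folded[i % 16] ^= raw[i];
        }
        return new SecretKeySpec(folded, "AES");
    }

    public static byte[] encrypt(String plaintext, String key) {
        try {
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, foldKey(key));
            return c.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static String decrypt(byte[] ciphertext, String key) {
        try {
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(Cipher.DECRYPT_MODE, foldKey(key));
            return new String(c.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```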

See also the UDF/UDX descriptions, discussions, and examples in the topic SQLstream Software Development Kit (SDK) in this guide.

GoogleGeoCode UDF

User-Defined Routines and Prepared Statements

When a query that uses one or more user-defined routines (UDFs, UDXes, etc.) will be executed many times, it should be executed through a PreparedStatement. This advice also applies to any metadata query and to non-expedited inserts. Repeated execution without a PreparedStatement raises the possibility of leaking a class into PermGen space with each execution. Using a PreparedStatement for such a query thus minimizes the need to enlarge PermGen space, which is set by default to 64MB but can be changed using the

 -XX:MaxPermSize=<size>

flag on the start-server command, where <size> can be 64m, 128m, 256m, etc. The "m" suffix is required.
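The prepare-once, execute-many pattern described above can be sketched as follows. This is a minimal sketch: the stream name SensorStream is hypothetical, and in practice the Connection would come from s-Server's JDBC driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class PreparedInsertExample {
    // Prepare the statement once, then execute it for every row:
    // the statement (and any classes generated for the user-defined
    // routines it references) is compiled a single time rather than
    // once per execution.
    public static void insertReadings(Connection conn, List<Double> readings)
            throws SQLException {
        String sql = "INSERT INTO \"SensorStream\" (\"reading\") VALUES (?)"; // hypothetical stream
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (double r : readings) {
                ps.setDouble(1, r);
                ps.executeUpdate();
            }
        }
    }
}
```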

See also the topic JDBC Driver in this guide for discussions and links pertaining to SQLstream's StreamingPreparedStatement and StreamingStatement extensions to the standard PreparedStatement interface.

Control Streams

The TableLookup UDX implements stream-table join by looking up entries in an external table that match one or more key columns in the input stream. TableLookup supports a control stream with commands enabling you to suspend the stream-table join, flush the cache, and then resume the stream-table join. After resuming, the cache is repopulated with fresh values from the lookup table, some of which may have changed from the earlier state of the cache.