Transforming Data in s-Server
The main way to transform data in s-Server is through SQL. See the SQL Reference guide for more detail. For more sophisticated transformations, you can define custom User Defined Transformations and User Defined Functions. This section discusses the creation of these.
UDXes operate in the middle of processing a stream. They use the JDBC API to exchange data with SQLstream s-Server while stream processing is underway. You call a UDX with one or more streaming arguments, and the UDX returns a streaming result. To use that resulting stream, downstream operations or listeners must SELECT from it. (Code sections appear in the topic Example Code for Implementing Plugins. For a full UDX code example, see the topic Table Lookup UDX in this guide.) UDXes are similar to SQL functions (normal UDFs). The difference is that UDXes return streaming results, whereas UDFs return scalar results.
You can download the SQLstream Software Development Kit API as a Javadoc here.
|•||AesEncryptDecrypt UDX. Performs Advanced Encryption Standard (AES) Encryption compatible with the MySQL AES_ENCRYPT and AES_DECRYPT functions.|
|•||Group Rank UDX. Sorts groups of rows. (GroupRank.jar)|
|•||Table Lookup UDX. Looks up external DB entries which match one or more stream columns (stream-table JOIN) (TableLookup.jar)|
|•||XML Parse UDX. Parses XML column values into name=value pairs using XPath, streams values in named columns (XmlParse.jar).|
|•||URI Parse UDX. Parses decorated URI into name=value parameter pairs, streams values in named columns. (UriParse.jar)|
|•||GeoIPFunctions UDF. Lets you determine the country, region, city, latitude, and longitude associated with IP addresses worldwide, along with the postal code, and metro code for US IP addresses. It does so using the MaxMind GeoIp City database. See http://www.maxmind.com/en/city for more details.|
See also the UDF/UDX descriptions, discussions, and examples in the topic SQLstream Software Development Kit (SDK) in this guide.
When a query using one or more user-defined routines (UDFs, UDXes, etc.) will be executed many times, it should be made inside a PreparedStatement. This advice applies as well to any metadata query and to non-expedited Inserts. Multiple executions without using a PreparedStatement raises the possibility of leaking a class in PermGen space. Using a PreparedStatement for such a query thus minimizes the need to enlarge PermGen space, which is set by default at 64MB but can be changed using the
flag on the start-server command, where <size> can be 64m, 128m, 256m, etc. The "m" is necessary.
See also the topic JDBC Driver in this guide for discussions and links pertaining to SQLstream's StreamingPreparedStatement and StreamingStatement extensions to the standard PreparedStatement interface.
The TableLookup UDX implements stream-table join by looking up entries in an external table that match one or more key columns in the input stream. TableLookup supports a control stream with commands enabling you to suspend the stream-table join, flush the cache, and then resume the stream-table join. After resuming, the cache is repopulated with fresh values from the lookup table, some of which may have changed from the earlier state of the cache.