Group Rank UDX

<< Click to Display Table of Contents >>

Navigation:  Integrating Blaze with Other Systems > Transforming Data in s-Server >

Group Rank UDX

Previous pageReturn to chapter overviewNext page

This user-defined transfrom (UDX) applies a RANK() function to logical groups of rows and optionally delivers the group in sorted order

Applications of Group_Rank include the following:

To sort results of a streaming GROUP BY
To determine a relationship within the results of a group

Group Rank UDX can do the following actions:

Apply Rank to a specified input column.
Supply either sorted or non-sorted output.
Enable the user to specify a period of inactivity for data flush.

SQL Declarations

The functional attributes and DDL are described in the sections that follow.

Functional attributes for Group_Rank

This Group_Rank transform acts as follows:

Gathers rows until either a rowtime change is detected or a specified idle-time limit is exceeded.
Accepts any streaming rowset.
Uses any column with a basic SQL data type of INTEGER, CHAR, VARCHAR as the column by which to do the ranking
Orders the output rows either in the order received or in ascending or descending order of values in the selected column.

DDL for Group_Rank

  CREATE FUNCTION group_rank(c cursor, 

  rankByColumnName VARCHAR(128),

  rankOutColumnName VARCHAR(128), 

  sortOrder VARCHAR(10), 

 outputOrder VARCHAR(10),

  maxIdle INTEGER, 

 outputMax INTEGER)

  returns table(c.*, "groupRank" INTEGER)

  language java

  parameter style system defined java

  no sql

  external name 'class com.sqlstream.plugin.grouprank.GroupRank.group_rank';

Example of Group_Rank Use

  CREATE VIEW pageCounts1Min AS

    SELECT STREAM

       pageId, COUNT(*) AS hitCount

    FROM AccessStream AS S

    GROUP BY

       FLOOR(S.ROWTIME TO MINUTE), pageId;

    SELECT STREAM "pageId", "hitCount", "groupRank"

    FROM (TABLE(group_rank(CURSOR(SELECT STREAM "pageId", "hitCount"

       FROM pageCounts1Min),

         'hitCount', 'groupRank', 'desc', 'asc', 10, 5)));

Sample Input:

pageId hitCount

condo  51

auto  25

books  200

CDs  202

DVDs  1000

Games  500

Sample output:

pageId hitCount groupRank

DVDs  1000  1

Games  500  2

CDs  202  3

books  200  4

Condo  51  5

Operational Overview

Rows are buffered from the input cursor for each group, i.e., rows with the same rowtimes. Ranking of the rows is done either after the arrival of a row with a different rowtime (or when the idle timeout occurs). Rows continue to be read while ranking is performed on the group of rows with the same rowtime.

The outputMax parameter specifies the maximum number of rows to be returned for each group after ranks are assigned.

By default, group_rank supports column pass through, as the example illustrates by using c.* as the standard shortcut directing pass through of all input columns in the order presented. You can, instead, name a subset using the notation "c.columName", allowing you to reorder the columns. However, using specific column names ties the UDX to a very specific input set while using the c.* notation allows the UDX to handle any input set

The rankOutColumnName parameter specifies the output column used to return ranks. This column name must match the column name specified in the RETURNS clause of the CREATE FUNCTION statement.

UDX Parameters

The parameters to the UDX are:

c

CURSOR to streaming result set

rankByColumnName

String naming the column to use for ranking the group

rankOutColumnName

String naming the column to use for returning the rank.

This string must match the name of the groupRank column in RETURNS clause of the CREATE FUNCTION statement.

sortOrder

Controls ordering of rows for rank assignment.

Valid values are:

'asc' - Ascending based on rank.
'desc' - Descending based on the rank.

outputOrder

Controls ordering of output. Valid values are:

'asc' - Ascending based on rank.
'desc' - Descending based on the rank.

maxIdle

Time limit in milliseconds for holding a group for ranking.

When maxIdle expires the current group is released to the stream. A value of zero indicates no idle time out.

outputMax

Maximum number of rows the UDX will output in a given group.

A value of 0 indicates no limit.