Building a UDX with DataRobot


Version 5.2 Feature

 

You can use DataRobot's analytic models to build a UDX that you can call in s-Server. The UDX takes stream columns as input and outputs these columns plus a predictive column, such as "isspeeding" or "isagoodbuy".

In order to do so, you will need to:

1. Log on to app.datarobot.com. You will need credentials for the DataRobot site.

2. Identify data that you can use to "train" a DataRobot model. This data should include one predictive column. A list of products, for example, might include a column predicting whether a given product is a good buy. The other columns might include the price on this site, prices from competitor sites, customer ratings, and so on. Apart from the predictive column, this should be the same data that you will pass to the DataRobot UDX in s-Server. When you run the UDX, s-Server creates the predictive column as output.

3. Upload this data to app.datarobot.com and have DataRobot analyze it.

4. On app.datarobot.com, find a data model that has been ported to Java and download it as a JAR.

Not all DataRobot models have been ported to Java.

Note: s-Server requires a DataRobot model that has been ported to Java in order to work with s-Server's processing speed.

You will need to do two things with this JAR:

1) Place it in $SQLSTREAM_HOME/lib

Note: $SQLSTREAM_HOME refers to the installation directory for s-Server, such as /opt/sqlstream/5.0.XXX/s-Server.

2) Reference it when you run .

You then write a UDX that incorporates the downloaded JAR. To pass columns to the UDX, you reference two kinds of parameters: UDX cursor parameters, which correspond to a Java argument of type java.sql.ResultSet, and streaming UDX cursor parameters, which correspond to a Java argument of type com.sqlstream.jdbc.StreamingResultSet.

To write this UDX, you follow the same steps as in Writing a UDX.
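As a rough illustration of how a cursor parameter surfaces in Java, the sketch below reads rows from a java.sql.ResultSet, scores each one, and writes the input column plus a score. It rests on several assumptions: the class name ScoringUdxSketch, the method name scoreRows, the CONTENT column, the output positions 1 and 2, and the use of a java.sql.PreparedStatement to emit rows are all illustrative and should be checked against Writing a UDX. The scorer argument is a stand-in for the DataRobot model call, so the block compiles without the DataRobot JAR.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: names and column layout are assumptions, not s-Server API.
final class ScoringUdxSketch {

    /**
     * Reads each input row, scores it, and emits the input column plus the score.
     *
     * @param inputRows      rows from the UDX cursor parameter
     * @param resultInserter assumed output handle; one executeUpdate() per output row
     * @param scorer         stand-in for a call such as model.score(r)
     */
    static void scoreRows(ResultSet inputRows,
                          PreparedStatement resultInserter,
                          ToDoubleFunction<String> scorer) throws SQLException {
        while (inputRows.next()) {
            // Assumed input column; a real UDX would read every predictor column.
            String content = inputRows.getString("CONTENT");

            // Pass the input column through unchanged.
            resultInserter.setString(1, content);
            // Append the prediction as an extra output column.
            resultInserter.setDouble(2, scorer.applyAsDouble(content));
            // Emit the output row.
            resultInserter.executeUpdate();
        }
    }
}
```

In a real UDX, the scorer would wrap the DataRobot model, and the method exposed to s-Server would follow the signature conventions described in Writing a UDX.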

When you compile, the classpath needs to reference the DataRobot JAR along with aspen.jar and the s-Server JDBC JAR (SqlStreamJdbc_Complete.jar). You then package the compiled class as SimpleUdx.jar.

$ javac -cp $SQLSTREAM_HOME/lib/SqlStreamJdbc_Complete.jar:$SQLSTREAM_HOME/lib/aspen.jar:$SQLSTREAM_HOME/lib/<mydatarobot>.jar /path/to/SimpleUdx.java

$ jar cvf SimpleUdx.jar /path/to/SimpleUdx.class

 

In the DataRobot JAR, look for the long ID that follows com.datarobot.prediction in the package name (dr595665fdc808913c829601be in the sample below). You can use a tool like Reflection to find it. The model class you need, DRModel, sits in the package named by this ID. In the actual UDX, you modify the following sample block of code:

// Needs: import java.util.ArrayList; import java.util.Arrays;
String modelName = "com.datarobot.prediction.dr595665fdc808913c829601be.DRModel";
Predictor model =
    (Predictor) Class.forName(modelName).newInstance();
ArrayList<String> doubleKeys =
    new ArrayList<>(Arrays.asList(model.get_double_predictors()));
ArrayList<String> stringKeys =
    new ArrayList<>(Arrays.asList(model.get_string_predictors()));

// For debugging purposes only, to double-check the list of columns
// that the model accepts.
System.out.println(Arrays.toString(doubleKeys.toArray()));
System.out.println(Arrays.toString(stringKeys.toArray()));

Row r = new Row();
// The Row object stores String and double values separately.
r.d = new double[doubleKeys.size()];
r.s = new String[stringKeys.size()];

r.s[stringKeys.indexOf("DATE")] = "2013-11-07T06:20:48";
r.s[stringKeys.indexOf("COMMENT_ID")] = "LZQPQhLyRh80UYxNuaDWhIGQYNQ96IuCg-AYWqNPjpU";
r.s[stringKeys.indexOf("AUTHOR")] = "Julius NM";
r.s[stringKeys.indexOf("CONTENT")] = "Huh, anyway check out this you[tube] channel: kobyoshi02";

// Example of a double parameter (it is not used for this predictor).
//r.d[doubleKeys.indexOf("NUMBER")] = 123;

double score = model.score(r);
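If you are unsure which value to use for modelName, one option is to list the classes under com.datarobot.prediction in the downloaded JAR. The following is a minimal sketch using only the standard java.util.jar API. The class name ModelJarScan and its method names are hypothetical, and it assumes the model classes sit under the com/datarobot/prediction/ path, as in the sample above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper for inspecting a downloaded model JAR.
final class ModelJarScan {
    // Assumed location of the model classes inside the JAR.
    static final String PREFIX = "com/datarobot/prediction/";

    /** Converts a JAR entry path such as "a/b/C.class" to the class name "a.b.C". */
    static String toClassName(String entryName) {
        String withoutSuffix =
            entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    /** Lists top-level classes under com.datarobot.prediction, sorted by name. */
    static List<String> findPredictionClasses(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                .map(entry -> entry.getName())
                // Keep class files under the prefix; skip nested and anonymous classes.
                .filter(name -> name.startsWith(PREFIX)
                    && name.endsWith(".class")
                    && !name.contains("$"))
                .map(ModelJarScan::toClassName)
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ModelJarScan.findPredictionClasses with the path to the DataRobot JAR returns the fully qualified class names. The entry that ends in DRModel is the one to assign to modelName.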

 

 

In the sample block above (the one that begins with String modelName), replace the following hard-coded assignments with calls to the UDX result set, so that Row r is populated from the current input row:

r.s[stringKeys.indexOf("DATE")] = "2013-11-07T06:20:48";
r.s[stringKeys.indexOf("COMMENT_ID")] = "LZQPQhLyRh80UYxNuaDWhIGQYNQ96IuCg-AYWqNPjpU";
r.s[stringKeys.indexOf("AUTHOR")] = "Julius NM";
r.s[stringKeys.indexOf("CONTENT")] = "Huh, anyway check out this you[tube] channel: kobyoshi02";

 

Once the row values come from the result set instead of hard-coded strings, the UDX returns the same set of columns, plus a score column that represents the DataRobot prediction.
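One way to populate the row from the result set is to loop over the model's predictor names and read each one by column name. The following is a minimal sketch, assuming that each stream column has the same name as the corresponding model predictor. The class name RowValues and its methods are hypothetical, and the arrays it returns stand in for r.s and r.d so that the block compiles without the DataRobot JAR.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: not part of s-Server or DataRobot.
final class RowValues {

    /** Reads one String per predictor from the current row, in model order. */
    static String[] readStrings(ResultSet inputRows, List<String> stringKeys)
            throws SQLException {
        String[] s = new String[stringKeys.size()];
        for (int i = 0; i < s.length; i++) {
            // Assumes the column name matches the predictor name exactly.
            s[i] = inputRows.getString(stringKeys.get(i));
        }
        return s;
    }

    /** Reads one double per predictor from the current row, in model order. */
    static double[] readDoubles(ResultSet inputRows, List<String> doubleKeys)
            throws SQLException {
        double[] d = new double[doubleKeys.size()];
        for (int i = 0; i < d.length; i++) {
            d[i] = inputRows.getDouble(doubleKeys.get(i));
        }
        return d;
    }
}
```

Inside the UDX loop, the hard-coded assignments would then become r.s = RowValues.readStrings(inputRows, stringKeys); and r.d = RowValues.readDoubles(inputRows, doubleKeys); before the call to model.score(r). Because the arrays are filled in the order of stringKeys and doubleKeys, no indexOf lookups are needed.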