Customizing the Scrutinizer


The StreamLab Scrutinizer uses a series of Java jars that run on the s-Server webAgent. You can customize the Scrutinizer by writing your own recognizers and split counters. You write these as Java classes that extend or implement one of two Java interfaces:

SplitCounter. Split counters try to split a cell value and return a count of how many times it can be split.

Recognizer. Recognizers identify the format, value, or kind of object held in a particular cell of a row.

The methods for both are spelled out in the Scrutinizer Javadoc, located at scrutinizer_api/index.html.

Loading Jars into webAgent

When webAgent runs, it checks recognizer.properties and splitcounters.properties, then tries to load classes listed in these files into the instance.

To load a new jar into webAgent, you need to:

1. Add the new class to recognizer.properties or splitcounters.properties, as appropriate.
2. Add the jars to webAgent's classpath. You do so using the -cp option, along the following lines:

 webagent.sh -cp newSplitter.jar:newRecognizer.jar

 

Once you do so, StreamLab will begin scrutinizing files using the new recognizers and split counters.
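Before restarting webAgent, it can help to confirm that a class you listed is actually reachable on the classpath you plan to pass with -cp. The following is a minimal sketch of such a check, not part of StreamLab or webAgent; the class name ClasspathCheck and its canLoad method are hypothetical, and it relies only on the standard Class.forName call.

```java
class ClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    // This is a standalone diagnostic, not how webAgent itself loads classes.
    static boolean canLoad(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Each argument is a fully qualified class name to look up
        for (String name : args) {
            System.out.println(name + (canLoad(name) ? " : found" : " : NOT found"));
        }
    }
}
```

You would run it with the same jars you intend to give webAgent, for example java -cp .:newSplitter.jar ClasspathCheck followed by the fully qualified name of your class.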

Examples

We provide two examples below: a comma-based split counter (together with its helper class, BaseSplitCounter) and a recognizer for latitudes.

Splitter Example: CommaSVSplitCounter

The example below counts how many times a column can be split on commas.

package com.sqlstream.webagent.scrutinizer.autosplitter;

public class CommaSVSplitCounter extends BaseSplitCounter {

    @Override
    public int count() {
        // Split on commas, using the helper inherited from BaseSplitCounter
        return xSVcount(this.cell, ',');
    }

    @Override
    public String getShortName() {
        return "csv";
    }

    @Override
    public String getLongName() {
        return "Comma-Separated Values (CSV)";
    }

    @Override
    public String getType() {
        return "vclp";
    }

    @Override
    public String getTextTrue() {
        return "CSV (comma-separated values)";
    }

    @Override
    public String getSeparatorSQL() {
        return "','";
    }

    @Override
    public String getEscapeSQL() {
        return "u&'\\005C'";
    }

    @Override
    public String getQuoteSQL() {
        return "'\"'";
    }
}

 

BaseSplitCounter

The following is the code for BaseSplitCounter, the helper class that CommaSVSplitCounter extends.

package com.sqlstream.webagent.scrutinizer.autosplitter;

/**
 * BaseSplitCounter works as a helper class that provides support for future
 * splitcounters.
 *
 * @author federico.wachs@sqlstream.com
 *
 */
public abstract class BaseSplitCounter implements SplitCounter {
    protected String cell;

    public void _prepare(String cell) {
        this.cell = cell;
    }

    public void _clear() {
        this.cell = null;
    }

    protected int xSVcount(String str, char delim) {
        int n = str.length();
        if (n < 1) return 0;

        int count = 0;
        char quote = '"';
        char esc = '\\';
        boolean inQuote = false;
        boolean inEscape = false;

        for (int i = 0; i < n; i++) {
            char c = str.charAt(i);

            if (inEscape)
                inEscape = false;
            else if (inQuote) {
                if (c == quote)
                    inQuote = false;
            } else {
                if (c == esc)
                    inEscape = true;
                else if (c == quote)
                    inQuote = true;
                else if (c == delim)
                    count++;
            }
        }

        return count + 1;
    }
}
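To see how xSVcount treats quotes and escapes, the sketch below copies its counting logic into a standalone static method so it can run without the webAgent jars. The class name XsvCountDemo and the method name xsvCount are hypothetical and exist only for this illustration. Note that the return value is the number of delimiters found plus one, so a non-empty cell with no delimiter yields 1.

```java
class XsvCountDemo {

    // Same counting logic as BaseSplitCounter.xSVcount, reproduced here
    // as a static method purely for illustration.
    static int xsvCount(String str, char delim) {
        int n = str.length();
        if (n < 1) return 0;

        int count = 0;
        boolean inQuote = false;
        boolean inEscape = false;

        for (int i = 0; i < n; i++) {
            char c = str.charAt(i);
            if (inEscape) {
                inEscape = false;              // character after a backslash is skipped
            } else if (inQuote) {
                if (c == '"') inQuote = false; // closing quote
            } else if (c == '\\') {
                inEscape = true;
            } else if (c == '"') {
                inQuote = true;                // opening quote
            } else if (c == delim) {
                count++;                       // unquoted, unescaped delimiter
            }
        }
        return count + 1;
    }

    public static void main(String[] args) {
        System.out.println(xsvCount("a,b,c", ','));      // prints 3
        System.out.println(xsvCount("\"a,b\",c", ','));  // prints 2: the quoted comma is ignored
        System.out.println(xsvCount("a\\,b", ','));      // prints 1: the escaped comma is ignored
        System.out.println(xsvCount("abc", ','));        // prints 1: no delimiter at all
        System.out.println(xsvCount("", ','));           // prints 0: empty cell
    }
}
```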

 

Recognizer Example: LatitudeRecognizer

The code below recognizes cells whose numeric value falls within the latitude range of -90 to 90.

package com.sqlstream.webagent.scrutinizer.recognizer;

public class LatitudeRecognizer extends NumberRecognizer {

    @Override
    public boolean test() {
        return this.getFloat() != null && this.getFloat() >= -90
                && this.getFloat() <= 90;
    }

    @Override
    public String getShortName() {
        return "latitude";
    }

    @Override
    public String getLongName() {
        return "Column contains latitudes";
    }

    @Override
    public String getTextTrue() {
        return "contains latitudes";
    }
}
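The range check in test() can be tried outside webAgent with a small standalone sketch. It assumes, based on the null check in the example above, that NumberRecognizer.getFloat() returns null when the cell is not numeric; the helper parseFloatOrNull is a hypothetical stand-in for that behavior, and the class name LatitudeCheckDemo is likewise made up for this illustration.

```java
class LatitudeCheckDemo {

    // Hypothetical stand-in for NumberRecognizer.getFloat():
    // returns null when the cell cannot be parsed as a number.
    static Float parseFloatOrNull(String cell) {
        if (cell == null) return null;
        try {
            return Float.valueOf(cell.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Same range test as LatitudeRecognizer.test()
    static boolean isLatitude(String cell) {
        Float value = parseFloatOrNull(cell);
        return value != null && value >= -90 && value <= 90;
    }

    public static void main(String[] args) {
        for (String cell : new String[] {"37.7749", "-90", "90.5", "north"}) {
            System.out.println(cell + " -> " + isLatitude(cell));
        }
    }
}
```

Because the check is purely numeric, any value between -90 and 90 passes, including values that are not latitudes at all, such as small temperatures or percentages.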