Java Parameters

Most of the SQLstream system is written in Java, so some of its behavior can be controlled using Java properties. You can set some parameter values for the server by using the file $SQLSTREAM_HOME/aspen.properties.

The parameters in this file are commented, and many of them are not implemented by default. Consult this file for a full list of parameters. This help topic describes a subset of parameters.

When writing code for SQLstream plugins, you can access the standard Java parameters through the class com.sqlstream.aspen.core.AspenProperties, which is an extension of the standard Java Properties class. The documentation for the AspenProperties class is included with the SQLstream Software Development Kit (SDK).

You should not edit the file aspen.properties itself. Rather, if you want to override or add a value, put that parameter into a file called aspen.custom.properties in the $SQLSTREAM_HOME/s-Server directory. This enables you to make your own set of changes without having to modify the standard parameters.

When the SQLstream s-Server starts up, it reads its Java parameters in the following order, with each successive source potentially overriding the preceding ones:

1. Parameters from aspen.properties (read only)

2. Parameters from aspen.config (read only)

3. Parameters from aspen.custom.properties

4. Java system parameters with names beginning with "aspen."

Entries in aspen.custom.properties can override the standard settings distributed in aspen.properties or aspen.config, so there is no need to modify those files.
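The override order described above can be illustrated with the standard java.util.Properties class. This is a minimal sketch, not s-Server's actual loading code: the class name LayeredPropertiesSketch, its helper methods, the key my.custom.param, and all values are hypothetical, and each layer is built in memory rather than read from the real files.

```java
import java.util.Properties;

final class LayeredPropertiesSketch {

    /** Merges property sets in the order given; later sets override earlier ones. */
    static Properties merge(Properties... sources) {
        Properties merged = new Properties();
        for (Properties source : sources) {
            // putAll replaces any key that is already present, so the last source wins.
            merged.putAll(source);
        }
        return merged;
    }

    /** Builds a Properties object from alternating key/value strings. */
    static Properties of(String... keyValues) {
        Properties props = new Properties();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            props.setProperty(keyValues[i], keyValues[i + 1]);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the four sources, in the order they are read.
        Properties standard = of("aspen.sdp.host", "localhost", "my.custom.param", "from-standard");
        Properties config = of("my.custom.param", "from-config");
        Properties custom = of("aspen.sdp.host", "server.example.com");
        Properties system = of("my.custom.param", "from-system");

        Properties effective = merge(standard, config, custom, system);
        System.out.println("aspen.sdp.host=" + effective.getProperty("aspen.sdp.host"));
        System.out.println("my.custom.param=" + effective.getProperty("my.custom.param"));
    }
}
```

In this sketch, the value that survives for each key is the one from the last source that defines it, which mirrors why an entry in aspen.custom.properties takes precedence over the same entry in aspen.properties.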

If you need to change the parameters passed to the Java runtime, such as the class path or memory sizes, you will need to make those changes to the script $SQLSTREAM_HOME/bin/defineAspenRuntime.sh.

Parameter substitution

Java parameters can include other variables in their definitions. The syntax for these references is much like that used for variables in a shell script: a dollar sign followed by the name of the referenced variable inside curly braces, such as ${aspen.home.dir}. Undefined parameters are treated as empty strings.

Thus, you can define file paths relative to the SQLstream installation directory, for example:

my.file.name=${aspen.home.dir}/temp/myfile.name
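The substitution rule can be sketched in plain Java as follows. This is an illustration of the behavior described above, not the code s-Server uses: the class SubstitutionSketch and its expand method are hypothetical, the installation path /opt/sqlstream is an assumed example value, and the sketch makes a single pass without expanding references nested inside substituted values.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SubstitutionSketch {

    // Matches references of the form ${name}.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} in value with the matching property, or "" if undefined. */
    static String expand(String value, Properties props) {
        Matcher matcher = REFERENCE.matcher(value);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Undefined parameters are treated as empty strings.
            String replacement = props.getProperty(matcher.group(1), "");
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("aspen.home.dir", "/opt/sqlstream"); // assumed example path
        System.out.println(expand("${aspen.home.dir}/temp/myfile.name", props));
    }
}
```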

 

Configuring JDBC Driver Connections from Other Hosts

The instructions in this section enable you to configure the SQLstream s-Server to accept JDBC driver connections from other hosts, even if the server is behind a NAT router.

SDP Requirements

The SQLstream JDBC driver connects to SQLstream s-Server using SDP. SDP requires that the host names match at both ends of a remote connection. This means that the server must have:

An IP address reachable from client systems

A host name that resolves to that address for the client (either through DNS or an explicit host name mapping such as an entry in the client's /etc/hosts file)

Configuration files that use the resolvable host name or the explicit IP address

Here are the configuration requirements:

/etc/hosts

Many Linux systems will, by default, assign a system's host name to the loopback interface (IP address 127.0.0.1). For a server installation that other systems will connect to, you need to ensure that the host name is explicitly assigned to the external IP address:

127.0.0.1 localhost

a.b.c.d hostName.domain hostName
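One way to check this mapping is to ask the Java runtime how a host name resolves. The following is a minimal sketch using the standard java.net.InetAddress class; the class name HostNameCheckSketch and its method are hypothetical. Run on the server, it reports how the server itself resolves the name, so client-side resolution still needs to be checked separately.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class HostNameCheckSketch {

    /** Returns true if hostName resolves to a loopback address such as 127.0.0.1. */
    static boolean resolvesToLoopback(String hostName) {
        try {
            return InetAddress.getByName(hostName).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve host name: " + hostName, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Use the first argument if given, otherwise this machine's own host name.
        String hostName = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
        if (resolvesToLoopback(hostName)) {
            System.out.println(hostName + " resolves to a loopback address; check /etc/hosts.");
        } else {
            System.out.println(hostName + " resolves to a non-loopback address.");
        }
    }
}
```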

 

$SQLSTREAM_HOME/aspen.properties

The aspen.properties file needs to specify the host name of the server in a way that can be resolved by client systems or else use the IP address:

aspen.sdp.host=<hostName or a.b.c.d>

 

JDBC URI

The client system connects to the server via a URI that uses the host name (or IP address) as specified in aspen.properties:

jdbc:sqlstream:sdp://<hostName>:<port>, autoCommit=false

jdbc:sqlstream:sdp://<a.b.c.d>:<port>, autoCommit=false

 

The port specified in aspen.controlnode.url must match aspen.sdp.port.

Managing Memory Use

You can manage some aspects of memory use through the following parameters:

aspen.aggregate.maximumDeadBucketRatio

aspen.xo.buffersize

aspen.aggregate.useResultBuffering

aspen.xo.useDoubleBuffering

aspen.aggregate.maximumDeadBucketRatio

s-Server makes frequent use of SQL windows. A WINDOW clause defines a "rolling" query over a given period of time, such as an hour, and is used to execute queries along the lines of "give me all orders within the past hour" or "give me all vehicles with speeds over 55 for the past five minutes." Because streaming data continually updates, windows are an important part of both querying and performing calculations on streaming data. Windows need to be aggregated in some way, using, for example, SELECT DISTINCT or GROUP BY.

For every logical grouping in a SELECT DISTINCT or GROUP BY statement, s-Server creates a "bucket." A bucket might hold all rows that share the same value in a single column, such as "customer_id", or all rows that share the same values in two columns, such as "customer_id" and "advertiser_id". Buckets use memory, but creating buckets requires processing power. Depending on your data and the way you've structured your GROUP BY statement, you could end up with either:

1) a large number of buckets

or

2) a smaller number of buckets that are used over and over again

Buckets that are currently in use are "live"; buckets that are not being used are "dead." You can use the aspen.aggregate.maximumDeadBucketRatio parameter to keep some dead buckets available for reuse, by setting a maximum ratio of dead buckets to live buckets. By default, s-Server maintains a ratio of 3.0, which means "maintain a total number of dead buckets of 3x the number of live buckets."

You can change this setting using the aspen.aggregate.maximumDeadBucketRatio parameter, using the following guidelines:

Increasing the ratio retains more buckets, which uses more memory but can increase performance if you are reusing the same buckets over and over. If performance is a problem, try increasing the number.

Decreasing the ratio retains fewer buckets, which uses less memory but can decrease performance if you are reusing the same buckets over and over. If memory is a problem, decrease the number (even to zero).
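To make the ratio concrete, the arithmetic can be sketched as below. This is only an illustration of the "dead buckets = ratio x live buckets" relationship described in this topic: the class DeadBucketRatioSketch is hypothetical, and rounding the result down is an assumption, since this topic does not say how s-Server treats fractional results.

```java
final class DeadBucketRatioSketch {

    /**
     * Returns the largest number of dead buckets allowed for the given number
     * of live buckets. Rounding down is an assumption made for this sketch.
     */
    static long maxDeadBuckets(long liveBuckets, double ratio) {
        if (liveBuckets < 0 || ratio < 0) {
            throw new IllegalArgumentException("liveBuckets and ratio must be non-negative");
        }
        return (long) Math.floor(liveBuckets * ratio);
    }

    public static void main(String[] args) {
        long live = 1000; // hypothetical number of live buckets
        System.out.println("ratio 3.0: " + maxDeadBuckets(live, 3.0));
        System.out.println("ratio 0.0: " + maxDeadBuckets(live, 0.0));
    }
}
```

With a ratio of zero, no dead buckets are retained, which corresponds to the lowest-memory setting mentioned above.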

aspen.xo.buffersize

s-Server uses a scheduler to control when and in what order it performs tasks. Each task consumes one core. On systems with many cores, the scheduler may become a bottleneck. In these cases, increasing the buffer size decreases the overhead for the scheduler by increasing the maximum amount of work that an xo does.

This parameter controls the size of the buffers used to transmit data between xos. It defaults to 1M. If you have many cores, increasing this value may increase performance.

aspen.aggregate.useResultBuffering

This parameter controls whether the streaming aggregate xos will buffer their results so that they can start processing input before downstream xos have processed their output.

By default, this parameter is set to true, and keeping it set to true maintains increased throughput. If you are running low on memory, setting this parameter to false will cause s-Server to use less memory at the expense of throughput.

# aspen.aggregate.useResultBuffering=true

aspen.xo.useDoubleBuffering

This parameter controls whether to use double buffering between xos that do not allocate their own buffers. By default, this is set to true, which allows for more parallelism at the expense of some scheduler overhead. On a machine with few cores, or in an application with very long or very many pipelines, you may not gain much from double buffering, and setting this parameter to false may increase performance by reducing that overhead.