Aggregate Functions

<< Click to Display Table of Contents >>

Navigation:  Streaming SQL Reference Guide > Basic Building Blocks > Functions >

Aggregate Functions

Previous pageReturn to chapter overviewNext page

An aggregate function returns a result calculated not from a single row but rather aggregated from data contained in a finite set of rows, or from information about a finite set of rows. An aggregate function may appear in

the <selection list> portion of a SELECT clause,
an ORDER BY clause, or
a HAVING clause.

An aggregate function is different from an analytic function, which are always evaluated relative to a window that must be specified, and so cannot appear in a HAVING clause. Other differences are described in the next topic.

Aggregate functions operate slightly differently in aggregate queries on tables than when used in aggregate queries on streams:

If an aggregate query on tables contains a GROUP BY clause, then the aggregate function returns one result per group in the set of input rows. Lacking an explicit GROUP BY clause is equivalent to GROUP BY (), and returns only one result for the entire set of input rows.

On streams, an aggregate query must contain an explicit GROUP BY clause on a monotonically increasing expression based on rowtime. Without one, the sole group is the whole stream, which never ends, preventing any result from being reported. Adding a GROUP BY clause based on a monotonically increasing expression breaks the stream into finite sets of rows, contiguous in time, and each such set can then be aggregated and reported.

Whenever a row arrives that changes the value of the monotonically increasing grouping expression, a new group is started and the previous group is considered complete. The s-Server then outputs the value of the aggregate functions. Note that the GROUP BY clause may also include other non-monotonically-increasing expressions, in which case more than one result per set of rows may be produced.

Performing an aggregate query on streams is often referred to as streaming aggregation, as distinct from the 'windowed aggregation' discussed here. Both are discussed in the topic Application Design Considerations in the s-Server Concepts Guide. Stream-to-stream joins are discussed at Stream to Stream Joins; Time-based and row-based windows are discussed in the Glossary.

If an input row contains a null in a column used as an input to a data analysis function, the data analysis function ignores the row (except for COUNT).

Differences Between Aggregate and Analytic Functions

Function Type

Outputs

Rows or Windows Used

Notes

Aggregate Functions

One output row per group of input rows

All output columns are calculated over the same window or same group of rows

COUNT DISTINCT is not allowed in streaming aggregation; i.e., statements of the following type are not allowed:

SELECT COUNT(DISTINCT x) ... FROM ... GROUP BY ...

Analytic Functions

One output row for each input row.

Each output column may be calculated using a different window or partition

COUNT DISTINCT can be used as an analytic function and in windowed aggregation.

Streaming Aggregation and Rowtime bounds

Normally, an aggregate query generates a result when a row arrives that changes the value of the monotonically increasing expression in the GROUP BY. For example, if the query is grouped by FLOOR( rowtime TO MINUTE ), and the rowtime of the current row is 9:59.30, then a new row with a rowtime of 10:00.00 will trigger the result.

Alternately, a rowtime bound can be used to advance the monotonically increasing expression and enable the query to return a result. For example, if the query is grouped by FLOOR( rowtime TO MINUTE ), and the rowtime of the current row is 9:59.30, then an incoming rowtime bound of 10:00.00 will also trigger the result.

See below for examples of rowtime bounds in streaming aggregation, and the Concepts Guide for more information on rowtime bounds in general.

Aggregate function list

SQLstream s-Server supports the following aggregate functions:

AVG
COUNT
MAX
MIN
SUM
STDDEV_POP
STDDEV_SAMP
VAR_POP
VAR_SAMP

See also the Statistical Variance and Deviation Functions.

Examples: Aggregate queries on tables

Assume the following data in table SALES.EMPS:

NAME

DEPTNO

AGE

GENDER

Amy

10

30

F

Beth

20

25

F

Charles

30

40

M

David

10

20

M

Eric

10

40

M

Fiona

30

 

 

Gregory

40

60

M

Harriet

40

55

F

 

The following SQL uses the AVG aggregate function as part of a query to find the average age of all employees:

SELECT

   AVG(AGE) AS AVERAGE_AGE

FROM SALES.EMPS;

 

Result:

AVERAGE_AGE

38

 

To find the average age of employees in each department, we can add an explicit GROUP BY clause to the query:

SELECT

   DEPTNO,

   AVG(AGE) AS AVERAGE_AGE

FROM SALES.EMPS

GROUP BY DEPTNO;

Returns:

DEPTNO

AVERAGE_AGE

10

30

20

25

30

40

40

57

Examples: Aggregate Queries on Streams (Streaming Aggregation)

Assume the following information flowing through the stream WEATHERSTREAM:

ROWTIME

CITY

TEMP

2018-11-01 01:00:00.0

Denver

29

2018-11-01 01:00:00.0

Anchorage

2

2018-11-01 06:00:00.0

Miami

65

2018-11-01 07:00:00.0

Denver

32

2018-11-01 09:00:00.0

Anchorage

9

2018-11-01 13:00:00.0

Denver

50

2018-11-01 17:00:00.0

Anchorage

10

2018-11-01 18:00:00.0

Miami

71

2018-11-01 19:00:00.0

Denver

43

2018-11-02 01:00:00.0

Anchorage

4

2018-11-02 01:00:00.0

Denver

39

2018-11-02 07:00:00.0

Denver

46

2018-11-02 09:00:00.0

Anchorage

3

2018-11-02 13:00:00.0

Denver

56

2018-11-02 17:00:00.0

Anchorage

2

2018-11-02 19:00:00.0

Denver

50

2018-11-03 01:00:00.0

Denver

36

2018-11-03 01:00:00.0

Anchorage

1

 

If you want to find the minimum and maximum temperature recorded anywhere each day (globally regardless of city), the minimum and maximum temperature can be calculated using the aggregate functions MIN and MAX respectively. To indicate that we want this information on a per-day basis (and to provide a monotonically increasing expression as the argument of the GROUP BY clause), we use the FLOOR function to round each row's rowtime down to the nearest day:

SELECT STREAM

   FLOOR(WEATHERSTREAM.ROWTIME to DAY) AS FLOOR_DAY,

   MIN(TEMP) AS MIN_TEMP,

   MAX(TEMP) AS MAX_TEMP

FROM WEATHERSTREAM

GROUP BY FLOOR(WEATHERSTREAM.ROWTIME TO DAY);

 

The result of the aggregate query is:

FLOOR_DAY

MIN_TEMP

MAX_TEMP

2018-11-01 00:00:00.0

2

71

2018-11-02 00:00:00.0

2

56

 

Note that there is no row for 2018-11-03, even though the example data does include temperature measurements on that day. This is because the rows for 2018-11-03 cannot be aggregated until all rows for that day are known to have arrived, and that will only happen when either a row with a rowtime of 2018-11-04 00:00:00.0 (or later) or a rowtime bound of 2018-11-04 00:00:00.0 (or later) arrives. If and when either did arrive, the next result would be:

FLOOR_DAY

MIN_TEMP

MAX_TEMP

2018-11-03 00:00:00.0

1

36

 

Let's say that instead of finding the global minimum and maximum temperatures each day, we want to find the minimum, maximum, and average temperature for each city each day. To do this, we use the SUM and COUNT aggregate functions to compute the average, and add CITY to the GROUP BY clause:

SELECT STREAM

   FLOOR(WEATHERSTREAM.ROWTIME TO DAY) AS FLOOR_DAY,

   CITY,

   MIN(TEMP) AS MIN_TEMP,

   MAX(TEMP) AS MAX_TEMP,

   SUM(TEMP)/COUNT(TEMP) AS AVG_TEMP

FROM WEATHERSTREAM

GROUP BY FLOOR(WEATHERSTREAM.ROWTIME TO DAY), CITY;

 

The result of the aggregate query is:

FLOOR_DAY

CITY

MIN_TEMP

MAX_TEMP

AVG_TEMP

2018-11-01 00:00:00.0

Anchorage

2

10

7

2018-11-01 00:00:00.0

Denver

29

50

38

2018-11-01 00:00:00.0

Miami

65

71

68

2018-11-02 00:00:00.0

Anchorage

2

4

3

2018-11-02 00:00:00.0

Denver

39

56

47

 

In this case, the arrival of rows for a new day's temperature measurements triggers the aggregation of the previous day's data, grouped by CITY, which then results in one row being produced per city included in the day's measurements.

Here again, a rowtime bound 2018-11-04 00:00:00.0 could be used to prompt a result for 2018-11-03 prior to any actual measurements for 2018-11-04 coming in:

FLOOR_DAY

CITY

MIN_TEMP

MAX_TEMP

AVG_TEMP

2018-11-03 00:00:00.0

Anchorage

1

1

1

2018-11-03 00:00:00.0

Denver

36

36

36