GROUP BY clause
For example, GROUP BY < column name-or-expression, ... >
|•||the expression can be an aggregate and|
|•||any column name used in the GROUP BY clause must also be in the SELECT statement.|
Additionally, a column that is not named in or derivable from the GROUP BY clause cannot appear in the SELECT statement except within aggregations, such as SUM (allOrdersValue).
What derivable means is that a column specified in the GROUP BY clause enables access to the column you want to include in the SELECT clause. If a column is derivable, the SELECT statement can specify it even though it is not explicitly named in the GROUP BY clause.
Example: If the key to a table is in the GROUP BY clause, then any of that table's columns can can appear in the select-list because, given that key, such columns are considered accessible.
The GROUP BY clause groups selected rows based on the value of the grouping expressions, returning a single summary row of information for each group of rows that have identical values in all columns.
Note that for these purposes, the value NULL is considered equal to itself and not equal to any other value. These are the same semantics as for the IS NOT DISTINCT FROM operator.
Here is a non-streaming example:
SELECT orderid, sum(amount)
GROUP BY orderid
A query that has a GROUP BY or HAVING clause, or has group aggregate expressions in its SELECT or ORDER BY clause, is said to be an aggregate query. A group aggregate expression is a call to a group aggregate function, such as SUM, COUNT or MIN, with no OVER clause. (A call to an aggregate function with an OVER clause is called a window aggregate expression, and is a different matter.)
Group aggregate expressions apply a specified function to the same expression in every row of a group and return a single value. For example, the following non-streaming query groups employees into departments and returns a summary row for each department. The totals in that row are derived from the data on employees in that particular department.
SELECT deptno, SUM(sal), SUM(sal + bonus), COUNT(*)
GROUP BY deptno
The form GROUP BY () aggregates on zero grouping expressions, producing a single grand total for all rows. It is not valid for streaming queries, since there is no monotonically increasing or time-based expression limiting the rows in the group. If a query has a HAVING clause or aggregate expressions but no GROUP BY clause, GROUP BY () is implied.
GROUP BY can be used in a streaming query as long as one of the grouping expressions is a non-constant monotonically increasing or time-based expression. This requirement is necessary in order for SQLstream s-Server to make progress, as explained below.
A monotonically increasing expression is one that always moves in the same direction: it either ascends-or-stays-the-same, or it descends-or-stays the same; it doesn't reverse direction. It does not need to be strictly ascending or strictly descending, that is, every value always above the previous one or every value always below the previous one. A constant expression falls under the definition of monotonic--it is technically ascending--but is clearly unsuitable for these purposes. For more information about monotonicity, see Monotonically Increasing Expressions and Operators.
Consider the following query:
SELECT STREAM prodId, COUNT(*)
GROUP BY prodId
The query is intended to compute the number of orders for each product, as a stream. However, since Orders is an infinite stream, SQLstream s-Server can never know that it has seen all orders for a given product, can never complete a particular row's total, and therefore can never output a row. Rather than allow a query that can never emit a row, SQLstream s-Server's validator rejects the query.
The syntax for streaming GROUP BY is as follows:
GROUP BY <monotonically increasing or time-based expression> ,
<column name-or-expression, ...>
where any column name used in the GROUP BY needs to be in the SELECT statement; the expression can be an aggregate. Additionally, a column name that does not appear in the GROUP BY cannot appear in the SELECT statement except within aggregations, or if, as above, access to the column is derivable from a column that is specified in the GROUP BY.
For example, the following query, which computes the product counts per hour, uses the monotonically increasing expression FLOOR(Orders.ROWTIME TO HOUR) is therefore valid:
SELECT STREAM FLOOR(Orders.ROWTIME TO HOUR) AS theHour, prodId, COUNT(*)
GROUP BY FLOOR(Orders.ROWTIME TO HOUR), prodId
One of the expressions in the GROUP BY must be monotonically increasing or time-based. For example GROUP BY FLOOR(S.ROWTIME) TO HOUR will yield one output row per hour for the previous hour's input rows. The GROUP BY can specify additional partitioning terms. For example, GROUP BY FLOOR(S.ROWTIME) TO HOUR, USERID will yield one output row per hour per USERID value. If you know for a fact that an expression is monotonic, you can declare it so by using the MONOTONIC function. If the actual data are not monotonic, the resulting system behavior is indeterminate: results may not be as expected or desired.
See the topic MONOTONIC function in this guide for more details.
Duplicate rowtimes can occur in a stream, and as long as the ROWTIME value is the same, the GROUP BY operation will keep accumulating rows. In order to emit a row, the ROWTIME value has to change at some point.