Aggregate functions perform a calculation on a set of values and return a single value. Aggregate functions can be used as expressions only in the following:
avg(col)
The average of the values in a group. The null values will be ignored.
count(*)
count(col)
The number of items in a group. The null values will be ignored.
max(col)
The maximum value in a group. The null values will be ignored.
min(col)
The minimum value in a group. The null values will be ignored.
sum(col)
The sum of all the values in a group. The null values will be ignored.
collect(*)
collect(col1, col2 ...)
Returns an array with all columns or the whole record (when the parameter is *) values from the group.
Examples:
a
of the current window. Assume the column a
is of an int type, the result will be
like: [{"r1":[32, 45]}]
sql
SELECT collect(a) as r1 FROM test GROUP BY TumblingWindow(ss, 10)
Get the whole array of the current window. The result will be
like: [{"r1":{"a":32, "b":"hello"}, {"a":45, "b":"world"}}]
SELECT collect(*) as r1 FROM test GROUP BY TumblingWindow(ss, 10)
Get the second element's column 'a' value within the current window. The result will be like: [{"r1":32}]
SELECT collect(*)[1]->a as r1 FROM test GROUP BY TumblingWindow(ss, 10)
deduplicate(col, false)
Returns the deduplicate results in the group, usually a window. The first argument is the column as the key to deduplicate; the second argument is whether to return all items or just the latest item which is not duplicate. If the latest item is a duplicate, the sink will receive an empty map. Set the sink property omitIfEmpty to the sink to not triggering the action.
Examples:
a
. The result will be
like: [{"r1":{"a":32, "b":"hello"}, {"a":45, "b":"world"}}]
sql
SELECT deduplicate(a, true) as r1 FROM test GROUP BY TumblingWindow(ss, 10)
a
value which is not duplicate during the last hour. The result will be
like: [{"r1":32}]
, [{"r1":45}]
and [{}]
if a duplicate value arrives. Use the omitIfEmpty sink property to
filter out those empty results.
sql
SELECT deduplicate(a, false)->a as r1 FROM demo GROUP BY SlidingWindow(hh, 1)
stddev(col)
Returns the population standard deviation of expression in the group, usually a window. The argument is the column as the key to stddev.
stddevs(col)
Returns the sample standard deviation of expression in the group, usually a window. The argument is the column as the key to stddevs.
var(col)
Returns the population variance (square of the population standard deviation) of expression in the group, usually a window. The argument is the column as the key to var.
vars(col)
Returns the sample variance (square of the sample standard deviation) of expression in the group, usually a window. The argument is the column as the key to vars.
percentile(col, percentile)
Returns the percentile value based on a continuous distribution of expression in the group, usually a window. The first argument is the column as the key to percentile. The second argument is the percentile of the value that you want to find. The percentile must be a constant between 0.0 and 1.0.
percentile_disc(col, percentile)
Returns the percentile value based on a discrete distribution of expression in the group, usually a window. The first argument is the column as the key to percentile_disc. The second argument is the percentile of the value that you want to find. The percentile must be a constant between 0.0 and 1.0.