aggregate_functions.md 4.0 KB

Aggregate Functions

Aggregate functions perform a calculation on a set of values and return a single value. Aggregate functions can be used as expressions only in the following:

  • The select list of a SELECT statement (either a sub-query or an outer query).
  • A HAVING clause.

AVG

avg(col)

The average of the values in a group. The null values will be ignored.

COUNT

count(*)
count(col)

The number of items in a group. The null values will be ignored.

MAX

max(col)

The maximum value in a group. The null values will be ignored.

MIN

min(col)

The minimum value in a group. The null values will be ignored.

SUM

sum(col)

The sum of all the values in a group. The null values will be ignored.

COLLECT

collect(*)
collect(col1, col2 ...)

Returns an array with all columns or the whole record (when the parameter is *) values from the group.

Examples:

  • Get an array of column a of the current window. Assume the column a is of an int type, the result will be like: [{"r1":[32, 45]}] sql SELECT collect(a) as r1 FROM test GROUP BY TumblingWindow(ss, 10)
  • Get the whole array of the current window. The result will be like: [{"r1":[{"a":32, "b":"hello"}, {"a":45, "b":"world"}]}]

    SELECT collect(*) as r1 FROM test GROUP BY TumblingWindow(ss, 10)
    
  • Get the second element's column 'a' value within the current window. The result will be like: [{"r1":32}]

    SELECT collect(*)[1]->a as r1 FROM test GROUP BY TumblingWindow(ss, 10)
    

DEDUPLICATE

deduplicate(col, false)

Returns the deduplicate results in the group, usually a window. The first argument is the column as the key to deduplicate; the second argument is whether to return all items or just the latest item which is not duplicate. If the latest item is a duplicate, the sink will receive an empty map. Set the sink property omitIfEmpty to the sink to not triggering the action.

Examples:

  • Get the whole array of the current window which is deduplicated by column a. The result will be like: [{"r1":{"a":32, "b":"hello"}, {"a":45, "b":"world"}}] sql SELECT deduplicate(a, true) as r1 FROM test GROUP BY TumblingWindow(ss, 10)
  • Get the column a value which is not duplicate during the last hour. The result will be like: [{"r1":32}], [{"r1":45}] and [{}] if a duplicate value arrives. Use the omitIfEmpty sink property to filter out those empty results. sql SELECT deduplicate(a, false)->a as r1 FROM demo GROUP BY SlidingWindow(hh, 1)

STDDEV

stddev(col)

Returns the population standard deviation of expression in the group, usually a window. The argument is the column as the key to stddev.

STDDEVS

stddevs(col)

Returns the sample standard deviation of expression in the group, usually a window. The argument is the column as the key to stddevs.

VAR

var(col)

Returns the population variance (square of the population standard deviation) of expression in the group, usually a window. The argument is the column as the key to var.

VARS

vars(col)

Returns the sample variance (square of the sample standard deviation) of expression in the group, usually a window. The argument is the column as the key to vars.

PERCENTILE

percentile(col, percentile)

Returns the percentile value based on a continuous distribution of expression in the group, usually a window. The first argument is the column as the key to percentile. The second argument is the percentile of the value that you want to find. The percentile must be a constant between 0.0 and 1.0.

PERCENTILE_DISC

percentile_disc(col, percentile)

Returns the percentile value based on a discrete distribution of expression in the group, usually a window. The first argument is the column as the key to percentile_disc. The second argument is the percentile of the value that you want to find. The percentile must be a constant between 0.0 and 1.0.