Aggregate Functions

Aggregate functions refer to statistical calculations performed on a set of values, returning a single numerical result. This article first prepares a table, and the following function examples will use this table for calculation results.

f1	f2	f3
10	20	30
20	30	40
30	null	50
40	50	60
50	60	70
60	70	null

avg

Syntax	avg(N)
Parameter	Numeric type parameter, optional type: number
Return Type	Number
Description	Returns the average value of the specified numeric column
Example	avg({f1}), returns 35

corr

Syntax	corr(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	Returns the correlation coefficient between two numeric columns
Example	(arg1, arg2) = [(1,2),(2,3),(3,4),(4,5),(5,6),(1000,1001)], corr({arg1}, {arg2}), returns 1

count

Syntax	count(ARG)
Parameter	The parameter to be calculated; optional types: any type
Return Type	Number
Description	Returns the number of values in the specified column
Example	count({f2}), returns: 5 (null values are not counted)

covar_pop

Syntax	covar_pop(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	Returns the population covariance of two numeric columns
Example	(arg1, arg2) = [(1,2),(2,3),(3,4),(4,5),(5,6),(1000,1001)], covar_pop({arg1}, {arg2}) returns 138058.47222222222

covar_samp

Syntax	covar_samp(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	Returns the sample covariance of two numeric columns
Example	(arg1, arg2) = [(1,2),(2,3),(3,4),(4,5),(5,6),(1000,1001)], covar_samp(arg1, arg2) returns 165670.16666666666

distinct_count

Syntax	distinct_count(ARG)
Parameter	The parameter to be calculated; supported types: any type
Return Type	Number
Description	Returns the number of unique (non-duplicate) results in the specified column
Example	distinct_count({f2}), returns: 5
Unsupported Data Sources	None

first

Syntax	first(ARG)
Parameter	The parameter to be calculated, optional type: ANY
Return Type	ANY
Description	Retrieves the value of the first record in the specified column; the value is not fixed
Example	first({f1}), returns: 10

last

Syntax	last(ARG)
Parameter	The parameter to be calculated; optional types: any type
Return Type	Any type
Description	Retrieves the value of the last record in the specified column. The value is not fixed.
Example	last({f1}), returns: 60

list_collect

Syntax	list_collect(ARG)
Parameter	The parameter to be calculated; optional types: any
Return Type	Array
Description	Aggregate function that collects the values of arg into an array. (The return format is an array type. If you see a null value, try converting it to a string to view it.)
Example	list_collect({f1}), returns: an array containing all values of the f1 field

list_collect_flatten

Syntax	list_collect_flatten(ARR)
Parameter	Array type parameter, optional type: array
Return Type	Array
Description	Aggregate function that collects and flattens the parameters of arr into an array. (The return format is an array. If you see a null value, try converting it to a string to view it.)
Example	list_collect_flatten({f1}), returns: extracts all elements from the f1 array and combines them into a new array.

max

Syntax	max(ARG)
Parameter	Parameter to be calculated, optional types: any type
Return Type	Any type
Description	Returns the maximum value of the specified numeric column
Example	max({f1}), returns: 60
Unsupported Data Sources	None

max_by

Syntax	max_by(ARG, COMPARE)
Parameter Description	1. Parameter to be calculated, optional types: any type; 2. Comparison value, optional types: any type
Return Type	Any type
Description	Returns the value of the arg column corresponding to the maximum value in the compare column. If there are multiple maximum values, one is selected at random.
Example	max_by({f1}, if(isnull({f2}), -1, {f2})), returns: 60

median

Syntax	median(ARG)
Parameter	Numeric parameter, optional type: number
Return Type	Number
Description	Returns the median of the specified numeric column
Example	median({f1}), returns: 35

min

Syntax	min(ARG)
Parameter	The parameter to be calculated; optional types: any type
Return Type	Any type
Description	Returns the minimum value of the specified numeric column
Example	min({f1}), returns: 10

min_by

Syntax	min_by(ARG, COMPARE)
Parameter Description	1. Parameter to be calculated, optional type: any type; 2. Comparison value, optional type: any type
Return Type	Any type
Description	Returns the value of the arg column corresponding to the row with the minimum value in the compare column. If there are multiple minimum values, one is selected at random.
Example	min_by({f1},{f2}), returns: 10

mode

Syntax	mode() within group(ORDER BY expr2 [ DESC ] )
Parameter Description	expr2: Expression used for sorting, optional types: ANY; DESC: Optional parameter, specifies descending order, default is ascending order
Return Type	ANY
Description	Returns the value that appears most frequently; if there is a tie, returns the first one
Example	For the data [1, 2, 2, 3, 3], mode() within group(ORDER BY value) returns: 2 (2 and 3 have the same frequency, returns the first 2)

nth

Syntax	nth(ARG, I)
Parameter Description	1. Parameter to be calculated, optional type: any type; 2. Integer parameter, optional type: number
Return Type	Any type
Description	Retrieves the i-th value
Example	nth({f1}, 2), returns: 30 (index starts from 1)

percentile

Syntax	percentile(ARG, P)
Parameter Description	1. Numeric type parameter, allowed type: number; 2. Decimal between 0 and 1, allowed type: number
Return Type	Number
Description	Calculates the percentile
Example	percentile({f1}, 50), returns: 30

regr_avgx

Syntax	regr_avgx(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	Returns the average value of the independent variable (sum(ARG2)/N)
Example	regr_avgx({f1},{f2}), returns: 46

regr_avgy

Syntax	regr_avgy(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	Returns the average value of the dependent variable (sum(ARG1)/N)
Example	regr_avgy({f1},{f2}), returns: 36

regr_count

Syntax	regr_count(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	The number of input rows where both expressions are not null
Example	regr_count({f1},{f2}), returns: 5

regr_intercept

Syntax	regr_intercept(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	The intercept of the linear equation of the least squares fit determined by the (arg2, arg1) pairs for arg1
Example	(arg1, arg2) = [(1,2),(2,3),(3,4),(4,5),(5,6),(1000,1)], regr_intercept(arg1, arg2) returns 665.666666666666

regr_r2

Syntax	regr_r2(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	Square of the correlation coefficient
Example	(arg1, arg2) = [(1,2),(2,3),(3,4),(4,5),(5,6),(10,1)], regr_r2({arg1}, {arg2}) = 0.06323185011709602

regr_slope

Syntax	regr_slope(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	The slope of the least-squares fit linear equation determined by the (arg2, arg1) pairs
Example	(arg1, arg2) = [(1,2),(2,3),(3,4),(4,5),(5,6),(10,1)], regr_slope(arg1, arg2) = -0.42857142857142855

regr_sxx

Syntax	regr_sxx(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	sum(arg2^2) - sum(arg2)^2/N (the "sum of squares" of the independent variable)
Example	(arg1, arg2) = [(1,2),(2,3),(3,4),(4,5),(5,6),(10,1)], regr_sxx(arg1, arg2) = 17.5

regr_sxy

Syntax	regr_sxy(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	sum(arg2 * arg1) - sum(arg2) * sum(arg1)/N (the "sum of products" of the independent variable and the dependent variable)
Example	(arg1, arg2) = [(1,2),(2,3),(3,4),(4,5),(5,6),(10,1)], regr_sxy(arg1, arg2) = -7.5

regr_syy

Syntax	regr_syy(ARG1, ARG2)
Parameter Description	1. ARG1, optional type: number; 2. ARG2, optional type: number
Return Type	Number
Description	sum(arg1^2) - sum(arg1)^2/N (the "sum of squares" of the dependent variable)
Example	(arg1, arg2) = [(1,2),(2,3),(3,4),(4,5),(5,6),(10,1)], regr_syy(arg1, arg2) = 50.833333333333336

set_collect

Syntax	set_collect(ARG)
Parameter	The parameter to be calculated; optional types: any type
Return Type	Array
Description	Aggregate function that collects the values of arg into a deduplicated array. (The return format is an array. If you see null displayed, try converting it to a string to view the result.)
Example	set_collect([1, 2, 2, 3]), returns: [1, 2, 3]

set_collect_flatten

Syntax	set_collect_flatten(ARR)
Parameter	Array type parameter, optional type: array
Return Type	Array
Description	Aggregate function that collects and flattens the parameters of arr into a deduplicated array. (The return format is an array. If you see a null value, try converting it to a string to view it.)
Example	set_collect_flatten([[1, 2], [2, 3]]) returns: [1, 2, 3]

stddev_pop

Syntax	stddev_pop(N)
Parameter	Numeric parameter, allowed type: number
Return Type	Number
Description	Returns the population standard deviation of a numeric column
Example	n = [(1000), (1001), (1003), (1004), (1005), (1000)], stddev_pop(n) = 1.9507833184532709

stddev_samp

Syntax	stddev_samp(N)
Parameter	Numeric parameter, optional type: number
Return Type	Number
Description	Returns the sample standard deviation of a numeric column
Example	n = [(1000),(1001),(1003),(1004),(1005),(1000)], stddev_samp(n) = 2.1369760566432809

string_agg

Syntax	string_agg(S, DELIMITER)
Parameter Description	1. S: The field to be concatenated, optional types: any type; 2. DELIMITER: The delimiter, optional type: string
Return Type	String
Description	Concatenates the field S into a string, separated by DELIMITER. For Oracle, the within group clause is required.
Example	string_agg(['a', 'b', 'c'], ','), returns: 'a,b,c'

sum

Syntax	sum(N)
Parameter	Numeric parameter, allowed type: number
Return Type	Number
Description	Returns the sum of the specified numeric column
Example	sum({f1}), returns: 210

var_pop

Syntax	var_pop(N)
Parameter	Numeric parameter, accepted type: number
Return Type	Number
Description	Returns the population variance of a numeric column
Example	n = [(1000), (1001), (1003), (1004), (1005), (1000)], var_pop(n) = 3.8055555555555554 (Note: The original example description was incorrect; this is the actual population variance)

var_samp

Syntax	var_samp(N)
Parameter	Numeric parameter, optional type: number
Return Type	Number
Description	Returns the sample variance of a numeric column (the square of the sample standard deviation)
Example	n = [(1000), (1001), (1003), (1004), (1005), (1000)], var_samp(n) = 4.5666666666666667

Adding Filter Conditions

It is supported to use the filter clause after an aggregate calculation expression to achieve filtering effects. The syntax is as follows, where filter_clause is the filter condition:

aggregate_name(expression ...) FILTER (WHERE filter_clause )

Example 1:

sum({f1}) FILTER (WHERE {f1} > 10)

This means summing the data where the f1 field is greater than 10.

Example 2:

avg({f1}) FILTER (WHERE {f1} > 10 AND {f2} = 'abc')

This means averaging the data where the f1 field is greater than 10 and the f2 field equals 'abc'.

Adding Sorting Clauses

Adding sorting clauses allows you to sort the aggregation results. The functions that support sorting clauses are: first, last, mode, percentile, string_agg. The syntax is as follows, where order_clause is the sorting condition.

aggregate_name(expression ...) within group(ORDER BY order_clause)

Example 1:

first({f1}) within group(ORDER BY {f2})

This means sorting the f1 field by the f2 field and taking the first value of the f1 field in each group.

Example 2:

last({f1}) within group(ORDER BY {f2})

This means sorting the f1 field by the f2 field and taking the last value of the f1 field in each group.

Example 3:

percentile({f1}, 0.5) within group(ORDER BY {f2})

This means sorting the f1 field by the f2 field and taking the value of the f1 field at the 50% position in each group.

User Manual

SDK

ChatBot

Integration

Data Reporting

Create Dataset

Dataset Management

Function List

Dashboard Creation

Chart Controls

Advanced Chart Calculations

Functionality

Display Controls

App Settings

Data Agent

Model Providers

Aggregate Functions ​

avg ​

corr ​

count ​

covar_pop ​

covar_samp ​

distinct_count ​

first ​

last ​

list_collect ​

list_collect_flatten ​

max ​

max_by ​

median ​

min ​

min_by ​

mode ​

nth ​

percentile ​

regr_avgx ​

regr_avgy ​

regr_count ​

regr_intercept ​

regr_r2 ​

regr_slope ​

regr_sxx ​

regr_sxy ​

regr_syy ​

set_collect ​

set_collect_flatten ​

stddev_pop ​

stddev_samp ​

string_agg ​

sum ​

var_pop ​

var_samp ​

Adding Filter Conditions ​

Adding Sorting Clauses ​