Window Functions
The syntax for window functions:
function_name([expression [, expression ... ]]) OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] )
- [ PARTITION BY expr1 ] Optional parameter. PARTITION BY groups the queried data, and the window function processes each group independently.
- [ ORDER BY expr2 [ DESC ] ] Optional parameter. Sorts by expr2.
- [ DESC ] Optional parameter. By default, sorting is in ascending order. Using the DESC parameter will sort the results in descending order.
In this introduction, the following table will be used as sample data for illustration.
id | zh_name | director | prime_genre | runtime | rate_num | votes | stars | pubdate |
---|---|---|---|---|---|---|---|---|
1 | Interstellar | Christopher Nolan | Drama | 169 | 7.75 | 416252 | Matthew McConaughey | 2014-11-12 |
2 | Schindler's List | Steven Spielberg | Drama | 195 | 7.97 | 329749 | Liam Neeson | 1993-11-30 |
3 | Flirting Scholar | Lee Lik-Chi | Comedy | 102 | 4.25 | 260928 | Stephen Chow | 1993-07-01 |
4 | Identity | James Mangold | Drama | 90 | 9.57 | 271970 | John Cusack | 2003-04-25 |
5 | The Lord of the Rings: The Return of the King | Peter Jackson | Drama | 201 | 5.24 | 255167 | Viggo Mortensen | 2004-03-12 |
6 | City Lights | Charlie Chaplin | Drama | 87 | 0.11 | 34216 | Charlie Chaplin | 1931-01-30 |
7 | The Man from Earth | Richard Schenkman | Drama | 87 | 4.15 | 147835 | David Lee Smith | 2007-11-13 |
8 | The Gods Must Be Crazy | Jamie Uys | Comedy | 109 | 2.67 | 83463 | Marius Weyers | 1980-09-10 |
9 | A Chinese Odyssey Part Two: Cinderella | Jeffrey Lau | Comedy | 95 | 3.91 | 395680 | Stephen Chow | 2014-10-24 |
10 | The Chorus | Christophe Barratier | Drama | 97 | 9.92 | 396189 | Gérard Jugnot | 2004-10-16 |
dense_rank
Syntax | dense_rank() OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] ) |
---|---|
Parameter Description | 1. expr1: Optional parameter, expression used for partitioning; allowed types: any type; 2. expr2: Optional parameter, expression used for sorting; allowed types: any type; 3. DESC: Optional parameter, specifies descending order, default is ascending order |
Return Type | Number |
Description | Returns the rank of the current window (without gaps), counting starts from 1. The difference between dense_rank and rank is that when there are duplicate ranks, dense_rank assigns the next consecutive rank, while rank skips the numbers that would have been assigned to the duplicates. |
- Example
dense_rank() OVER (PARTITION BY {prime_genre} ORDER BY {runtime} ASC)
The result is as follows:
runtime | prime_genre | value |
---|---|---|
169 | Drama | 4 |
195 | Drama | 5 |
102 | Comedy | 2 |
90 | Drama | 2 |
201 | Drama | 6 |
87 | Drama | 1 |
87 | Drama | 1 |
109 | Comedy | 3 |
95 | Comedy | 1 |
97 | Drama | 3 |
lag
Syntax | lag(ARG, I) OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] ) |
---|---|
Parameter Description | 1. ARG: Window frame, optional type: any type; 2. I: Specifies the distance between the record and the current record, optional type: number; 3. expr1: Optional parameter, expression for partitioning, optional type: any type; 4. expr2: Optional parameter, expression for sorting, optional type: any type; 5. DESC: Optional parameter, specifies descending order, default is ascending order |
Return Type | Any type |
Description | Window function that retrieves the arg data of the record at a specified distance i before the current record |
- Example
lag({zh_name}, 2) OVER (ORDER BY {runtime} ASC)
The result is as follows:
zh_name | runtime | value |
---|---|---|
Interstellar | 169 | Flirting Scholar |
Schindler's List | 195 | The Gods Must Be Crazy |
Flirting Scholar | 102 | A Chinese Odyssey Part II |
Identity | 90 | City Lights |
The Lord of the Rings: The Return of the King | 201 | Interstellar |
City Lights | 87 | null |
The Man from Earth | 87 | null |
The Gods Must Be Crazy | 109 | The Chorus |
A Chinese Odyssey Part II | 95 | The Man from Earth |
The Chorus | 97 | Identity |
lead
Syntax | lead(ARG, I) |
---|---|
Parameter Description | 1. Parameter to be calculated, optional type: any type; 2. Integer parameter, optional type: number |
Return Type | Any type |
Description | Window function that retrieves the value of arg from the record that is i positions after the current record |
- Example
lead({zh_name}, 2) OVER (ORDER BY {runtime} ASC)
The result is as follows:
zh_name | runtime | value |
---|---|---|
Interstellar | 169 | The Lord of the Rings: The Return of the King |
Schindler's List | 195 | null |
Flirting Scholar | 102 | Interstellar |
Identity | 90 | The Chorus |
The Lord of the Rings: The Return of the King | 201 | null |
City Lights | 87 | Identity |
The Man from Earth | 87 | A Chinese Odyssey Part Two: Cinderella |
The Gods Must Be Crazy | 109 | Schindler's List |
A Chinese Odyssey Part Two: Cinderella | 95 | Flirting Scholar |
The Chorus | 97 | The Gods Must Be Crazy |
nth_value
Syntax | nth_value(ARG, I) OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] ) |
---|---|
Parameter Description | 1. ARG: Window frame, optional type: any type; 2. I: The row number of the data you want to retrieve, optional type: number; 3. expr1: Optional parameter, expression used for partitioning, optional type: any type; 4. expr2: Optional parameter, expression used for sorting, optional type: any type; 5. DESC: Optional parameter, specifies descending order, default is ascending order |
Return Type | Any type |
Description | Retrieves the i-th arg value within the window |
- Example
NTH_VALUE({zh_name}, 2) OVER( PARTITION BY {prime_genre} ORDER BY {runtime} DESC )
The result is as follows:
zh_name | runtime | prime_genre | value |
---|---|---|---|
Interstellar | 169 | Drama | Schindler's List |
Schindler's List | 195 | Drama | Schindler's List |
Flirting Scholar | 102 | Comedy | Flirting Scholar |
Identity | 90 | Drama | Schindler's List |
The Lord of the Rings: The Return of the King | 201 | Drama | null |
City Lights | 87 | Drama | Schindler's List |
The Man from Earth | 87 | Drama | Schindler's List |
The Gods Must Be Crazy | 109 | Comedy | null |
A Chinese Odyssey Part Two: Cinderella | 95 | Comedy | Flirting Scholar |
The Chorus | 97 | Drama | Schindler's List |
rank
Syntax | rank() OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] ) |
---|---|
Parameter Description | 1. expr1: Optional parameter, expression for partitioning, allowed types: any type; 2. expr2: Optional parameter, expression for sorting, allowed types: any type; 3. DESC: Optional parameter, specifies descending order, default is ascending order |
Return Type | Number |
Description | Returns the rank of the current window (with gaps), counting starts from 1 |
- Example
rank() OVER (PARTITION BY {prime_genre} ORDER BY {runtime} ASC)
The result is as follows:
runtime | prime_genre | value |
---|---|---|
169 | Drama | 5 |
195 | Drama | 6 |
102 | Comedy | 2 |
90 | Drama | 3 |
201 | Drama | 7 |
87 | Drama | 1 |
87 | Drama | 1 |
109 | Comedy | 3 |
95 | Comedy | 1 |
97 | Drama | 4 |
row_number
Syntax | row_number() OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] ) |
---|---|
Parameter Description | 1. expr1: Optional parameter, expression for partitioning, allowed types: any type; 2. expr2: Optional parameter, expression for sorting, allowed types: any type; 3. DESC: Optional parameter, specifies descending order, default is ascending order |
Return Type | Number |
Description | Returns the row number of the current window, counting starts from 1 |
- Example
ROW_NUMBER() OVER (PARTITION BY {prime_genre} ORDER BY {runtime} ASC)
The result is as follows:
runtime | prime_genre | value |
---|---|---|
169 | Drama | 5 |
195 | Drama | 6 |
102 | Comedy | 2 |
90 | Drama | 3 |
201 | Drama | 7 |
87 | Drama | 2 |
87 | Drama | 1 |
109 | Comedy | 3 |
95 | Comedy | 1 |
97 | Drama | 4 |
Aggregate Functions Supporting Window Calculations
Some aggregate functions can also be used as window functions. An aggregate function performs window calculations only when it is followed by an OVER clause. The list of functions is as follows:
Function Name | Description |
---|---|
SUM | Calculates the sum of a numeric column |
AVG | Calculates the average value of a numeric column |
MAX | Calculates the maximum value of a numeric column |
MIN | Calculates the minimum value of a numeric column |
CORR | Calculates the correlation coefficient between two numeric columns |
COVAR_POP | Calculates the population covariance between two numeric columns |
COVAR_SAMP | Calculates the sample covariance between two numeric columns |
REGR_AVGX | Calculates the mean of the independent variable in linear regression |
REGR_AVGY | Calculates the mean of the dependent variable in linear regression |
REGR_COUNT | Calculates the number of non-null data points in linear regression |
REGR_INTERCEPT | Calculates the intercept of the linear regression equation |
REGR_R2 | Calculates the coefficient of determination (R²) in linear regression |
REGR_SLOPE | Calculates the slope of the linear regression equation |
REGR_SXX | Calculates the sum of squares of the independent variable in linear regression |
REGR_SXY | Calculates the sum of cross-products of the independent and dependent variables in linear regression |
REGR_SYY | Calculates the sum of squares of the dependent variable in linear regression |
STDDEV_POP | Calculates the population standard deviation of a numeric column |
STDDEV_SAMP | Calculates the sample standard deviation of a numeric column |
VAR_POP | Calculates the population variance (square of the population standard deviation) of a numeric column |
VAR_SAMP | Calculates the sample variance (square of the sample standard deviation) of a numeric column |
STRING_AGG | Aggregates and concatenates string columns in a specified manner |
JSON_AGG | Aggregates data into a JSON format array |
COUNT | Calculates the number of rows (can specify a column to ignore NULL values, or use * to include all rows) |
order by Clause
Tip
Here is an important concept related to window functions: for each row, the set of rows in its partition is called its window frame.
Some window functions operate only on the rows within the window frame, not the entire partition. For such functions, if ORDER BY
is used, the frame includes all rows from the start of the partition up to the current row, as well as any subsequent rows that are equal to the current row in the ORDER BY
clause.
For all aggregate functions, when using an ORDER BY
expression in the OVER clause, the final returned result will differ in various ways from not using the ORDER BY
expression.
- Example 1
Get the total number of students in each class.
COUNT(1) OVER (PARTITION BY {class})
- Example 2
Concatenate the names of students in each class into a JSON array in order of their scores.
JSON_AGG({student_name}) OVER (PARTITION BY {class} ORDER BY {student_score})
Notes on Window Functions
Precautions when using window functions as chart metrics:
- Window functions are not aggregate functions and cannot be used as aggregate functions.
- When there are dimension columns in the chart, if the window function uses the OVER clause [PARTITION BY expr1]/[ORDER BY expr2], any non-aggregate expr1 or expr2 must be a dimension column of the chart, otherwise an error will occur.
- Suppose the chart has dimensions a and b, and the metric is MAX(COUNT({Book ID})) OVER (***). The calculation logic is as follows:
- Group by a and b, calculate the value of COUNT({Book ID}), and obtain an intermediate result: (a, b, count_tmp)
- If the metric is MAX(COUNT({Book ID})) OVER (), the result is: find the maximum value in the count_tmp column
- If the metric is MAX(COUNT({Book ID})) OVER (PARTITION BY a), the result is: from the (a, tmp) array, use the value of a as the partition basis, and find the maximum tmp value within each group
- If the metric is MAX(COUNT({Book ID})) OVER (PARTITION BY b), the result is: from the (b, tmp) array, use the value of b as the partition basis, and find the maximum tmp value within each group