Skip to content

Window Functions

The syntax for window functions:

function_name([expression [, expression ... ]]) OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] )

  • [ PARTITION BY expr1 ] Optional parameter. PARTITION BY groups the queried data, and the window function processes each group independently.
  • [ ORDER BY expr2 [ DESC ] ] Optional parameter. Sorts by expr2.
  • [ DESC ] Optional parameter. By default, sorting is in ascending order. Using the DESC parameter will sort the results in descending order.

In this introduction, the following table will be used as sample data for illustration.

idzh_namedirectorprime_genreruntimerate_numvotesstarspubdate
1InterstellarChristopher NolanDrama1697.75416252Matthew McConaughey2014-11-12
2Schindler's ListSteven SpielbergDrama1957.97329749Liam Neeson1993-11-30
3Flirting ScholarLee Lik-ChiComedy1024.25260928Stephen Chow1993-07-01
4IdentityJames MangoldDrama909.57271970John Cusack2003-04-25
5The Lord of the Rings: The Return of the KingPeter JacksonDrama2015.24255167Viggo Mortensen2004-03-12
6City LightsCharlie ChaplinDrama870.1134216Charlie Chaplin1931-01-30
7The Man from EarthRichard SchenkmanDrama874.15147835David Lee Smith2007-11-13
8The Gods Must Be CrazyJamie UysComedy1092.6783463Marius Weyers1980-09-10
9A Chinese Odyssey Part Two: CinderellaJeffrey LauComedy953.91395680Stephen Chow2014-10-24
10The ChorusChristophe BarratierDrama979.92396189Gérard Jugnot2004-10-16

dense_rank

Syntaxdense_rank() OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] )
Parameter Description1. expr1: Optional parameter, expression used for partitioning; allowed types: any type;
2. expr2: Optional parameter, expression used for sorting; allowed types: any type;
3. DESC: Optional parameter, specifies descending order, default is ascending order
Return TypeNumber
DescriptionReturns the rank of the current window (without gaps), counting starts from 1. The difference between dense_rank and rank is that when there are duplicate ranks, dense_rank assigns the next consecutive rank, while rank skips the numbers that would have been assigned to the duplicates.
  • Example
text
dense_rank() OVER (PARTITION BY {prime_genre} ORDER BY {runtime} ASC)

The result is as follows:

runtimeprime_genrevalue
169Drama4
195Drama5
102Comedy2
90Drama2
201Drama6
87Drama1
87Drama1
109Comedy3
95Comedy1
97Drama3

lag

Syntaxlag(ARG, I) OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] )
Parameter Description1. ARG: Window frame, optional type: any type;
2. I: Specifies the distance between the record and the current record, optional type: number;
3. expr1: Optional parameter, expression for partitioning, optional type: any type;
4. expr2: Optional parameter, expression for sorting, optional type: any type;
5. DESC: Optional parameter, specifies descending order, default is ascending order
Return TypeAny type
DescriptionWindow function that retrieves the arg data of the record at a specified distance i before the current record
  • Example
text
lag({zh_name}, 2) OVER (ORDER BY {runtime} ASC)

The result is as follows:

zh_nameruntimevalue
Interstellar169Flirting Scholar
Schindler's List195The Gods Must Be Crazy
Flirting Scholar102A Chinese Odyssey Part II
Identity90City Lights
The Lord of the Rings: The Return of the King201Interstellar
City Lights87null
The Man from Earth87null
The Gods Must Be Crazy109The Chorus
A Chinese Odyssey Part II95The Man from Earth
The Chorus97Identity

lead

Syntaxlead(ARG, I)
Parameter Description1. Parameter to be calculated, optional type: any type;
2. Integer parameter, optional type: number
Return TypeAny type
DescriptionWindow function that retrieves the value of arg from the record that is i positions after the current record
  • Example
text
lead({zh_name}, 2) OVER (ORDER BY {runtime} ASC)

The result is as follows:

zh_nameruntimevalue
Interstellar169The Lord of the Rings: The Return of the King
Schindler's List195null
Flirting Scholar102Interstellar
Identity90The Chorus
The Lord of the Rings: The Return of the King201null
City Lights87Identity
The Man from Earth87A Chinese Odyssey Part Two: Cinderella
The Gods Must Be Crazy109Schindler's List
A Chinese Odyssey Part Two: Cinderella95Flirting Scholar
The Chorus97The Gods Must Be Crazy

nth_value

Syntaxnth_value(ARG, I) OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] )
Parameter Description1. ARG: Window frame, optional type: any type;
2. I: The row number of the data you want to retrieve, optional type: number;
3. expr1: Optional parameter, expression used for partitioning, optional type: any type;
4. expr2: Optional parameter, expression used for sorting, optional type: any type;
5. DESC: Optional parameter, specifies descending order, default is ascending order
Return TypeAny type
DescriptionRetrieves the i-th arg value within the window
  • Example
text
NTH_VALUE({zh_name}, 2) OVER( PARTITION BY {prime_genre} ORDER BY {runtime} DESC )

The result is as follows:

zh_nameruntimeprime_genrevalue
Interstellar169DramaSchindler's List
Schindler's List195DramaSchindler's List
Flirting Scholar102ComedyFlirting Scholar
Identity90DramaSchindler's List
The Lord of the Rings: The Return of the King201Dramanull
City Lights87DramaSchindler's List
The Man from Earth87DramaSchindler's List
The Gods Must Be Crazy109Comedynull
A Chinese Odyssey Part Two: Cinderella95ComedyFlirting Scholar
The Chorus97DramaSchindler's List

rank

Syntaxrank() OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] )
Parameter Description1. expr1: Optional parameter, expression for partitioning, allowed types: any type;
2. expr2: Optional parameter, expression for sorting, allowed types: any type;
3. DESC: Optional parameter, specifies descending order, default is ascending order
Return TypeNumber
DescriptionReturns the rank of the current window (with gaps), counting starts from 1
  • Example
text
rank() OVER (PARTITION BY {prime_genre} ORDER BY {runtime} ASC)

The result is as follows:

runtimeprime_genrevalue
169Drama5
195Drama6
102Comedy2
90Drama3
201Drama7
87Drama1
87Drama1
109Comedy3
95Comedy1
97Drama4

row_number

Syntaxrow_number() OVER( [ PARTITION BY expr1 ] [ ORDER BY expr2 [ DESC ] ] )
Parameter Description1. expr1: Optional parameter, expression for partitioning, allowed types: any type;
2. expr2: Optional parameter, expression for sorting, allowed types: any type;
3. DESC: Optional parameter, specifies descending order, default is ascending order
Return TypeNumber
DescriptionReturns the row number of the current window, counting starts from 1
  • Example
text
ROW_NUMBER() OVER (PARTITION BY {prime_genre} ORDER BY {runtime} ASC)

The result is as follows:

runtimeprime_genrevalue
169Drama5
195Drama6
102Comedy2
90Drama3
201Drama7
87Drama2
87Drama1
109Comedy3
95Comedy1
97Drama4

Aggregate Functions Supporting Window Calculations

Some aggregate functions can also be used as window functions. An aggregate function performs window calculations only when it is followed by an OVER clause. The list of functions is as follows:

Function NameDescription
SUMCalculates the sum of a numeric column
AVGCalculates the average value of a numeric column
MAXCalculates the maximum value of a numeric column
MINCalculates the minimum value of a numeric column
CORRCalculates the correlation coefficient between two numeric columns
COVAR_POPCalculates the population covariance between two numeric columns
COVAR_SAMPCalculates the sample covariance between two numeric columns
REGR_AVGXCalculates the mean of the independent variable in linear regression
REGR_AVGYCalculates the mean of the dependent variable in linear regression
REGR_COUNTCalculates the number of non-null data points in linear regression
REGR_INTERCEPTCalculates the intercept of the linear regression equation
REGR_R2Calculates the coefficient of determination (R²) in linear regression
REGR_SLOPECalculates the slope of the linear regression equation
REGR_SXXCalculates the sum of squares of the independent variable in linear regression
REGR_SXYCalculates the sum of cross-products of the independent and dependent variables in linear regression
REGR_SYYCalculates the sum of squares of the dependent variable in linear regression
STDDEV_POPCalculates the population standard deviation of a numeric column
STDDEV_SAMPCalculates the sample standard deviation of a numeric column
VAR_POPCalculates the population variance (square of the population standard deviation) of a numeric column
VAR_SAMPCalculates the sample variance (square of the sample standard deviation) of a numeric column
STRING_AGGAggregates and concatenates string columns in a specified manner
JSON_AGGAggregates data into a JSON format array
COUNTCalculates the number of rows (can specify a column to ignore NULL values, or use * to include all rows)

order by Clause

Tip

Here is an important concept related to window functions: for each row, the set of rows in its partition is called its window frame.
Some window functions operate only on the rows within the window frame, not the entire partition. For such functions, if ORDER BY is used, the frame includes all rows from the start of the partition up to the current row, as well as any subsequent rows that are equal to the current row in the ORDER BY clause.
For all aggregate functions, when using an ORDER BY expression in the OVER clause, the final returned result will differ in various ways from not using the ORDER BY expression.

  • Example 1

Get the total number of students in each class.

COUNT(1) OVER (PARTITION BY {class})
  • Example 2

Concatenate the names of students in each class into a JSON array in order of their scores.

JSON_AGG({student_name}) OVER (PARTITION BY {class} ORDER BY {student_score})

Notes on Window Functions

Precautions when using window functions as chart metrics:

  1. Window functions are not aggregate functions and cannot be used as aggregate functions.
  2. When there are dimension columns in the chart, if the window function uses the OVER clause [PARTITION BY expr1]/[ORDER BY expr2], any non-aggregate expr1 or expr2 must be a dimension column of the chart, otherwise an error will occur.
  3. Suppose the chart has dimensions a and b, and the metric is MAX(COUNT({Book ID})) OVER (***). The calculation logic is as follows:
    • Group by a and b, calculate the value of COUNT({Book ID}), and obtain an intermediate result: (a, b, count_tmp)
    • If the metric is MAX(COUNT({Book ID})) OVER (), the result is: find the maximum value in the count_tmp column
    • If the metric is MAX(COUNT({Book ID})) OVER (PARTITION BY a), the result is: from the (a, tmp) array, use the value of a as the partition basis, and find the maximum tmp value within each group
    • If the metric is MAX(COUNT({Book ID})) OVER (PARTITION BY b), the result is: from the (b, tmp) array, use the value of b as the partition basis, and find the maximum tmp value within each group

User Manual for Hengshi Analysis Platform