# Window Function¶

The Window Function performs calculation across a set of related table rows. The set of related rows is called a Partition or a Window Frame.

## When to use¶

Use window functions when you want to calculate something based on the relation of rows with respect to each other. Such scenarios may include:

1. Cumulating values across a set of rows or finding minimum, averages etc.
2. Ranking rows, finding difference between previous and current row
3. Find relationship among values within a Partition.

## Quick Start¶

Look at the following data: In the above example, there are 3 partitions of related rows. The rows in each partition share the same entry in the column Group Name namely Laptop, Smartphone and Tablet

For example, we would like to determine the following:

1. The total revenue for every group in Group Name
2. Regroup the data into 3 major price groups - Price Group

To calculate the total revenue for every group in Group Name, the sum of the column price needs to be evaluated in each partition. The data appears as shown below: The data has been partitioned by the column Group Name and calculated total of column Revenue within each column. This is achieved by choosing Apply Partitioning option and choosing the right column to Partition By

Note how values in Group Sum is displayed as an aggregate value. This is done by setting the Calculate As option to Aggregate over partition When aggregating data, unlike the group task, the use of a window function does not cause rows to be grouped into a single output row with aggregated values in a new column. The rows retain their separate identities and aggregated values are repeated across each of the rows.

You can can also calculate a running sum of the column Revenue over this partition. This is done by setting the Calculate As option to Accumulate over partition Partitioning is not mandatory. You can calculate the running total over the entire data like this: This is done by not choosing to partition over a column, by unchecking the Apply Partitioning option. Regroup the data into 3 major price groups - Price Group

The data appears as shown below: This is done by the NTile function. Note that there is no partition chosen, and the only option used is sort by Revenue column.

Notice that the sort option in the configuration says Revenue(asc). This option decides the column based on which the data is grouped and sorted.

The configuration is shown below: ## Supported Options¶

Following options are provided by Window Functions:

1. Calculate: To specify the function you want to use.
2. Of Column: To specify the column which the function should be use. This is only visible when the function selected in Calculate section requires a column.
3. Apply Partitioning: To specify if the data should be partitioned.
4. Partition by: To specify which column to partition. This is only visible when Apply Partitioning is chosen.
5. Calculation Type: To specify the type of operation over partition that the function chosen in Calculate section will operate on. Supports two options, Accumulate over partition or Aggregate over partition
6. N: The value for N. This shows up only when the function selected is NTILE or NTH_VALUE.
7. Sort: To specify the sort for the window. Check usage for sort here. Also see Sorting in tasks.
8. Apply result into: With this you can apply the result into a new column of name. See result documentation

## Types of Functions¶

The Window functions supported by Mammoth fall into following categories.

### Aggregation functions¶

Refer Quickstart for practical usage of the functions. Also check Accumulate over partition and Aggregate over partition before diving into the function definitions and usage.

All the calculations using the following functions can be divided by a partition, and if a partition is not specified, then the calculations are done considering the entire data as the window.

#### Sum¶

Calculate the sum for a given numeric column.

The data appears as shown below: The configuration is as shown below: #### Average¶

Calculate the average for a given numeric column.

The data appears as shown below: The configuration to calculate the average over every group is shown below: #### Maximum¶

Determine the maximum value in a given numeric column.

The data appears as shown below: The configuration is shown below: #### Minimum¶

Similar to Maximum, calculate the minimum value in a given numeric column.

The data appears as shown below: The configuration is shown below: #### Standard Deviation¶

By definition, Standard deviation is a number used to tell how measurements for a group are spread out from the average/mean, or expected value. A lower value means that most of the numbers are close to the average. A higher value means that the numbers are more spread out.

The data appears as shown below: The configuration is shown below: #### Variance¶

Variance is used for measuring the spread of data, similar to Standard Deviation. It grows very large when there is uneven spread in the distribution of data.

The data appears as shown below: The configuration is shown below: #### Count¶

Calculate the number of occurences of a value in any column. Demonstrating this using the same example as above to check the number of times each month occurs, which can later be filtered using explore cards.

The data apepars as shown below: The configuration is as below: #### Accumulate over partition¶

When you create an operation that accumulates over partition, it would create values in rows that successively apply the same operation over each row and accumulate that value. In the below table the Accumulated column shows marks Accumulated over the Student Partition as sum

Student Subject Marks Accumulated
Alice Math 87 87
Alice Science 92 179
Alice English 87 266
Alice Economics 89 355
Morris Math 93 93
Morris Science 89 182
Morris English 82 264
Morris Economics 97 361

#### Aggregate over partition¶

When you create an operation that aggregates over partition, it would create values in rows that provide the result of operation over all values across all rows for a column. In the below table the Aggregated column shows marks Aggregated over the Student Partition as sum

Student Subject Marks Aggregated
Alice Math 87 355
Alice Science 92 355
Alice English 87 355
Alice Economics 89 355
Morris Math 93 361
Morris Science 89 361
Morris English 82 361
Morris Economics 97 361

### Ranking functions¶

The following ranking functions are supported By Mammoth. Ranking functions always require a column on which the data is sorted.

#### Row Number¶

Evaluate the number of the current row withing its partition, starting from 1. In the example, we assign row numbers for every item on Product Name with partitioned by Group Name.

The data appears as below: The configuration to achieve this is shown below: #### Rank¶

Rank of the current row with gaps. Gaps mean that if there are multiple rows with the same values, the subsequent number that was repeated would be skipped. For example, In a series [A,B,B,C,D], the rank of 2nd B would be 2 while rank of C would be 4.

A good use case would be to calculate the best selling product in each group as demonstrated in the example. The partition by option decides the window, while the sort option decides the column based on which ranks are assigned, and the sort order of the data.

The data appears as shown below: The configuration to achieve this is shown below: #### Dense Rank¶

Similar to rank, but without gaps. For example in a series [A,B,B,C,D], the rank of 2nd B would be 2 while rank of C would be 3

The data appears as shown below: The configuration to achieve this is shown below: #### Percent Rank¶

This represents the relative rank of the current row. Mathematically represented as: (rank - 1) / (total rows - 1). The value of percent rank ranges from 0-1, the first row always having ‘0’ as the value, increasing from there. The best and least grossing products in the given data is computed with this function.

The data appears as shown below: The configuration is as shown below: #### Cumulative Percent Rank¶

Relative rank of the current row: (number of rows preceding or peer with current row) / (total rows). Check the example below, from the cumulative percent rank values, it is evident that over 55% products generate a revenue more than 600.

The data appears as shown below: The configuration is as shown below: #### Ntile¶

Integer ranging from 1 to the argument value, dividing the partition as equally as possible. This requires an additional option N to be set. Please note that setting N to 100 will not show you percentile unless there are more than 100 rows in that partition.

NTile function divides a result set into a number of buckets and assign a bucket number to each row. Check the data below divided almost evenly among the rankings 1 through 4.

The data appears as shown below: The configuration is shown below: ### Relative functions¶

#### First Value¶

Find the first value in a given column in a partition. In the given example, we are determining the least grossing product in every group.

The data appears as shown below: Determine the least grossing product by using sort. This option determines the value by which first value is determined, and the sort order.

The configuration is shown below: #### Last Value¶

Simlar to First Value, this function is used to determine the last value in a given column in a partition.

The data appears as shown below: The configuration is shown below: #### Nth Value¶

The Nth Value allows you to get a value from the Nth row in an ordered set of rows. In the example, we have set the value of ‘N’ to 2, applied partitioning by Group Name and sorted Revenue column in descending order to determine the second best seller.

The data appears as shown below: The configuration is shown below: Find the value in a given column in the next row. This will be empty for the last row in a given window frame. The Lead function can be very useful for comparing the value of the current row with the value of the following row.

The data appears as shown below: The configuration is shown below: #### Lag¶

Find the value in a given column in the previous row. This will be empty for the first row in a given window frame.

The data appears as shown below: The configuration is shown below: 