A lesser-known feature of some SQL dialects is something called the "window function." While MySQL users will be left out in the cold, most other SQL dialects can take advantage of their power. They can be a little tricky to wrap your mind around at first, but certain calculations—which are very complex or impossible without window functions—can become straightforward.

**Intriguing**

To demonstrate the power of window functions, let's take a look at an example set of customer data:

name |
status |
lifetime_spend |

Neil Armstrong | Platinum | 1000.00 |

Buzz Aldrin | Platinum | 2000.00 |

Yuri Gagarin | Platinum | 3000.00 |

John Glenn | Gold | 400.00 |

Alan Shepard | Gold | 500.00 |

Jim Lovell | Gold | 600.00 |

Now, suppose you want to know how the customer ranks in spending, against the other customers in their status. In other words, you're hoping for a result set that looks like this:

name |
status |
lifetime_spend |
status_rank |

Neil Armstrong | Platinum | 1000.00 | 3 |

Buzz Aldrin | Platinum | 2000.00 | 2 |

Yuri Gagarin | Platinum | 3000.00 | 1 |

John Glenn | Gold | 400.00 | 3 |

Alan Shepard | Gold | 500.00 | 2 |

Jim Lovell | Gold | 600.00 | 1 |

**It's About to Get Real**

Logically, this seems like a straightforward thing to do; so one might expect a fairly easy method in SQL. It is easy with a window function, but without one, it gets kind of nasty. If you do a quick Google search you're likely to come across several approaches. Some of them utilize concatenation hacks, clever self joins, and even user-defined variables. Did you even know SQL had variables you could define?

Just to demonstrate one option (which avoids variables for those who haven't used them), you could run a query like this:

`SELECT c1.name,`

c1.status,

c1.lifetime_spend,

COUNT(*) AS status_rank

FROM customer AS c1

JOIN customer AS c2

ON c1.lifetime_spend <= c2.lifetime_spend AND

c1.status = c2.status

GROUP BY c1.name,

c1.status,

c1.lifetime_spend

Demonstrating how this query works is not the primary focus of this entry, so we won't get into a step-by-step explanation. However, it can be helpful to consider how the `JOIN`

works in this query. Before applying any column selections or grouping, you would arrive at a table like this:

c1.name |
c1.status |
c1.lifetime_spend |
c2.name |
c2.status |
c1.lifetime_spend |

Neil Armstrong | Platinum | 1000.00 | Neil Armstrong | Platinum | 1000.00 |

Neil Armstrong | Platinum | 1000.00 | Buzz Aldrin | Platinum | 2000.00 |

Neil Armstrong | Platinum | 1000.00 | Yuri Gagarin | Platinum | 3000.00 |

Buzz Aldrin | Platinum | 2000.00 | Buzz Aldrin | Platinum | 2000.00 |

Buzz Aldrin | Platinum | 2000.00 | Yuri Gagarin | Platinum | 3000.00 |

Yuri Gagarin | Platinum | 3000.00 | Yuri Gagarin | Platinum | 3000.00 |

... | ... | ... | ... | ... | ... |

Now, it's a little easier to see that the number of times each name appears in the first column is the rank for that name, within its status.

**Stop the Pain**

Certainly, SQL queries can get a lot more complex than the one we just saw. However, to do something as straightforward as calculating a rank, it's a lot of fussing about. With a window function, this gets a lot easier and more elegant:

SELECT name,

status,

lifetime_spend,

RANK() OVER(PARTITION BY status ORDER BY lifetime_spend DESC) AS status_rank

FROM customer

That is it—no tricky logic, just exactly what you want.

**Break it Down Now**

Let's break down the concept of a window function, to understand exactly how it works. The general idea is that you perform some sort of calculation over a group of rows in the table. Then, you list the result of that calculation next to each row that is a part of that group.

This is where window functions can get a little confusing. Typically, when you perform a calculation across multiple rows, you need to use it in combination with `GROUP BY`

. Then, your calculation (like `COUNT`

or `SUM`

) is run for each group. For example, you might run:

`SELECT status,`

SUM(lifetime_spend) AS total_spend

FROM customer

GROUP BY status

This would create the following table:

status |
total_spend |

Platinum | 6000.00 |

Gold | 1500.00 |

Window functions perform those same types of calculations across multiple rows, but they return a result for each individual row in the table. In other words, they don't need to be used with `GROUP BY`

. Instead of using `GROUP BY`

, you specify your desired groups of aggregation by using `PARTITION BY`

. For example, you could run:

`ELECT name,`

status,

lifetime_spend,

SUM(lifetime_spend) OVER(PARTITION BY status) AS total_status_spend

FROM customer

This would generate the following table:

name |
status |
lifetime_spend |
total_status_spend |

Neil Armstrong | Platinum | 1000.00 | 6000.00 |

Buzz Aldrin | Platinum | 2000.00 | 6000.00 |

Yuri Gagarin | Platinum | 3000.00 | 6000.00 |

John Glenn | Gold | 400.00 | 1500.00 |

Alan Shepard | Gold | 500.00 | 1500.00 |

Jim Lovell | Gold | 600.00 | 1500.00 |

This is part of the magic of a window function. You can perform aggregation, without having to squish down your table via `GROUP BY`

.

There are also some special window functions that let you do other interesting things. As we have seen, `RANK`

lets us rank rows based on how they compare to other rows in their group. Other useful functions include `LAG`

and `LEAD`

, which let us get values from a row that is before or after the current row. `NTILE`

lets you find the percentile / quartile / decile / etc. a row is in within its group. Other similar functions and their usage are easily found on the web for your particular SQL dialect.

**The Window Function in Your Life**

**Who Doesn't Love Some Good Syntax?**

The syntax of a window function works like this:

**RANK() OVER(PARTITION BY status ORDER BY lifetime_spend DESC)**

**RANK()**: Every window function starts off with the calculation to be performed. These can be the aggregate functions you are probably familiar with like`COUNT`

,`SUM`

,`AVG`

, etc. You can also use the special window functions like`RANK`

,`LAG`

,`LEAD`

,`NTILE`

, etc.**OVER( … )**: Every window function must use`OVER`

, so SQL knows you want to perform a window function.`OVER`

also accepts the other settings for your window function, if needed. It is possible you won't need any additional options, and will just use`OVER()`

.**PARTITION BY status:**If you want to perform your calculation over specific groups, you specify those groups using`PARTITION BY`

.`PARTITION BY`

accepts the name of a column, or multiple columns. The unique values of that column (or columns) is considered a group. In this sense,`PARTITION BY`

is very similar to`GROUP BY`

. If you want the calculation to be performed across the entire table, instead of for specific groups, you can just leave out this statement.

**ORDER BY lifetime_spend DESC:**Some calculations are sensitive to the order of rows within each group. For example, if you want to know something's`RANK`

, you need to tell SQL how it should order the rows, before it can determine who is in "first place," "second place," and so on. If you need to explain how the rows should be ordered within each group, use`ORDER BY`

. Other types of calculations are not sensitive to the order of the rows. For example, it doesn't matter how the rows are ordered when using`SUM`

, because a sum is the same no matter what order it's performed in. If you don't need an`ORDER BY`

, you can exclude it.

**Examples!**

Show the total amount sold, for all orders, next to each row:

SUM(order_value) OVER()

Show the total amount sold, for each store, next to each row for that store:

SUM(order_value) OVER(PARTITION BY store)

Show the rank of each order according to its value, compared to all orders, next to the row for that order:

RANK() OVER(ORDER BY order_value DESC)

Show the rank of each order according to its value, compared to its store, next to the row for that order:

RANK() OVER(PARTITION BY store ORDER BY order_value DESC)

Show the quartile of each order according to its value, compared to its store, next to the row for that order:

NTILE(4) OVER(PARTITION BY store ORDER BY order_value DESC)

Just to get a little crazy, and demonstrate that window functions can do pretty complex things, show the name of the person who purchased the order that is two orders greater in value than the current order, when considering rank by store and department, and say "none found," if the order was in the top 2 already:

LAG(customer_name, 2, 'none found') OVER(

PARTITION BY store, department

ORDER BY order_value DESC

)

**More Power: Rolling Averages, Cumulative Sums, and Beyond**

Though we have covered a lot, there is one more feature of window functions that can give you even more analytical power. It's called the "frame clause," and it allows you to calculate things like rolling averages, cumulative sums, and many other interesting values.

So far, all the grouping we've done is according to the unique values within a column or set of columns (using `PARTITION BY`

). However, you can do even fancier things by making the group relative to each row. In other words, as SQL runs through each row, it will perform a calculation for the surrounding rows. This allows you to do things like get a sum for only the 5 previous rows.

Here is a diagram that demonstrates which rows SQL will consider (in purple) when it reaches a given row (in aqua), if we tell it to look at the 5 previous rows:

1 | 1 | 1 |

2 | 2 | 2 |

3 | 3 | 3 |

4 | 4 | 4 |

5 | 5 | 5 |

6 | 6 | 6 |

7 | 7 | 7 |

8 | 8 | 8 |

9 | 9 | 9 |

10 | 10 | 10 |

**More Syntax**

To create a moving group like this, you'll use a "frame clause," which looks like this:

`SUM(over_value) OVER(`

PARTITION BY store

ORDER BY over_value DESC

ROWS BETWEEN 5 PRECEDING AND CURRENT ROW

)

**ROWS BETWEEN**

Start a frame clause by using `ROWS BETWEEN`

. This indicates that you want a moving group, relative to the current row.

**5 PRECEDING**

This is where you tell SQL the row that will start your moving group. The different options are:

`UNBOUNDED PRECEDING`

: start off at the first row of the partition`X PRECEDING`

: start off x rows before the current row`CURRENT ROW`

: start off at the current row`X FOLLOWING`

: start off x rows after the current row

**AND CURRENT ROW**

This is where you tell SQL the row that will end your moving group. The different options are:

`AND X PRECEDING`

: end x rows before the current row`AND CURRENT ROW`

: end at the current row`AND X FOLLOWING`

: end x rows after the current row`AND UNBOUNDED FOLLOWING`

: end at the last row of the partition

It's relatively common to exclude this section from the frame clause. If you do so, SQL will assume you want to end at the `CURRENT ROW`

.

**ORDER BY and PARTITION BY with a Moving Group**

The way that partitioning and ordering works with a moving group of rows like this can be a little confusing.

The `ORDER BY`

clause is important when you are using a frame clause. SQL needs to know how it should order the rows, before it can decide which rows are before or after any other.

The `PARTITION BY`

clause is not necessarily required when using a frame clause, but there are times you might want to use one. If you do, you should know that the rows which are considered in the calculation must be within the same partition as the current row. This can have an effect near the beginning or end of a partition. For example, if we asked for 2 rows before and 2 rows after the current row:

**More Examples!**

Calculate a rolling average of order size, for all orders, based on the previous 10 orders:

AVG(order_value) OVER(

ORDER BY order_time

ROWS BETWEEN 10 PRECEDING AND CURRENT ROW

)

Calculate a rolling average of order size, for each store, based on the previous 10 orders:

AVG(order_value) OVER(

PARTITION BY store

ORDER BY order_time

ROWS BETWEEN 10 PRECEDING AND CURRENT ROW

)

Calculate a cumulative sum of total amount sold, since the beginning of time:

SUM(order_value) OVER(

ORDER BY order_time

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

)

Calculate a cumulative sum of total amount sold, starting over each day:

SUM(order_value) OVER(

PARTITION BY order_date

ORDER BY order_time

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

)

**Using Window Functions in Looker**

One important thing to keep in mind about window functions is that they cannot be placed in a `GROUP BY`

clause. In Looker, dimensions are always placed into the `GROUP BY`

clause. Therefore, you cannot simply place a window function in a dimension. For example, the following LookML will not work properly:

`# Will result in an error`

dimension: value_rank_by_store {

type: number

sql: RANK() OVER(PARTITION BY store ORDER BY order_value DESC) ;;

}

**Make it Work with Derived Tables**

For this reason (generally speaking), window functions are used in derived tables. For example, to add a "value_rank_by_store" dimension to your order table, you might first create a derived table like this one:

view: order_window_facts {

derived_table: {

sql:

SELECT

order_id,

RANK() OVER(PARTITION BY store ORDER BY order_value DESC)

AS value_rank_by_store

FROM

order ;;

}

dimension: order_id {

type: int

sql: ${TABLE}.order_id

primary_key: yes

hidden: yes

}

dimension: value_rank_by_store {

type: number

sql: ${TABLE}.value_rank_by_store

}

}

Then, you could easily join it back to your original order table in your model file:

explore: order {

join: order_window_facts {

sql_on: ${order.id} = ${order_window_facts.order_id} ;;

}

}

**The CliffsNotes Version**

To summarize the above:

- Window functions can be an easy and elegant way to add ranking, rolling averages, cumulative sums, and other powerful calculations to your queries.
- Window functions allow you to perform aggregate calculations (like
`COUNT`

,`SUM`

, and`AVG`

, etc.) against a group of rows, then list the result next to all the rows in that group. - Window functions can also give you new types of calculations like
`RANK`

,`LAG`

, and`NTILE`

. - In Looker, you usually implement window functions via derived tables.