One of Looker’s most powerful features is what we call Symmetric Aggregates. But, because they can *look* pretty gnarly and mostly happen behind the scenes, coming upon them can be a bit confusing. So here’s a quick rundown of why they exist, how they work, and why they’re so "unattractive."

## Why Do You Need Symmetric Aggregates?

SQL, the language of data analysis, is amazingly powerful. But with that great power comes great responsibility, and one of the responsibilities analysts have had to manage is not accidentally calculating incorrect aggregates, such as sums, averages, and counts (There are literally threads all over the internet about this problem).

Doing these calculations incorrectly is surprisingly easy and probably the cause of more analyst heartburn than we’ll ever know. An example makes it clear how you can go wrong.

Imagine you have two tables: `orders`

and `order_items`

. The `order_items`

table records one row for each item in an order, so the relationship between the tables is one-to-many (because one order can have many items, but each item can only be part of one order). Let’s say the `orders`

table looks like this:

order_id | user_id | total | order_date

-------------------------------------------

1 | 100 | 50.36 | 2017-12-01

2 | 101 | 24.12 | 2017-12-02

3 | 137 | 50.36 | 2017-12-02

`SUM(total)`

= `124.84`

.

Now, let’s say the `order_items`

table contains six rows:

`order_id | item_id | quantity | unit_price`

--------------------------------------------

1 | 50 | 1 | 23.00

1 | 63 | 2 | 13.68

2 | 63 | 1 | 13.68

2 | 72 | 1 | 5.08

2 | 79 | 1 | 5.36

3 | 78 | 1 | 50.36

If you wanted to get the count of items ordered, that’d be easy, too. `SUM(quantity)`

= `7`

.

Now, let’s say we joined these two tables using their shared column, `order_id`

, which gives us:

`order_id | user_id | total | order_date | item_id | quantity | unit_price `

---------------------------------------------------------------------------

1 | 100 | 50.36 | 2017-12-01 | 50 | 1 | 23.00

1 | 100 | 50.36 | 2017-12-01 | 63 | 2 | 13.68

2 | 101 | 24.12 | 2017-12-02 | 63 | 1 | 13.68

2 | 101 | 24.12 | 2017-12-02 | 72 | 1 | 5.08

2 | 101 | 24.12 | 2017-12-02 | 79 | 1 | 5.36

3 | 137 | 50.36 | 2017-12-02 | 78 | 1 | 50.36

We can see some new things, such as, three items were ordered on Dec 1 and four items were ordered on Dec 2. Some of our previous calculations, like `SUM(quantity),`

will continue to work fine. But what if we try to get the total spent? **Now we’ve got a problem!**

If we use our previous calculation, `SUM(total)`

, the 50.36 total for `order_id = 1`

is going to be counted twice, because it contained two different items, and the 24.12 total for `order_id = 2`

is going to be counted three times! So `SUM(total)`

will give us `223.44`

, instead of the correct answer: `124.84`

.

**This is bad.** While it’s easy to avoid this kind of mistake when you have two tiny example tables, things get *way* more complicated in real life, with lots of tables and lots of data. This is exactly the kind of miscalculation someone could make without even realizing.

**This is the problem Symmetric Aggregates solves.**

## How Symmetric Aggregates Work

Symmetric Aggregates prevent analysts—and anyone using Looker—from accidentally miscalculating these types of aggregates. This is a huge burden off analysts’ shoulders, knowing that no one is going to charge ahead with incorrect data.

Symmetric Aggregates does this, in plain English, by keeping track of *what *you’re calculating and making sure to count each fact in the calculation the correct number of times. So, in the above example, Symmetric Aggregates realizes that `total`

is a property of `orders`

(not `order_items`

), so it needs to count each order’s total only once, to get the correct answer.

It does this by using a unique primary key analysts define in Looker. That means when Looker is doing calculations on the joined table:

`order_id | user_id | total | order_date | item_id | quantity | unit_price `

---------------------------------------------------------------------------

1 | 100 | 50.36 | 2017-12-01 | 50 | 1 | 23.00

1 | 100 | 50.36 | 2017-12-01 | 63 | 2 | 13.68

2 | 101 | 24.12 | 2017-12-02 | 63 | 1 | 13.68

2 | 101 | 24.12 | 2017-12-02 | 72 | 1 | 5.08

2 | 101 | 24.12 | 2017-12-02 | 79 | 1 | 5.36

3 | 137 | 50.36 | 2017-12-02 | 78 | 1 | 50.36

it says to itself, “Hey, even though there are two rows with `order_id = 1`

, I shouldn’t double count the total, because I’ve already included it in my calculation. And for the three rows with `order_id = 2`

, I should ignore the total in the second and third rows, because they’re repeats.”

It’s worth noting that Symmetric Aggregates depend on a *unique* primary key and the correct join relationship being specified in the model. So, if the results you’re getting look wrong, talk to an analyst to make sure that’s all set up right.

## Why Do Symmetric Aggregates Look That Way

If you haven’t seen Symmetric Aggregates in the wild, they can be a bit unsettling. When they kick in, Looker goes from writing nice, well-behaved SQL like:

SELECT

order_items.order_id AS "order_items.order_id",

order_items.sale_price AS "order_items.sale_price"

FROM order_items AS order_items

GROUP BY 1,2

ORDER BY 1

LIMIT 500

to wild, crazy SQL like:

SELECT

order_items.order_id AS "order_items.order_id",

order_items.sale_price AS "order_items.sale_price",

(COALESCE(CAST( ( SUM(DISTINCT (CAST(FLOOR(COALESCE(users.age ,0)

*(1000000*1.0)) AS DECIMAL(38,0))) +

CAST(STRTOL(LEFT(MD5(CONVERT(VARCHAR,users.id )),15),16) AS DECIMAL(38,0))

* 1.0e8 + CAST(STRTOL(RIGHT(MD5(CONVERT(VARCHAR,users.id )),15),16) AS DECIMAL(38,0)) )

- SUM(DISTINCT CAST(STRTOL(LEFT(MD5(CONVERT(VARCHAR,users.id )),15),16) AS DECIMAL(38,0))

* 1.0e8 + CAST(STRTOL(RIGHT(MD5(CONVERT(VARCHAR,users.id )),15),16) AS DECIMAL(38,0))) )

AS DOUBLE PRECISION)

/ CAST((1000000*1.0) AS DOUBLE PRECISION), 0)

/ NULLIF(COUNT(DISTINCT CASE WHEN users.age IS NOT NULL THEN users.id

ELSE NULL END), 0)) AS "users.average_age"

FROM order_items AS order_items

LEFT JOIN users AS users ON order_items.user_id = users.id

GROUP BY 1,2

ORDER BY 3 DESC

LIMIT 500

The exact format that Symmetric Aggregates takes depends on the dialect of SQL Looker is writing, but they’re all doing the same basic thing: If multiple rows have the same primary key, Symmetric Aggregates only counts them one time. It does this by using the little-known SUM DISTINCT and AVG DISTINCT functions that are part of the SQL standard.

To see how this happens, let’s take the calculation we did above and work it through with Symmetric Aggregates. Of the seven columns in the joined tables, we only need two: the one we’re aggregating (total) and the unique primary key for orders (order_id).

order_id | total

-----------------

1 | 50.36

1 | 50.36

2 | 24.12

2 | 24.12

2 | 24.12

3 | 50.36

Symmetric Aggregates takes the primary key (order_id in this case) and creates a very large number for each, that is guaranteed to be unique and always give the same output for the same input (It generally does this with a hashing function, the details of which are beyond the scope of this article.). So that’d look something like this:

big_unique_number | total

---------------------------------

802959190063912 | 50.36

802959190063912 | 50.36

917651724816292 | 24.12

917651724816292 | 24.12

917651724816292 | 24.12

110506994770727 | 50.36

Then for each row, Looker does this:

SUM(DISTINCT big_unique_number + total) - SUM(DISTINCT big_unique_number)

**And guess what. That works.**

It reliably gives you the correctly aggregated totals, counting each total exactly the right number of times. It isn’t fooled by repeated rows or by multiple orders that have the same total. Don’t believe me? Do the math yourself and you’ll get a better feel for how it works.

There’s no question that the SQL required to do this isn't the prettiest to look at:

with CAST() and md5() and SUM(DISTINCT) and STRTOL()

You certainly wouldn’t want to write it by hand. But luckily, you don’t have to. Looker writes it for you and it just works. No wrong answers, no joins that “aren’t allowed,” and no headaches for analysts.

*P.S. As of Looker 5.2, Symmetric Aggregates have gotten even smarter. Now, when an aggregation is going to work properly without needing Symmetric Aggregates, Looker detects this automatically and doesn’t bother with Symmetric Aggregates. Because Symmetric Aggregates impose some performance costs, this further optimizes the SQL that Looker generates, to make it as efficient as possible, while still guaranteeing the right answer.*