The problem
Looker expects that each view defines exactly one dimension with the primary_key
parameter and that there are no duplicate values for a primary key field within a table. If this constraint is not respected, aggregates across joins can get counted two or more times, leading to unusually large numbers. Looker usually detects these and will throw an error such as the following:
Error: Non-Unique value/primary key (or sql_distinct_key), value overflow or collision when computing sum
This error is described in greater detail in the Error: Non-unique value/primary key (or sql_distinct_key), value overflow or collision when computing sum Help Center article.
Pinpointing the problem
To fix this error, we need to make sure each table has a unique primary key. To test a given field for uniqueness, we can run the following simple query in SQL Runner, which tells us how many rows are in the database table, per the value of that field:
SELECT field, count( * ) FROM table group by 1 order by 2 DESC
If there are any counts above 1 in the results, then that field is not unique and should not be used as the primary key.
Solutions
Ideally, there would be a field in the database that is unique at the row level; or, if there isn't, the database can be updated so that there is one (by adding a primary key column to a table, for example). The primary_key
parameter can be applied to that unique field. This article describes options for creating primary keys from within Looker.
Using compound primary keys
Sometimes, even if no single field in a table is unique, a combination of fields can be unique. In this case, we can make a compound primary key by concatenating two or more fields in the sql
parameter:
dimension: compound_primary_key { primary_key: yes hidden: yes type: string sql: CONCAT(${TABLE}.primary_key_1, ' ', ${TABLE}.primary_key_2) ;; }
Things to keep in mind about this approach:
- This example creates a compound primary key of
type: string
. If the underlying fields are integers, they may need to be cast to strings first. - Note:
CONCAT
works in some databases, like MySQL, but other databases might have a different equivalent. Redshift, for example, uses||
as its concatenation operator.
Using row_number
to generate a primary key for a derived table
If you receive this error with a derived table, you can use the row_number()
window function in Postgres and Redshift databases to make a unique field. This field can then be used as a primary key:
view: derived_table_name { derived_table { sql: SELECT row_number() OVER(ORDER BY created_at) AS prim_key, * FROM orders ;; } dimension: prim_key { type: number primary_key: yes sql: ${TABLE}.prim_key ;; } }
In MySQL, you can use a variable that iterates every row to achieve the same effect:
view: derived_table_name { derived_table { sql: SELECT CAST(@rownum := @rownum + 1 AS UNSIGNED) AS prim_key, t.* FROM orders t, (SELECT @rownum := 0) r ;; } dimension: prim_key { type: number primary_key: yes sql: ${TABLE}.prim_key ;; } }
Designating a primary key on a measure with sql_distinct_key
Even when a primary_key
is defined in each view (and each is in fact unique), you may see the non-unique value/primary key
error when running a query. This can occur when there are multiple one_to_many
joins in an Explore and there is a measure that aggregates a combination of values from two or more different tables. In this case, the combination of fields may no longer be associated with a unique primary_key
from the base table's point of view.
When this happens, you can add the parameter sql_distinct_key
to the measure that was causing the error to concatenate the primary keys from the views that are being combined.
The following example Explore, which joins the views aircraft
, accidents
, and flights
, demonstrates how this error may occur:
explore: aircraft { join: accidents { type: left_outer relationship: one_to_many sql_on: ${aircraft.tail_number} = ${accidents.registration_number} ;; } join: flights { type: left_outer relationship: one_to_many sql_on: ${flights.tail_num} = ${aircraft.tail_number} ;; } }
Now, say we have a unique primary_key
defined for each of the views involved. What will happen if we try to build a metric like the following?
view: accidents { dimension: morning_injuries { sql: CASE WHEN ${flights.depart_hour_of_day}::int < 12 THEN ${number_injured} ELSE 0 END ;; } measure: total_morning_injuries { type: sum sql: ${morning_injuries} ;; } }
The non-unique value/primary key
error will be thrown when total_morning_injuries
is included in a query. Let's take a look at one plane in the data set, N371UA, to see why.
In the data set, the aircraft N371UA has flown over 11,000 times and has two associated accidents (one with zero injuries, and one with 15). To pull in both the flights and the accidents data for the morning_injuries
dimension, Looker needs to join both those tables with the aircrafts
table.
If we want to aggregate morning_injuries
, we only want to count 15 once and zero once. But neither of the primary keys from flights or accidents can uniquely identify the morning_injuries
dimension, since morning_injuries
> combines fields from multiple views. In order to aggregate correctly, we need to write:
measure: total_morning_injuries { type: sum sql: ${morning_injuries} ;; sql_distinct_key: ${flights.primary_key}||${accidents.accident_number} ;; }
The concatenation of the primary keys indicates that there is one flight/accident combination with 15 injuries and one flight/accident combo with zero injuries. With the concatenated sql_distinct_key
, Looker can perform the aggregates correctly!