Why Care About Primary Keys?
Looker can serve as a powerful SQL engine for your company. Abstract modeling in LookML lets data teams and IT teams define general rules that are always true, freeing business analysts to build queries in the wild that are always correct, even if the data team never anticipated them. The core driver of this ability is the symmetric aggregates algorithm, which solves an industry-wide problem with SQL joins: aggregates that are overcounted when a join fans out rows. However, two things must be done correctly to take advantage of the algorithm: primary keys must be accurate in every view that contains a measure (typically all of them), and relationship parameters must be correct in every join.
Getting Your Primary Key Right
In many ways, understanding the primary key of a table is essentially the same as understanding what the table is and what might be done with it. The only requirement is that the column (or set of concatenated columns) you choose as the primary key contains no repeated values. This is easy to test in Looker's SQL Runner. Here’s one way to do that:
select count(*), count(distinct column_i_think_is_primary_key) from my_table
If the two counts return the same number, you have your primary key. If not, consider what additional column would distinguish between rows, and concatenate that column inside the count distinct. The concat() function may differ in your database's SQL dialect, but this is the general idea:
select count(*), count(distinct concat(column_i_think_is_almost_a_primary_key, additional_column_to_help)) from my_table
Continue concatenating additional columns inside the count distinct until you find a set that uniquely identifies each row. Then, create a dimension that concatenates the same columns.
When you have found the combination of columns that can serve as a primary key, copy the URL from SQL Runner and paste it in a comment next to the primary_key designation. That way anyone who looks at your view file later on will know the primary key has been validated.
The primary key doesn’t need to be enforced by the database for Looker to use it.
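As a rough illustration of the last two steps, the validated columns might be turned into a primary key dimension along these lines. This is only a sketch: the view name my_table and the dimension name compound_primary_key are hypothetical, the column names are the placeholders from the queries above, and the CONCAT syntax is an assumption that may need adjusting for your SQL dialect.

```lookml
view: my_table {
  sql_table_name: my_table ;;

  # Validated in SQL Runner: <paste the SQL Runner URL here>
  dimension: compound_primary_key {
    primary_key: yes
    hidden: yes
    type: string
    # Concatenate the same columns that passed the count distinct test.
    sql: CONCAT(${TABLE}.column_i_think_is_almost_a_primary_key, ${TABLE}.additional_column_to_help) ;;
  }
}
```

Hiding the dimension keeps a purely technical key out of the field picker while still letting Looker use it.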
How Does Looker Help Me?
Looker has mechanisms to help ensure that this is done right. One is a check for primary key uniqueness: whenever a join fans out and symmetric aggregates are needed to compute a measure, Looker checks that the primary key it relies on is unique. If it is not, an error appears at query run time (the LookML Validator does not report this error).
Also, if Looker has no way to handle a fanout (usually because no primary key is defined), no measures from that view will appear in the Explore. Designate a field as the primary key to make your measures appear in the Explore.
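For a view whose table already has a single unique column, designating the primary key can be as small as the sketch below. The view, column, and measure names (orders, id, amount, total_amount) are hypothetical and only for illustration.

```lookml
view: orders {
  sql_table_name: orders ;;

  # Marking this field as the primary key lets Looker handle fanouts
  # for the measures defined in this view.
  dimension: id {
    primary_key: yes
    type: number
    sql: ${TABLE}.id ;;
  }

  measure: total_amount {
    type: sum
    sql: ${TABLE}.amount ;;
  }
}
```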
Setting Correct Relationship Parameters for Joins
Once you have your primary key identified, be sure to set the relationship parameters correctly as described in the Getting the Relationship Parameter Right article.
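For orientation, a join with an explicit relationship parameter might look like the following sketch. It assumes two hypothetical views, orders and order_items, where each order can have many order items and each order item belongs to exactly one order. Check the actual shape of your data before choosing a value.

```lookml
explore: orders {
  join: order_items {
    type: left_outer
    sql_on: ${orders.id} = ${order_items.order_id} ;;
    # One row in orders can match many rows in order_items.
    relationship: one_to_many
  }
}
```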