Usage
measure: field_name {
allow_approximate_optimization: yes
}
}
Hierarchyallow_approximate_optimization |
Possible Field TypesMeasureDefault Valueno AcceptsA Boolean (yes or no) |
Definition
For dialects that support HyperLogLog sketches, Looker can leverage the HyperLogLog algorithm to approximate distinct counts for aggregate tables.
The allow_approximate_optimization: yes
statement enables Looker to store HyperLogLog sketches in aggregate tables, which means that Looker can use approximations for distinct counts for aggregate awareness.
See the Dialect support for distinct counts with aggregate awareness section on this page for the list of dialects that support distinct counts for aggregate tables using HyperLogLog sketches.
In general, distinct counts can’t be supported with aggregate awareness because you can’t get accurate data if you try to aggregate distinct counts. For example, if you are counting the distinct users on a website, there may be a user who came to the website twice, three weeks apart. If you tried to apply a weekly aggregate table to get a monthly count of distinct users on your website, that user would be counted twice in your monthly distinct count query, and the data would be incorrect.
One workaround for this is to create an aggregate table that exactly matches an Explore query, as described on the Aggregate awareness documentation page. When the Explore query and an aggregate table query are the same, distinct count measures do provide accurate data, so they can be used for aggregate awareness.
The other option is to use approximations for distinct counts. The HyperLogLog algorithm is known to have about a 2% potential error. The allow_approximate_optimization
parameter requires your Looker developers to acknowledge that it’s okay to use approximate data for the measure so that the measure may be calculated approximately from aggregate tables.
With aggregate awareness, there are two cases where distinct counts come into play:
- The first case is with measures of
type: count_distinct
. - The second case is with measures of
type: count
that are actually being rendered by Looker ascount_distinct
measure types. As discussed on the Aggregate awareness documentation page, Looker renderscount
measures ascount_distinct
to avoid fanout miscalculations in Explores that join multiple database tables.
In both of these cases, if your dialect supports HyperLogLog sketches, you can add the allow_approximate_optimization: yes
statement to measures to enable approximate values. You can then include these measures in aggregate tables.
Even for measures defined with
allow_approximate_optimization: yes
, Looker will return exact data when possible. For example, if the dimensions in an Explore query are a perfect match of the dimensions in an aggregate table, Looker can provide exact data for distinct counts, without having to approximate. In this case, you will see in the Explore’s SQL tab that distinct count measures are being used for aggregate awareness without employing the HyperLogLog algorithm.
Example
The apx_unique_count
measure shown in this example is set for allow_approximate_optimization: yes
, which means that the measure can be used in an aggregate_table
.
Dialect support for distinct counts with aggregate awareness
Looker can use distinct counts for aggregate awareness with database dialects that support HyperLogLog sketches. In Looker 22.6, the following SQL dialects are supported for distinct counts with aggregate awareness:
Check your SQL dialect’s documentation to understand the speed and accuracy tradeoffs of this method.