These best practices reflect recommendations shared by a cross-functional team of seasoned Lookers. These insights come from years of experience working with Looker customers from implementation to long-term success. The practices are written to work for most users and situations, but as always use best judgment when implementing.
Query Performance
The following steps summarize recommended practices to implement in Looker that lead to good query performance:
Ensure that queries are built and executed optimally against your database. Utilize the cache whenever possible to reduce database query traffic.
- Build Explores using many_to_one joins whenever possible.
- Joining views from the most granular level to the highest level of detail (many_to_one) typically provides the best query performance.
- Avoid joining views into Explores on concatenated primary keys declared in Looker.
- Join on the base fields from the view, or persist the view with the compound primary key as a persistent derived table for faster queries.
- Maximize cache usage by applying datagroups within Explores using the persist_with parameter.
- This enables Looker to integrate more closely with the backend data pipeline, so cache usage can be maximized without the risk of analyzing stale data. Named caching policies can be applied to an entire model and/or to individual Explores. By default, Looker caches queries for one hour.
- Warm the cache for popular dashboards by scheduling them with datagroups.
- When the dashboard runs on its schedule, all Looks and tiles are run, loading the underlying queries into the cache.
- Use the dashboard auto refresh feature strategically.
- If a dashboard uses auto refresh, make sure it refreshes no more frequently than the ETL processes running behind the scenes.
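The join and caching practices above can be sketched in LookML. This is a minimal illustration, assuming hypothetical view names (order_items, orders, users), a hypothetical etl_log table, and a datagroup named etl_datagroup:

```lookml
# Hypothetical datagroup: invalidates the cache when new ETL data lands
datagroup: etl_datagroup {
  sql_trigger: SELECT MAX(completed_at) FROM etl_log ;;
  max_cache_age: "24 hours"
}

explore: order_items {
  persist_with: etl_datagroup   # maximize cache usage via a named caching policy

  # Join from the most granular view outward (many_to_one)
  join: orders {
    relationship: many_to_one
    sql_on: ${order_items.order_id} = ${orders.id} ;;
  }
  join: users {
    relationship: many_to_one
    sql_on: ${orders.user_id} = ${users.id} ;;
  }
}
```

Note that the joins are on base fields (single-column keys) rather than concatenated primary keys, and the datagroup ties cache invalidation to the backend pipeline rather than a fixed timer.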
Query Optimization
There are many opportunities to fine-tune your LookML model to produce the same results at a lower cost to the database. Here are a few common patterns that have greatly improved query performance for our customers.
- Using EXPLAIN for Benchmarking
Every Looker query can be represented in raw SQL, which you can use to detect where performance is lacking. The SQL for any query is available via the SQL tab in the Explore UI. Alternatively, you can change the URL to obtain the underlying SQL. For example, this URL:
https://demo.looker.com/explore/thelook_redshift/users?fields=users.count
becomes:
https://demo.looker.com/explore/thelook_redshift/users.sql?fields=users.count
SQL Runner allows developers to run an EXPLAIN against the query to detect components that can be optimized. Learn more in our EXPLAIN article.
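As a sketch of the technique, you can prepend EXPLAIN to the SQL copied from the Explore's SQL tab and run it in SQL Runner. The table name here is illustrative, and the shape of the EXPLAIN output depends on your dialect (Redshift, for example, reports scan and join steps with relative costs):

```sql
-- SQL copied from the Explore's SQL tab, with EXPLAIN prepended
EXPLAIN
SELECT COUNT(*) AS "users.count"
FROM public.users AS users;
```

Expensive steps in the plan (full scans, nested loop joins) point to where the model or database schema can be optimized.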
- Convert Correlated Subqueries for Dimensions and Joins Into Derived Tables
Whenever there is a subselect in a dimension or a join, there is an opportunity to optimize the query. It is faster to create a derived table and join it into the Explore to produce the calculated results. A derived table also allows persistence to be applied for even greater performance gains.
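As a hedged sketch of this conversion, suppose a dimension computes a per-user order count with a correlated subquery. The view, table, and field names below (user_order_facts, orders, lifetime_orders) are hypothetical:

```lookml
# Instead of a correlated subquery inside a dimension, such as:
#   dimension: lifetime_orders {
#     sql: (SELECT COUNT(*) FROM orders o WHERE o.user_id = ${TABLE}.id) ;;
#   }
# compute the aggregate once in a derived table and join it in:
view: user_order_facts {
  derived_table: {
    sql: SELECT user_id, COUNT(*) AS lifetime_orders
         FROM orders
         GROUP BY user_id ;;
  }
  dimension: user_id {
    primary_key: yes
    sql: ${TABLE}.user_id ;;
  }
  dimension: lifetime_orders {
    type: number
    sql: ${TABLE}.lifetime_orders ;;
  }
}
```

The derived table is then joined into the Explore many_to_one on user_id, so the aggregate is computed once per user instead of once per row.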
- Utilizing Persistence for PDTs
Looker has many options for returning queries faster through caching and derived table persistence. This is perfect for use cases that do not require up-to-the-second data and can tolerate slightly stale reports, and it can greatly increase the performance of queries that everyone runs frequently every day.
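Persisting a derived table turns it into a PDT that Looker rebuilds on a trigger rather than recomputing per query. A minimal sketch, assuming a hypothetical datagroup named etl_datagroup already defined in the model and an illustrative orders table:

```lookml
view: user_order_facts {
  derived_table: {
    # datagroup_trigger persists the table and rebuilds it
    # only when the datagroup's trigger fires
    datagroup_trigger: etl_datagroup
    sql: SELECT user_id, COUNT(*) AS lifetime_orders
         FROM orders
         GROUP BY user_id ;;
  }
}
```

Queries against this view then read the pre-built table, so frequently run reports pay the aggregation cost once per ETL cycle instead of on every run.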