This article will walk you through the factors that can impact query performance in Looker. Because query performance is strongly tied to the underlying database, the following options will require the assistance of the personnel who set up the database system.
Declaring Indexes
Many queries will greatly benefit from the addition of proper indexes. You can look at the indexes of each table directly in Looker from our SQL runner page by clicking on the gear icon on the tables, then selecting Show Indexes :
The most common columns that can benefit from an index are: important dates and foreign keys. Adding indexes to these columns will increase performance for almost all queries. This also applies for Persistent Derived Tables. LookML has parameters that let you define indexes, sort keys, and distribution keys as appropriate.
Database Engine Configuration
Some databases allow different underlying database engines. For example, MySQL uses InnoDB and MyISAM as the default engine. InnoDB is also optimized for operational applications that require high-frequency transactions.
For analysis, we don't really need to worry about this feature set up, and are OK with table locking, as long as our queries run faster. For this reason, MyISAM is often a faster engine for MySQL analytics. Updates will occur when no queries are being run.
Database Specs and Resource Allocation
Some databases do not have sufficient hardware (or in AWS, the necessary provisioned resources) to deal with the size of the datasets. We recommend that you increase memory, cores, and I/O, (input/output) which will also increase performance of the queries.
Query Optimization
There are many opportunities to fine-tune your LookML model, to produce the same results, but at a lower cost to the database. Here are a few of the common patterns that have greatly improved performance for our customers:
- Using
EXPLAIN
for benchmarking - Convert correlated subqueries for dimensions and joins into derived tables
- Utilizing persistence
Using EXPLAIN for Benchmarking
Looker queries can all be represented in generated (raw) SQL, which is shown in the Explore page. We can use this generated SQL to detect where performance is lacking. All queries can be accessed via the SQL tab in the Explore UI.
The SQL Runner lets developers run an EXPLAIN
against the query to detect components that can be optimized. Learn more with our introduction to EXPLAIN.
Convert Correlated Subqueries for Dimensions and Joins Into Derived Tables
Whenever there is a sub-select in a dimension or a join, there is an opportunity to optimize queries. It is faster to create a derived table and join it in to the Explore you are working on, to produce the calculated results. It also lets persistence be applied, for even greater performance boosts (see the next section).
Utilizing Persistence
Looker has many options for queries to return faster, through caching, persistence, and datagroups. This is perfect for use cases that do not require up-to-the-second real-time data and can afford some stale reports. This can greatly increase performance of the queries that are run frequently every day by everyone.