These best practices reflect recommendations shared by a cross-functional team of seasoned Lookers. These insights come from years of experience working with Looker customers from implementation to long-term success. The practices are written to work for most users and situations, but as always use best judgement when implementing.
Optimizing Column Store Databases
Some recommendations for optimizing column store database setup for running SELECT statements (reads) include:
- Utilize column encoding and compression. This saves on space and improves processing speed. If column encoding/compression is not utilized, database resources will be more scarce than they need to be, limiting cluster performance and increasing storage costs.
- Gather statistics on all tables regularly. Analyzing tables will update column-level metadata.
- Vacuum tables on a regular basis. This reclaims space and resorts rows in either a specified table or all tables in the current database.
- Limit the number of columns returned in queries. Queries will execute faster when operating on a relatively small subset of columns.
- Maximize the use of filters and aggregation in queries. Issue queries that reduce many rows to relatively few in the result set—i.e., lots of filtering and aggregating. To ensure that users apply filters by default within Explores, include Explore-level filters such as the always_filter or conditionally_filter parameters.
Optimizing Row Store Databases
Some recommendations for optimizing row store database setup for running SELECT statements (reads) include:
- Normalize database schemas as much as possible. Transactional databases perform best when redundancy is minimized. Organize data in normalized, snowflake schemas in which all tables are as small as possible.
- Restructure data to reduce the use of sparsely populated tables. A sparse data matrix can impact overall performance. Rather than storing data in a single, sparse table, separate out columns into separate, more populated tables wherever possible. For extremely sparse data tables, consider setting up an entity-attribute-value schema.
- Consider handling some complex calculations within the database in advance. Consider storing some aggregate tables within the database in order to perform these complex calculations in advance. However, still utilize the raw data whenever possible, as this supports greater drilldown capability and explore flexibility.
- Ensure that indexes are present on all tables within the database. Indexes are vital, as these enable the database to order columns such that it can find different rows on the disk quickly. However, ensure that superfluous indexes aren't added, as indexes do take up space on disk.
- Place indexes on foreign keys or timestamps. It is recommended that indexes be placed on foreign keys for commonly joined tables and timestamps for time-ordered tables.
- Lower the isolation level within the database. Lowering the isolation level will decrease replication lag and moderate the database’s use of locking conditions.