After you have secured and configured your database, you’re ready to connect your database to Looker.
Creating a new database connection
Select Connections from the Database section in the Admin panel. On the Connections page, click the Add Connection button. Looker displays the Connection Settings page. The fields that the Connection Settings page displays depend on your dialect setting.
For more information about applying user attributes to connection settings, see the Connections section of the User Attributes documentation page.
For more information about using the PDT Overrides column to configure separate login credentials for PDT processes, see the Configuring Separate Login Credentials for PDT Processes section.
As an example, the following options are available for configuration when you’re connecting Looker to Amazon Redshift.
Name
The name of the connection as you want to refer to it. You should not use the name of any folders. This value does not need to match anything in your database; it is just a label that you assign. You’ll use it in the connection
parameter of your LookML model.
Dialect
The SQL dialect that matches your connection. It’s important to choose the correct value so that you are presented with the proper connection options, and so that Looker can properly translate your LookML into SQL.
SSH Server
The SSH Server option is available if the instance is deployed on Kubernetes infrastructure, and only if the ability to add SSH server configuration information to your Looker instance has been enabled. If this option is not enabled on your Looker instance and you want to enable it, contact your Looker account manager or open a support request in Looker’s Help Center.
The SSH server automatically chooses the localhost port for you, and it is not currently possible to specify the localhost port. If you need to create an SSH connection that requires you to specify a localhost port, contact your Looker account manager or open a support request in Looker’s Help Center.
To connect to your database using an SSH tunnel, turn on the toggle and select an SSH server configuration from the drop-down list.
Remote Host:Port
Your database hostname and the port that Looker should use to connect to your database host.
If you worked with a Looker analyst to configure an SSH tunnel to your database, in the Host field enter "localhost"
, and in the Port field enter the port number that redirects to your database, which your Looker analyst should have provided.
If you apply a user attribute to the Host field, the user attribute cannot have a user access level set to Editable.
If you configured an SSH tunnel to connect to your database, you cannot apply a user attribute to the Remote Host:Port field.
Database
The name of the database on your host. For example, you might have a hostname of my-instance.us-east-1.redshift.amazonaws.com
on which there is a database called sales_info
. You would enter sales_info
in this field. If you have multiple databases on the same host, you may need to create multiple connections to use them (with the exception of MySQL, in which the word database means something a little bit different than in most SQL dialects).
Use OAuth
For Snowflake and Google BigQuery connections, you have the option to use OAuth. This means that your users are required to log in to Snowflake or Google, respectively, in order to issue queries from Looker.
When you select Use OAuth, you will see the OAuth Client ID and OAuth Client Secret fields:
These values are generated from the Snowflake database or from Google. See the documentation page that describes Snowflake OAuth configuration or Google BigQuery OAuth configuration for the full procedure.
Username
The username that Looker should use to connect to your database. You should configure the user ahead of time according to our database configuration instructions.
Password
The password that Looker should use to connect to your database. You should configure the password ahead of time according to our database configuration instructions.
Schema
The default schema that Looker uses when a schema is not specified. This applies when you’re using SQL Runner, during LookML project generation, and when you’re querying tables.
Persistent Derived Tables
Check this box to enable persistent derived tables. This reveals additional PDT fields and the PDT Overrides column. Looker displays this option only if the database dialect supports using PDTs.
Note the following about PDTs:
- PDTs are not supported for Snowflake or Google BigQuery connections that use OAuth.
- Disabling PDTs on a connection does not disable the datagroups that are associated with your PDTs. Even if you disable PDTs, existing datagroups will still run their
sql_trigger
queries against the database. If you want to stop a datagroup from running itssql_trigger
query against your database, you must delete or comment out thedatagroup
parameter from your LookML project, or you can update the PDT And Datagroup Maintenance Schedule setting for the connection so that Looker checks PDTs and datagroups very infrequently or never. - For Snowflake connections, Looker sets the value for the
AUTOCOMMIT
parameter toTRUE
(Snowflake’s default value).AUTOCOMMIT
is required for SQL commands that Looker runs to maintain its PDT registration system.
Temp Database
Although this is labeled Temp Database, you’ll enter either the database name or schema name — as appropriate for your SQL dialect — that Looker should use to create persistent derived tables. You should configure this database or schema ahead of time, with the appropriate write permissions. On the Database configuration instructions documentation page, select your database dialect to see the instructions for that dialect.
Each connection must have its own Temp Database or Schema; they cannot be shared across connections.
Max PDT Builder Connections
The Max PDT Builder Connections setting lets you specify how many concurrent table builds the Looker regenerator can initiate on your database connection. The Max PDT Builder Connections setting applies only to the types of tables for which the Looker regenerator initiates rebuilds:
- Trigger-persisted tables (persistent derived tables and aggregate tables that use the
datagroup_trigger
orsql_trigger_value
persistence strategy). - Persisted tables that use the
persist_for
strategy, but only when thepersist_for
table is part of a cascade of derived tables where it is depended on by a table that uses thedatagroup_trigger
orsql_trigger_value
persistence strategy. In this case, the Looker regenerator will rebuild apersist_for
table, since the table is needed to rebuild another table in the cascade. Otherwise, the regenerator does not initiate builds forpersist_for
tables.
The Max PDT Builder Connections setting defaults to 1 but may be set as high as 10. However, the value cannot be higher than the value set in the Max Connections field or in the per-user-query-limit
set in Looker’s startup options.
Set this value carefully. If the value is too high, you may overwhelm your database. If the value is low, then long-running PDTs or aggregate tables can delay the creation of other persistent tables or slow down other queries on the connection. Databases that support multitenancy — such as BigQuery, Snowflake, and Redshift — may be more performant in handling parallel query builds.
If you want to increase the Max PDT Builder Connections setting, a good rule of thumb is to increase it by an increment of 1. If any unexpected behavior occurs, set it back to the default of 1. Otherwise, if query performance isn’t impacted, you can continue raising it incrementally by 1 and verifying the performance at each increment before further increasing the setting.
Note the following about the Max PDT Builder Connections setting:
- The Max PDT Builder Connections setting applies only to connections required for the rebuilding of tables, not to the connections needed for trigger checks. A trigger check is a query that checks whether the table’s persistence strategy is triggered; because these trigger check queries are always run sequentially, the Max PDT Builder Connections setting does not apply.
- In a clustered Looker instance, the regenerator runs only on the main node. The Max PDT Builder Connections setting applies only to the main node, and therefore sets the limit for the entire cluster.
- The Max PDT Builder Connections setting does not apply to the following types of tables. These types of tables are built consecutively:
- Tables persisted through the
persist_for
parameter (unless the table is depended on by tables using thedatagroup_trigger
orsql_trigger_value
strategies). - Tables in Development Mode.
- Tables rebuilt with the Rebuild Derived Tables & Run option.
- Tables where one depends on another in a dependency cascade. A table cannot build at the same time as a table it depends on. For example, if
table_B
depends ontable_A
, thentable_A
must finish rebuilding beforetable_B
can start to rebuild.
- Tables persisted through the
Always Retry Failed PDT Builds
The Always Retry Failed PDT Builds setting configures how the Looker regenerator attempts to rebuild trigger-persisted tables that failed in the previous regenerator cycle. The Looker regenerator is the process that rebuilds trigger-persisted tables (PDTs and aggregate tables) according to the interval that is configured in the PDT and Datagroup Maintenance Schedule connection setting. When the Always Retry Failed PDT Builds setting is enabled, the Looker regenerator will attempt to rebuild a PDT that failed in the previous regenerator cycle, even if the PDT’s trigger condition is not met. When this setting is disabled, the Looker regenerator will attempt to rebuild a previously failed PDT only when the PDT’s trigger condition is met. Always Retry Failed PDT Builds is disabled by default.
See the Derived tables in Looker documentation page for more information on the Looker regenerator.
Enable PDT API Control
The Enable PDT API Control setting determines whether the start_pdt_build
, check_pdt_build
, and stop_pdt_build
API calls can be used for this connection. When this setting is disabled, these API calls will fail when they reference PDTs on this connection. Enable PDT API Control is disabled by default.
Additional Params
You can include additional Java Database Connectivity (JDBC) parameters for your queries here, if needed.
To reference a user attribute in a JDBC parameter, use the Liquid templating syntax: _user_attributes['name_of_attribute']
. For example:
Here’s how it might look in the Additional Params field in Looker:
Additional JDBC parameters are untested by Looker and may cause unintended behavior.
PDT and Datagroup Maintenance Schedule
This setting accepts a cron
expression that indicates when the Looker regenerator should check datagroups and persisted tables (both aggregate tables and persistent derived tables) that are based on sql_trigger_value
), and see what tables should be regenerated or dropped.
The default value of */5 * * * *
means “check every 5 minutes,” which is the maximum frequency of checks. A cron
expression indicating more frequent checks will cause checks to occur every 5 minutes.
While PDTs are building, Looker does not perform additional trigger checks. Once all PDTs from the last trigger check are built, Looker will resume checking datagroup and PDT triggers based on the PDT and datagroup maintenance schedule.
If your database isn’t up 24/7, you may want to limit checks to times when the database is up. Here are some additional cron
expressions:
cron expression |
Definition |
---|---|
*/5 8-17 * * MON-FRI |
Check datagroups and PDTs every 5 minutes during business hours, Monday through Friday |
*/5 8-17 * * * |
Check datagroups and PDTs every 5 minutes during business hours, every day |
0 8-17 * * MON-FRI |
Check datagroups and PDTs every hour during business hours, Monday through Friday |
1 3 * * * |
Check datagroups and PDTs every day at 3:01am |
A few things to note when you create a cron
expression:
- Looker uses parse-cron v0.1.3, which doesn’t support
?
incron
expressions. - The
cron
expression uses the Looker application timezone to determine when checks are made. - If PDTs aren’t being built, reset the cron string back to the default of
*/5 * * * *
.
Below are some resources to assist with creating cron
strings:
- https://crontab.guru — Help editing and testing
cron
strings. - http://www.crontab-generator.org — Select time settings and the generator creates the corresponding
cron
string.
SSL
Choose whether or not you want to use SSL encryption to protect data as it passes between Looker and your database. SSL is only one option that can be used to protect your data; other secure options are described on the Enabling secure database access documentation page.
Verify SSL Cert
Choose whether you want to require verification of the SSL certificate used by the connection. If verification is required, the SSL Certificate Authority (CA) that signed the SSL certificate must come from the client’s list of trusted sources. If the CA is not a trusted source, the database connection is not established.
If this box is not selected, SSL encryption is still used on the connection, but verification of the SSL connection is not required, so a connection can be established when the CA is not on the client’s list of trusted sources.
Max Connections
Here you can set the maximum number of connections that Looker can establish with your database. For the most part, you are setting the number of simultaneous queries that Looker can run against your database. Looker also reserves up to three connections for query killing. If the connection pool is very small, then Looker will reserve fewer connections.
Set this value carefully. If the value is too high, then you may overwhelm your database. If the value is too low, then queries have to share a small number of connections. Thus many queries may seem slow to users as the queries have to wait for other, earlier queries to return.
The default value (which varies depending upon your SQL dialect) is typically a reasonable starting point. Most databases also have their own settings for the maximum number of connections they will accept. If your database configuration limits connections, ensure that your Max Connections value is equal to or lower than your database’s limit.
Connection Pool Timeout
If your users do request more connections than the Max Connections setting, the requests will wait for others to finish before they are executed. The maximum amount of time that a request will wait is configured here. You should set this value carefully. If it is too low, users may find their queries get cancelled because there isn’t enough time for other user’s queries to finish. If it is too high, large numbers of queries may build up causing users to wait for a very long time. The default value is typically a reasonable starting point.
Cost Estimate
New in Looker 22.6, you can enable cost estimates for Explore queries and SQL Runner queries on Amazon Aurora database connections. Also in Looker 22.6, cost estimates are enabled by default for MySQL connections and MySQL on Amazon RDS connections.
The Cost Estimate checkbox only applies to Snowflake, Amazon Redshift, Amazon Aurora, PostgreSQL, Google Cloud PostgreSQL, and Microsoft Azure PostgreSQL connections. It enables cost estimates for Explore queries and SQL Runner queries on the connection.
BigQuery and MySQL connections also support the cost estimate function, but the feature is always enabled, so there is no Cost Estimate checkbox for BiqQuery and MySQL connections.
See the Exploring data in Looker documentation page for more information.
SQL Runner Precache
In SQL Runner, all table information is pre-loaded as soon as you select a connection and schema. This enables SQL Runner to quickly display table columns as soon as you click on a table name. However, for connections and schema with many tables or with very large tables, you may not want SQL Runner to pre-load all the information.
If you prefer SQL Runner to load table information only when a table is selected, you can deselect the SQL Runner Precache option to disable SQL Runner pre-loading for the connection.
Fetch Information Schema For SQL Writing
For some SQL-writing features such as aggregate awareness, Looker uses your database’s information schema to optimize SQL writing. If the information schema is not cached, Looker may have to occasionally block SQL writing to the database in order to fetch the information schema. For dialects that use Hadoop Distributed File System (HDFS), fetching the information schema may take long enough to significantly affect the performance of your Looker queries. If you know that your information schema is slow, you can disable the Fetch Information Schema For SQL Writing option for your connection. Disabling this feature will prevent some of Looker’s SQL optimization for certain features, so you should enable the Fetch Information Schema For SQL Writing option unless you know that your connection’s information schema is particularly slow.
Disable Context Comment
The Disable Context Comment option applies to BiqQuery connections only. Context comments on Google BigQuery connections are disabled by default because context comments invalidate Google BigQuery’s ability to cache and can negatively impact cache performance. You can enable context comments for a BigQuery connection by deselecting the Disable Context Comment setting on the Connection Settings page for the connection. See the Google BigQuery documentation page for more information.
Database Time Zone
The time zone in which your database stores time-based information. Looker needs to know this so that it can convert time values for users, making it easier to understand and use time-based data. See the Using time zone settings documentation page for more information.
Query Time Zone
The Query Time Zone option is visible only if you have disabled User Specific Time Zones.
When User Specific Time Zones are disabled, the Query Time Zone is the time zone that displays to your users when they query time-based data, and the time zone into which Looker will convert time-based data from the Database Time Zone.
See the Using time zone settings documentation page for more information.
Configuring separate login credentials for PDT processes
If your database supports persistent derived tables, and you have checked the Persistent Derived Tables box in the connection settings, Looker displays the PDT Overrides column. In the PDT Overrides column, you can enter separate JDBC parameters (host, port, database, username, password, schema, and additional parameters) that are specific to PDT processes. This can be valuable for a number of reasons:
- By creating a separate database user for PDT processes, you can use PDTs in your model even if you assign user attributes to your database login credentials.
- PDT processes can authenticate through a separate database user who has a higher priority. This way the database can prioritize the PDT jobs over less-critical user queries.
- Write access can be revoked for the standard Looker database connection, and only granted to a special user that PDT processes will use for authentication. This is a better security strategy for most organizations.
- For databases such as Snowflake, PDT processes can be routed to more powerful hardware that is not shared with the rest of the Looker users. This way, PDTs can build quickly without incurring the cost of running expensive hardware full-time.
For example, the following configuration shows a connection where the username and password fields are set to user attributes. This way, each user can access the database using their individual credentials. The PDT Override column creates a separate user (pdt_user
) with its own password. The pdt_user
account will be used for all PDT processes, with access levels appropriate to PDT creation and update:
While the PDT Overrides column lets you change the database user and other connection properties, a PDT override must read the same data as the default connection, and it must write data to the same place. Looker cannot read data from one location and write it to another.
Testing your connection settings
Once you’ve entered the credentials, click Test These Settings to verify that the information is correct and the database is able to connect.
If your connection does not pass one or more tests:
- Try some of the troubleshooting steps on the Testing database connectivity documentation page.
- If you are running Mongo version 3.6 or earlier on Atlas and you get a communications link failure, see the Mongo Connector documentation page.
- To receive successful connection messages regarding the temp schema and PDTs, you must allow that functionality when you set up your Looker database. Instructions for doing so can be found on the Database configuration instructions documentation page.
Database connections that use OAuth, such as Snowflake and Google BigQuery, require a user login. If you are not logged in to your OAuth user account when you test one of these connections, Looker will show a warning with a Log In link. Click the link to enter your OAuth credentials or to allow Looker access to your OAuth account information.
If you are still having trouble, contact Looker Support for assistance.
Adding your database connection
Once you have configured and tested your database connection settings, click Add Connection. Your database connection is now added to the list on the Connections page.
Next steps
After you have connected your database to Looker, you’re ready to configure sign-in options for your users.