Dialects that use these instructions
Cloudera Impala share the database setup requirements described on this page.
Configuring Looker to connect to Cloudera Impala
Looker connects to any database through a JDBC connection. By default, on Cloudera Impala this is the server running the impalad
daemon on port 21050.
In the Admin section of Looker, select Connections, and then select New Connection.
The Looker connection configuration depends on the security that is being used:
- A cluster not using Kerberos or user authentication
- A cluster requiring LDAP authentication
- A cluster secured with Kerberos, but not using Apache Sentry
- A cluster secured with SSL certificate authentication
Connecting to a cluster without Kerberos or user authentication
To configure a connection that isn’t using Kerberos or user authentication:
- On the Connection Settings page, leave the Username and Password fields blank. (The
*
next to the field names implies that these fields are required, but they are not.) - In the Additional Params field, enter
;auth=noSasl
.
Verifying the connection string
To verify the JDBC connection string in the log files, in the Looker Admin panel, click Log in the left menu. Then filter the log on a term such as jdbc
or noSasl
:
For more information about configuring Cloudera Impala to work with JDBC, see the Cloudera documentation.
Connecting to a cluster requiring LDAP authentication
For a cluster that requires LDAP authentication, including a cluster with Apache Sentry and Kerberos, on the Connection Settings page, enter a Username and Password with access to the schemas Looker will access.
Connecting to a cluster secured with Kerberos, but not using Apache Sentry
The Looker analyst team may need to assist in configuring this correctly.
Usually, Kerberos authentication with Cloudera environments is handled through Apache Sentry. See the Cloudera documentation for more details.
If you want to configure Looker to connect directly to Cloudera Impala using Kerberos authentication, follow the steps on this page.
Setting up the Kerberos client configuration
The first thing to do is ensure the installation of several pieces of software and the presence of several files on the Looker machine.
Kerberos client
Verify that the Kerberos client is installed on the Looker machine by trying to run kinit
. If it’s not, install the Kerberos client’s binaries.
For example, on Redhat/CentOS, this would be:
sudo yum install krb5-workstation krb5-libs krb5-auth-dialog
Java 8
Java 8 must be installed on the Looker machine and in the PATH
and JAVA_HOME
of the Looker user. If necessary, install it locally in the looker
directory.
Java Cryptography Extension
Download and install the Java Cryptography Extension (JCE) for Java 8 from the Oracle website.
- Locate the
jre/lib/security
directory for the Java installation. - Remove the following JAR files from this directory:
local_policy.jar
andUS_export_policy.jar
. - Replace these two files with the JAR files included in the JCE Unlimited Strength Jurisdiction Policy Files download.
It may be possible to use versions of Java prior to Java 8 with the JCE installed, but this is not recommended.
- Locate the
Update
JAVA_HOME
andPATH
in~looker/.bash_profile
to point to the correct installation of Java andsource ~/.bash_profile
or log out and in again.Verify the Java version with
java -version
.Verify the
JAVA_HOME
environment variable withecho $JAVA_HOME
.
gss-jaas.conf
Create a gss-jaas.conf
file in the looker
directory with these contents:
com.sun.security.jgss.initiate {
com.sun.security.auth.module.Krb5LoginModule required
useTicketCache=true
doNotPrompt=true;
};
If necessary for testing, debug=true
can be added to this file like this:
com.sun.security.jgss.initiate {
com.sun.security.auth.module.Krb5LoginModule required
useTicketCache=true
doNotPrompt=true
debug=true;
};
krb5.conf
The server running Looker should also have a valid krb5.conf
file. By default, this file is in /etc/krb5.conf
. If it is in another location, that must be indicated in the environment (KRB5_CONFIG
in the shell environment).
You may need to copy this from another Kerberos client machine.
lookerstart.cfg
Point to the gss-jaas.conf
and krb5.conf
files by making a file in the looker
directory (the same directory that contains the looker
startup script) called lookerstart.cfg
that contains the following lines:
If the krb5.conf
file is not at /etc/krb5.conf
then it will also be necessary to add this variable:
For debugging, add these variables:
Then restart Looker with ./looker restart
.
Authenticating with Kerberos
User authentication
If
krb5.conf
is not in/etc/
, then use the environment variableKRB5_CONFIG
to indicate its location.Run the command
klist
to make sure there is a valid ticket in the Kerberos ticket cache.If there is no ticket, run
kinit username@REALM
orkinit username
to create the ticket.The account used with Looker will likely be headless, so you can get a keytab file from Kerberos to store the credential for long-term use. Use a command like
kinit -k -t looker_user.keytab username@REALM
to get the Kerberos ticket.
Automatically renewing the ticket
Set up a cron job that runs every so often to keep an active ticket in the Kerberos ticket cache. How often this should run depends on the configuration of the cluster. klist
should give an indication of how soon tickets expire.
Setting up the Looker connection
In the Admin section of Looker, select Connections, and then select New Connection.
Fill out the connection details as follows (see the Connecting Looker to your database documentation page for more information):
- Name: The name of the connection. This is how the connection will be referred to in the LookML model.
- Dialect: Cloudera Impala or Cloudera Impala 3.1+.
- Host: Hostname.
- Port: Database port (21050 by default).
- Database: The default schema/database that will be modeled. When no database is specified for a table, this will be assumed.
- Username: Leave this blank.
- Password: Leave this blank.
- Persistent Derived Tables: Check if you’d like to use Persistent Derived Tables.
- Temp Database: A temporary schema/database for storing PDTs. This must be created beforehand.
- Additional Params: Additional parameters for the JDBC string.
Indicate the Kerberos principal here, for example
;principal=impala/impala.company.com@REALM
. Three-part principals are standard. The first (impala
) is generally the name of the service, and the last (REALM
) is generally the realm. SSL: Check to use SSL connections. If your SSL certificate is not issued by a widely recognized Certificate Authority and you are using a custom certificate, you will need to:
- Copy the certificate file to the Looker server. This is only available for customer-hosted Looker deployments.
- Add the following parameters to the Additional Params field mentioned above:
sslTrustStore=/path/to/your/trust_store.jks;trustStorePassword=yourpassword
See the Cloudera documentation for more details about how to form the correct JDBC strings for Impala databases.
Database Time Zone: The time zone of data stored in your database. Usually this can be left blank or set to UTC.
It is a best practice to have the server name (impala.company.com
in this example) be the canonical name for the server and its IP address’s reverse DNS lookup results in that name. However, the server name should be whatever is listed in the Kerberos domain controller:
Sometimes the server name is set to be the hostname, and not the fully qualified domain name. In this case it may be necessary to modify the /etc/hosts
and /etc/nsswitch.conf
files to make sure that reverse lookups resolve as intended.
Test the connection to make sure that it is configured correctly.
Debugging
- Cloudera’s documentation about debugging Impala authentication issues.
- When you add debugging to the configuration, the extra debugging information will wind up in
looker/logs/looker.log
.
Resources
- Enabling Kerberos authentication for Impala (Cloudera documentation)
- kinit man page
- klist man page
- gss-jaas.conf documentation
- Krb5LoginModule documentation
Notes
The user connecting to the scratch schema for PDTs must have read/write permissions.
Reference
For more information, see the Configuring Impala to work with JDBC section in the Cloudera documentation.
Feature support
For Looker to support some features, your database dialect must also support them.
In Looker 22.6, Cloudera Impala support the following Looker features:
In Looker 22.6, Cloudera Impala 3.1+ supports the following Looker features:
Next steps
After you have connected your database to Looker, configure sign-in options for your users.