Usage

datagroup: datagroup_name {
  max_cache_age: "24 hours"
  sql_trigger: SELECT max(id) FROM my_tablename ;;
  interval_trigger: "12 hours"
  label: "desired label"
  description: "description string"
}
Hierarchy: datagroup
Default Value: None
Accepts: An identifier for your datagroup, plus sub-parameters defining your datagroup properties.
Definition
Use datagroup to assign a caching policy to Explores and/or PDTs. If you want different caching policies for different Explores and/or PDTs, then use a separate datagroup parameter to specify each policy.
You can add a label and a description for the datagroup:
- label: Specifies an optional label for the datagroup. See the label and description section on this page for details.
- description: Specifies an optional description for the datagroup that can be used to explain the datagroup's purpose and mechanism. See the label and description section on this page for details.
Specify the details of the caching policy by using the datagroup subparameters:
- max_cache_age: Specifies a string that defines a time period. When the age of a query's cache exceeds the time period, Looker invalidates the cache. The next time the query is issued, Looker sends the query to the database for fresh results. See the max_cache_age section on this page for details.
- sql_trigger: Specifies a SQL query that returns one row with one column. If the value returned by the query is different than the query's prior results, then the datagroup goes into a triggered state. See the sql_trigger section on this page for details.
- interval_trigger: Specifies a time schedule for triggering the datagroup, such as "24 hours". See the interval_trigger section on this page for details.
Often the best solution is to use max_cache_age in combination with either sql_trigger or interval_trigger. Specify either a sql_trigger or an interval_trigger value that matches the data load (ETL) into your database, then specify a max_cache_age value that will invalidate old data if your ETL fails. The max_cache_age parameter ensures that if the cache for a datagroup isn't cleared by sql_trigger or interval_trigger, then the cache entries will expire by a certain time. That way, the failure mode for a datagroup will be to query the database rather than serve stale data from the Looker cache.
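As a sketch of this pattern (the datagroup name and ETL table name here are illustrative placeholders, not from the Looker documentation):

```
datagroup: etl_datagroup {
  # Trigger when a new row lands in the (hypothetical) ETL audit table
  sql_trigger: SELECT max(id) FROM etl_jobs ;;
  # Safety net: expire cache entries after 24 hours even if the trigger never fires
  max_cache_age: "24 hours"
}
```

Any Explore or PDT that uses this datagroup is refreshed when the trigger fires, and the 24-hour cap protects against serving stale data if the ETL job stops updating the table.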
A datagroup cannot have both sql_trigger and interval_trigger parameters. If you define a datagroup with both parameters, the datagroup will use the interval_trigger value and ignore the sql_trigger value, since the sql_trigger parameter requires using database resources when querying the database.

For connections using user attributes to specify the connection parameters, you must create a separate connection using the PDT override fields if you want to define a datagroup caching policy using a SQL query trigger. Without the PDT overrides, you can still use a datagroup for the model and its Explores, as long as you define the datagroup's caching policy using only max_cache_age, not sql_trigger.
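In that case, the caching policy can be as simple as the following sketch (the datagroup name and time value are placeholders):

```
datagroup: cache_only_datagroup {
  # Cache-expiration-only policy; no trigger query is run against the connection
  max_cache_age: "4 hours"
}
```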
max_cache_age
The max_cache_age parameter specifies a string containing an integer followed by "seconds", "minutes", or "hours". This is the maximum length of time that cached results will be used by Explore queries that use the datagroup.
When the age of a query's cache exceeds the max_cache_age, Looker invalidates the cache. The next time the query is issued, Looker sends the query to the database for fresh results. See the Caching queries and rebuilding PDTs with datagroups documentation page for information on how long data is stored in the cache.
If the Instant Dashboards Looker Labs feature is enabled, queries that are run from a dashboard will always run against the database; the previous run's data will be displayed on the dashboard until the query results are returned, regardless of the max_cache_age value.
The max_cache_age parameter defines only when the cache is invalidated; it does not trigger the rebuilding of PDTs. If you define a datagroup with only max_cache_age, you will get a LookML validation warning if any derived tables are assigned to the datagroup. If you leave a derived table assigned to a datagroup with only a max_cache_age parameter, the derived table will be built when the table is first queried, but it will sit in the scratch schema indefinitely and never rebuild, even if it is queried again. If your intention is to have a PDT rebuild at a specific time interval, add an interval_trigger parameter to your datagroup to define a PDT rebuild schedule.
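For instance, a policy along the following lines (the datagroup name and time values are illustrative) both expires cached Explore results and rebuilds any assigned PDTs on the same schedule:

```
datagroup: nightly_datagroup {
  # Rebuild PDTs that use datagroup_trigger: nightly_datagroup every 24 hours
  interval_trigger: "24 hours"
  # Also expire cached Explore results after 24 hours
  max_cache_age: "24 hours"
}
```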
sql_trigger
Use the sql_trigger parameter to specify a SQL query that returns exactly one row with one column. Looker runs the SQL query at intervals specified in the PDT And Datagroup Maintenance Schedule field of the database connection. If the query returns a different value than the previous result, the datagroup goes into a triggered state. Once the datagroup is triggered, Looker rebuilds any PDTs with that datagroup specified in their datagroup_trigger parameter. After the PDT rebuilding is complete, the datagroup goes into a ready state and Looker invalidates the cached results of any Explores using that datagroup.
Typically, sql_trigger specifies a SQL query that indicates when a new data load (ETL) has occurred, for example by querying the max(ID) in a table. You can also use sql_trigger to specify a certain time of day by querying the current date and adding additional hours to that timestamp as needed to reach the time you want, for example 4 AM.
Looker does not perform time zone conversion for sql_trigger. If you want to trigger your datagroup at a specific time of day, set the trigger in the time zone that your database is configured for.
See these examples from the sql_trigger parameter for ideas on setting up SQL queries to trigger a datagroup.
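As a rough sketch of the time-of-day pattern (assuming a PostgreSQL-style SQL dialect; adjust the date functions for your database), the following trigger returns a value that changes once per day at 4 AM in the database's time zone:

```
datagroup: daily_4am_datagroup {
  # The day-of-month of (now minus 4 hours) changes at 4 AM each day,
  # so this query returns a new value once per day at 4 AM database time.
  sql_trigger: SELECT EXTRACT(DAY FROM (CURRENT_TIMESTAMP - INTERVAL '4 hours')) ;;
}
```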
interval_trigger
You can use the optional interval_trigger subparameter to specify a time duration for rebuilding. In the interval_trigger parameter, you pass a string containing an integer followed by "seconds", "minutes", or "hours".
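For example, a schedule-only policy could look like this sketch (the datagroup name is a placeholder):

```
datagroup: every_12_hours_datagroup {
  # Trigger the datagroup every 12 hours
  interval_trigger: "12 hours"
}
```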
label and description
You can use the optional label and description subparameters to add a customized label and a description of the datagroup. You can also localize these subparameters using locale strings files.
These subparameters are displayed on the Datagroups page in the Database section of the Admin panel. See the Admin options - Datagroups documentation page for more information on how these are displayed.
Examples
To create a caching policy that retrieves new results whenever there’s new data available or at least every 24 hours, do the following:
- Use the orders_datagroup datagroup (in the model file) to name the caching policy.
- Use the sql_trigger parameter to specify the query that indicates that there is fresh data: select max(id) from my_tablename. Whenever the data has been updated, this query returns a new number.
- Use the max_cache_age setting to invalidate the data if it has been cached for 24 hours.
- Use the optional label and description parameters to add a customized label and a description of the datagroup.
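Putting those pieces together, the datagroup definition might look like the following sketch (the label and description strings are placeholders):

```
datagroup: orders_datagroup {
  # Goes into a triggered state whenever this query returns a new value
  sql_trigger: SELECT max(id) FROM my_tablename ;;
  # Invalidate cached results after 24 hours regardless of the trigger
  max_cache_age: "24 hours"
  label: "ETL ID added"
  description: "Triggered when new ID is added to ETL log"
}
```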
To use the orders_datagroup caching policy as the default for Explores in a model, use the persist_with parameter at the model level, and specify the orders_datagroup:
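A minimal sketch of the model-level setting:

```
# In the model file, at the top level
persist_with: orders_datagroup
```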
To use the orders_datagroup caching policy for a specific Explore, add the persist_with parameter under the explore parameter, and specify the orders_datagroup. If there is a default datagroup specified at the model level, you can use the persist_with parameter under an explore to override the default setting.
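For example (the Explore name customer_facts is a placeholder):

```
explore: customer_facts {
  # Override any model-level default and cache this Explore with orders_datagroup
  persist_with: orders_datagroup
}
```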
To use the orders_datagroup datagroup caching policy for rebuilding a PDT, you can add datagroup_trigger under the derived_table parameter, and specify the orders_datagroup:
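A sketch of a PDT definition that uses the datagroup (the view name and SQL here are placeholders):

```
view: customer_order_facts {
  derived_table: {
    # Rebuild this PDT whenever orders_datagroup is triggered
    datagroup_trigger: orders_datagroup
    sql:
      SELECT
        customer_id,
        COUNT(*) AS lifetime_orders
      FROM orders
      GROUP BY customer_id ;;
  }
}
```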