Suppose you want to build an Explore that lets users start from a "one" table and then fan out into more detailed "many" records from multiple tables. In this situation, you can often feel constrained by the fact that, by default, these joins will conflict.
For example, if you have an application where a user can order service and then separately make payments against their account, you will find that there is no direct relationship between orders and payments. So, you will want to let your Explore expose both orders and payments as child entities for a given user, but not with a specific relationship.
In the Outer Join on False — or: How I learned to stop fanning-out and love the NULL Looker Community topic, we talk about this approach; however, that approach tries to handle a broader class of possible queries, creating a model that is very difficult to generate and maintain.
Here, we will show you how to take an existing Explore that starts from a "one" table and has multiple one_to_many
joins, and make it work more smoothly with minimal dynamic SQL code (you could also apply this approach to a specific set of joins within a larger Explore).
Example
Let's say our example Explore started out like this:
explore: users {
join: orders {
relationship: one_to_many
sql_on: ${orders.user_id} = ${users.id} ;;
}
join: payments {
relationship: one_to_many
sql_on: ${payments.user_id} = ${users.id} ;;
}
}
We want the orders and the payments to avoid "mixing" with or "fanning out" one another. This is as simple as adding a join:
explore: users {
join: user_join_paths {
type: cross
relationship: one_to_one
# The user_join_paths view has this SQL that selects two literal rows:
# SELECT 'orders' as path
# UNION ALL
# SELECT 'payments' as path
}
...
Then, we constrain our orders
and payments
joins to their own parts of the query:
join: orders {
sql_on: ${user_join_paths.path} = 'orders'
AND ${orders.user_id} = ${users.id} ;;
relationship: one_to_one
}
join: payments {
sql_on: ${user_join_paths.path} = 'payments'
AND ${payments.user_id} = ${users.id} ;;
relationship: one_to_one
}
}
Although we have changed the specified join relationships from one_to_many
to one_to_one
, this relationship is not technically true in the way these tables join. We are managing the fanout on our own, and this declaration tells Looker not to worry about applying symmetric aggregates.
Outcome
Here is what the intermediate query looks like after the first join:
User | Path |
---|---|
Alice | Orders |
Alice | Payments |
Bob | Orders |
Bob | Payments |
Also, if we join on our orders and payments:
User | Path | Orders.Item | Orders.Amt | Payments.Amt |
---|---|---|---|---|
Alice | Orders | Widget | $5.00 | Ø |
Alice | Orders | Sprocket | $1.00 | Ø |
Alice | Payments | Ø | Ø | $1.00 |
Alice | Payments | Ø | Ø | $1.00 |
Alice | Payments | Ø | Ø | $1.00 |
Bob | Orders | Widget | $5.00 | Ø |
Bob | Payments | Ø | Ø | Ø |
Bob has no orders.
From here, any of our familiar dimensioning and measuring operations on orders or payments will return a valid result:
Be careful that any counts you do are in the form ofCOUNT(${TABLE}.id)
instead ofCOUNT(*)
. You can do this like:
measure: my_count { type:count filters: [id: "-null"] }
As a matter of optimization, we can change that cross join in the middle of our query to a conditional left join, depending on what views a user has selected:
Some database dialects, such as BigQuery, will not allow this conditional join because it doesn't reference both sides of the table. In this case, either the cross join method above or filtering outside of the join are the best options.
join: user_join_paths {
type: left_outer
sql_on: 0=1
{% if orders._in_query %} OR ${user_join_paths.path} = 'orders' {%endif%}
{% if payments._in_query %} OR ${user_join_paths.path} = 'payments' {%endif%}
;;
}
Full Example with FAA Data
explore: carriers {
view_label: "[Carriers]"
join: carrier_joinpaths {
relationship: one_to_one
sql_on: 0=1
{% if accidents._in_query %} OR ${carrier_joinpaths.path} = 'accidents' {%endif%}
{% if flights._in_query %} OR ${carrier_joinpaths.path} = 'flights' {%endif%}
;;
}
join: accidents {
relationship: one_to_one
sql_on: ${carrier_joinpaths.path} = 'accidents'
AND ${accidents.air_carrier} = ${carriers.name}::varchar;;
}
join: flights {
relationship: one_to_one
sql_on: ${carrier_joinpaths.path} = 'flights'
AND ${flights.carrier_id} = ${carriers.id};;
}
}
view: carrier_joinpaths {
derived_table:{
sql:
SELECT 'accidents' as path
UNION ALL
SELECT 'flights' as path
;;
}
dimension: path {hidden:yes}
}
#Watch out for type:count measures in your views!
view: carriers{...}
view: flights{...}
view: accidents{...}
For only a handful of additional lines of LookML, you get some pretty powerful stuff, like filtering in those sub-tables without affecting your other joined tables, and without writing manual derived tables for them:
Frequently Asked Questions
1. Conforming the Paths
What if I want to group by date; which date do I use? For example, what if I want to see each user's payments and orders by date?
Just group the relevant rows from the two join paths by using a COALESCE
statement:
view: combined_date {
# reference this in your explore as join: combined_date { sql: ;; relationship: one_to_one }
dimension_group: combined_date {
sql: COALESCE(
{% if orders._in_query %}orders.date_col,{% endif %}
{% if payments._in_query %}payments.date_col,{% endif %}
NULL
);;
}
Of course, the same goes for any other shared denormalized dimension fields, or even for shared foreign keys to a dimension table for joining.
2. Subtotals
Can I mix subtotals into an Explore?
If you are feeling clever and want to dig a bit more, you can also use this approach to mix subtotals into an Explore. For more reading on subtotals, you can check out this Looker Community topic on Subtotals in Looker, which explains a subtotal approach in detail. From there, you can use this pattern to mix the subtotals in, as one of these paths.
3. Measuring Users
But symmetric aggregates would still be needed for any measure on users, right?
Looking at the example tables shown above, that seems true. After all, you can see the users are necessarily being repeated. But, let's back up a bit first.
The premise of this article is that we would rather apply regular aggregates on a non-fanned-out table than apply symmetric aggregates on a fanned-out table. In other words, dimensions can be taken from fanned-out tables, but, if we plan to be able to measure something, can we work it into the query in a non-fanned-out way? The answer is yes!
In the users example, the user table is fanned out. If we also needed to measure on the user table, we could join it a second time, on its own join path. The first user join, then, would be useful for dimensions for each of the user, order, and payment records; and the second (unfanned) user join could be used for measuring users. This is actually quite easy to do in LookML, by referencing the same view twice and using the fields
parameter to exclude measures from the Explore, and then to only include measures from the path-restricted view.
4. Optimizations on Large Tables
If your tables are large, which is often the case with these types of queries, you will want to make sure your tables have a commonly filtered sort key, partition column, or clustered index. Then, to benefit from these, you will want to push these filters down into subqueries on each individual table, using templated filters, as described in the Templated Filters and Liquid Parameters documentation page. It may sound like we are heading back towards a dynamic SQL code, but this is quite maintainable, because it only has to be done for a specific column per view, rather than for every dimension/measure/filter you add to the view.