The Problem
A common issue users run into while modeling is sorting by pivots. For example, I might want to pivot my top 10 brands by total sales.
The most elegant solution I've found happens to work well for several other use cases. In this article I'll show how we can satisfy each in turn, by progressively adding to an ephemeral derived table.
The Solution
In this example we'll use a simple e-commerce model. Here I've pulled my top brands by revenue:
Now, if I'd like to see how these top 10 perform over time, a simple pivot won't help. It will sort alphanumerically by default:
I would have to filter my brand by each of the brand names that were in the top 10. This can work for a one-off analysis, but won't scale effectively for my team.
Sort by Pivots
To accomplish this, we can have Looker rank our brands for us, using an ephemeral derived table.
The idea is to dynamically calculate statistics at a brand level and then join them back into our model.
The first step is to create the query we'll use for the ranking. I'll often write these by hand, but a quick shortcut is to assemble the query in an Explore, then punch it out to SQL Runner, make any needed modifications, and then bring it into my project as a derived table:
SELECT products.brand AS "brand", COALESCE(SUM(order_items.sale_price ), 0) AS "total_revenue", # Included for clarity RANK() OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0) DESC) AS "RNK" FROM public.order_items AS order_items LEFT JOIN public.orders AS orders ON ( orders.id = order_items.order_id) LEFT JOIN public.inventory_items AS inventory_items ON ( order_items.inventory_item_id = inventory_items.id ) LEFT JOIN public.products AS products ON ( inventory_items.product_id = products.id ) WHERE 1=1 GROUP BY 1 ORDER BY 2 DESC LIMIT 20
The query above groups by the dimension by which we wanted to pivot/sort, and establishes a rank function on total revenue. Now, we can bring it into our model as a derived table. You can run this in SQL Runner and then have Looker automatically generate the LookML.
Looker prompted me for a view name and I called our new view brand_rank_by_sales
:
view: brand_rank_by_sales { derived_table: { sql: SELECT products.brand AS "brand", COALESCE(SUM(order_items.sale_price ), 0) AS "total_revenue", RANK() OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0) DESC) AS "RNK" FROM public.order_items AS order_items LEFT JOIN public.orders AS orders ON orders.id = order_items.order_id LEFT JOIN public.inventory_items AS inventory_items ON order_items.inventory_item_id = inventory_items.id LEFT JOIN public.products AS products ON inventory_items.product_id = products.id WHERE 1=1 # This is a "templated filter" - any filter I place on my orders created date will be mirrored here AND {% condition orders.created_date %} orders.created_at {% endcondition %} GROUP BY 1 ORDER BY 2 DESC ;; } measure: count { type: count drill_fields: [detail*] } dimension: brand { type: string sql: ${TABLE}.brand ;; } dimension: total_revenue { type: number sql: ${TABLE}.total_revenue ;; } dimension: rnk { ype: number sql: ${TABLE}.rnk ;; } set: detail { fields: [brand, total_revenue, rnk] } }
Now, we can join it into the rest of our e-commerce Explore, as a many-to-one relationship joined on brand
:
explore: order_items { join: orders { relationship: many_to_one sql_on: ${orders.id} = ${order_items.order_id} ;; } join: users { relationship: many_to_one sql_on: ${users.id} = ${orders.user_id} ;; } join: inventory_items { type: left_outer sql_on: ${order_items.inventory_item_id} = ${inventory_items.id} ;; relationship: one_to_one } join: products { type: left_outer sql_on: ${inventory_items.product_id} = ${products.id} ;; relationship: many_to_one } join: brand_rank_by_sales { type: left_outer sql_on: ${products.brand} = ${brand_rank_by_sales.brand} ;; relationship: many_to_one } }
Now, we have the tools we need to look at the performance of our top 10 brands' revenue over time.
I can now pivot by my brand and simply filter the brand_rank_by_sales.rnk
field, on the condition of less than or equal to 10.
To allow clear sorting by our pivot field, we can combine the rank and the brand name by adding a new dimension that will support being sorted alphanumerically. The CASE WHEN
statement will simply prepend a 0 to ranks less than 10, for proper sorting.
dimension: ranked_brand { type: string sql: CASE WHEN ${rnk} < 10 THEN '0'|| ${rnk} || ') ' || ${brand} ELSE ${rnk} || ') ' || ${brand} END ;; }
Now, we can pivot by our brand_rank_by_sales.ranked_brand
field and filter for the top 10 rank:
Group Our Long Tail into a Single "Other" Bucket
Adding on to the pattern above, I may want to take everything below a certain rank and place it into a long tail or "other" bucket. This can be used for time-series analysis, but is especially helpful in pie charts. You can find more information on creating "other" buckets using table calculationsor using charts.
I'll simply add the following two fields:
filter: other_bucket_threshold { type: number } dimension: ranked_brand_with_tail { type: string sql: CASE WHEN {% condition other_bucket_threshold %} ${rnk} {% endcondition %} THEN ${ranked_brand} ELSE 'x) Other' END ;; }
The benefit of this approach is that the end-user can dynamically adjust the tail threshold, which may vary or need to be tuned for the subset under consideration.
Drill into Your Outliers Using Percentile Rank
Add the following field to your derived table:
,PERCENT_RANK () OVER (ORDER BY COALESCE(SUM(order_items.sale_price ), 0)) AS "percent_rank"
And the following dimensions to the view:
dimension: percent_rank { type: number value_format_name: percent_2 sql: ${TABLE}.percent_rank ;; } dimension: percentile_tiers { type: tier tiers: [0.25,0.5,0.75,0.9,0.95] sql: ${percent_rank} ;; }
Now, we can drill into our outliers: