Running a regression on dates and then using the estimated parameters to produce a forecast presents two challenges: (1) you cannot do math on dates, and (2) it is not straightforward to add future dates to a model. This article walks you though the steps to accomplish this in Looker.
How to Do It in Looker: Option 1
Step 1
Right-join a calendar view to the Explore.
explore: linear_extrapolation
join: mr_calendar_dates
sql: RIGHT OUTER JOIN mr_calendar_dates ON calendar_dates.date_id = order.created_at
Step 2
Create two dimensions in the calendar view, that transform dates into a series of numbers. One of them should go into the past and have NULL values for the future (for training the model), and the other should go into the past and the future (for making predictions). I did it like this:
Note: This SQL will need to be modified according to your SQL dialect.
dimension: is_past {
type: number
sql: CASE WHEN ${date_date} > CURRENT_DATE() THEN NULL ELSE 1 END;;
hidden: yes
}
filter: test {
case_sensitive: yes
}
dimension: days_from_today {
case_sensitive: yes
type: number
sql: DATEDIFF(${date_date}, current_date());;
}
dimension: days_from_today_past{
type: number
sql: DATEDIFF(${date_date}, current_date())*${is_past};;
}
Step 3
We are now ready to go to the Explore. Select the two time dimensions we just made (x) and whatever measure you want to model (y). Take the ln(y), if you want to assume exponential growth. Filter on the date, selecting the amount of time in the past you want to train the model on, and the amount of time in the future you want to predict.
Step 4
Use these formulae to estimate intercept (a) and slope (b) using the time-series with NULL values for the future:
Or, putting this another way, you need sums and sum^2 of the time axis, counts of numbers in one of your columns, and then the sum and sum_product of the predicting set. Then, apply the estimated parameters to the time-series that has future values (and wrap the whole thing in exp() if you were regressing on ln(y)). Using table calculations, you can create this using the listed Functions for Table Calculations Only in this Document.
Step 5
Make the plot deselecting everything but y and predicted y. The x-axis label won’t be perfect, because we have multiple dimensions, and if you want to clean it up a bit, you can turn Axis Value Labels off.
This assumes you have already done the first part of creating the dates in the future.
The code looks like:
For the slope term: slope(${y}, ${x})
For the intercept: intercept(${y}, ${x})
Regression line: intercept + slope * ${x}
Where y
is the variable you want to predict, and x
is the predictor variable (time in this case).
If you want to make an exponential model like in the example, do this:
For the slope term: slope(ln(${y}), ${x})
For the intercept: intercept(ln(${y}), ${x})
Regression line: exp(intercept + slope * ${x})
How to Do it in Looker: Option 2
To create the dates isn’t always necessary; Looker can do the date filling at the query level. Just make sure you filter on a specific timeframe, with an end date in the future.