# About This Block

AB testing lets you compare how a group (often of users) acts under two sets of conditions, and can be integral for making scientifically informed decisions about your business. For example, a web site that wants to increase visitors’ time on its site might consider redesigning its home page. By showing Option A to one group of visitors and Option B to another group, they have a set of data to compare, and can more easily understand how best to meet their goals.

You can also create groups from your existing data to compare. Adding statistical significance to this test ensures you are adding the proper rigor to your analysis, and it helps avoid erroneous conclusions. This block allows you to quickly do this kind of analysis inside Looker!

Please be sure to use this with the guidance of a trained statistician.

# Ideal Data Types

For a proper AB test to work with statistical significance, we need a few things:

- A large sample of users/events/observations from which to draw, so we can get statistically significant results.
- Some numerical or binary (yes/no) dimension of those observations in which we are interested.
- This could be anything from bounce rates on a website, purchases made by shoppers, to number of defects in products, and so on.

- An easy way to split the observations into two groups. This could be by ad campaigns, versions of a website, or method of manufacturing, and so forth.

Once we have sufficient data, all we need to do is choose our variable of interest and our splitter variable, and we are ready to go.

## Expected Output

Once we have chosen our variables, we can dive right into it.

In this simplified example, we are going to see how a user’s gender affects the number of orders they will make in their lifetime as a customer. In the following Look, we have split this up by age group (any arbitrary dimension will work) and can see the significance at each age tier.

## Try it yourself!

To create the AB testing template, we just need to add some filters, dimensions, and several measures, to any existing view file. Note that you can have any number of split variables, as long as they are reflected both as filters and as parts of the sql in the `dimension: a_b`

.

There are many ways to modify the mathematics specific to your use case, so be sure to consult a statistician to ensure the math is correct for you.

We will be using several filtered and otherwise complex measures to do statistical calculations for us. At the root, this patter relies on a filter the user will enter in the UI to determine what variable will split our two groups for the purposes of testing.

Below is example code with the necessary parts blocked out, in which you can place your own variables:

view: ab_test {

# Add any variables here for splitting your data into A and B groups (in this case gender and age)

filter: a_b_gender {}

filter: a_b_age {}

dimension: a_b {

type: yesno

sql:

{%condition a_b_gender %} ${gender} {% endcondition %}

AND {%condition a_b_age %} ${age} {% endcondition %}

-- AND ...

-- AND ...

-- AND ...

-- Add more variables here for splitting

;;

}

measure: count_a {

type:count

filters: {

field: a_b

value: "yes"

}

}

measure: count_b {

type: count

filters: {

field: a_b

value: "no"

}

}

# Put in the measures for your variable of interest (in this case, lifetime orders)

measure: average_lifetime_orders_a {

type: average

sql: 1.0 * ${lifetime_orders} ;;

filters: {

field: a_b

value: "yes"

}

}

measure: average_lifetime_orders_b {

type: average

sql: 1.0 * ${lifetime_orders} ;;

filters: {

field: a_b

value: "no"

}

}

measure: stdev_lifetime_orders_a {

type: number

sql: 1.0 * STDDEV(all CASE WHEN ${a_b} THEN ${user_order_facts.lifetime_orders} ELSE NULL END);;

value_format_name: decimal_2

}

measure: stdev_lifetime_orders_b {

type: number

sql: 1.0 * STDDEV(all CASE WHEN NOT ${a_b} THEN ${user_order_facts.lifetime_orders} ELSE NULL END ;;

value_format_name: decimal_2

}

measure: t_score {

type: number

sql:

1.0 * (${average_lifetime_orders_a} - ${average_lifetime_orders_b}) /

SQRT(

(POWER(${stdev_lifetime_orders_a},2) / ${count_a}) + (POWER(${stdev_lifetime_orders_b},2) / ${count_b})

) ;;

value_format_name: decimal_2

}

measure: significance {

sql:

CASE

WHEN (ABS(${t_score}) > 3.291) THEN '(7) .0005 sig. level'

WHEN (ABS(${t_score}) > 3.091) THEN '(6) .001 sig. level'

WHEN (ABS(${t_score}) > 2.576) THEN '(5) .005 sig. level'

WHEN (ABS(${t_score}) > 2.326) THEN '(4) .01 sig. level'

WHEN (ABS(${t_score}) > 1.960) THEN '(3) .025 sig. level'

WHEN (ABS(${t_score}) > 1.645) THEN '(2) .05 sig. level'

WHEN (ABS(${t_score}) > 1.282) THEN '(1) .1 sig. level'

ELSE '(0) Insignificant'

END ;;

}

# Further Analysis

Note that the preceding code only works for continuous scalar variables as the target variable of interest. For a binary variable, we will need to change our formula for standard deviation. Also remember that the significance level is calculated here only for “sufficiently large data sets.”

To create variations in this analysis and for proper use of these methods, consult with a statistician.

Additionally, there are a lot of other statistical methods out there for other types of analysis. For relatively simple ones like this, it’s most likely you can code it right into LookML.