CUPED-like variance reduction
Its use in SEO metrics and beyond
August 2023
Bunsen metrics
CUPED:
our history with the method
What is CUPED?
CUPED is a well-established variance reduction technique for experiments (paper).
Unlike methods such as outlier removal or winsorization, it doesn’t sacrifice any data integrity.
It can be used together with winsorization, which enhances its effectiveness.
It requires pre-experiment data - which we generally have in Bunsen.
For a rather unpredictable event like ad clicks, we get a ~5% reduction in standard deviation, which is a ~10% reduction in variance - and hence in experiment run time - since variance is the square of standard deviation and 0.95^2 ≈ 0.90. For something more predictable, like sessions, that 5% can increase to something like 30%.
That sounds awesome! Why haven’t we used it?
I (dcjkwon) first learned about CUPED in mid-2019. The DS team back then all agreed that this was awesome and that we wanted to implement it.
But there were several problems:
The result is that CUPED has not seen widespread use in our experimental analysis.
So why should we start using it now?
Several new developments in the last year have come together to negate nearly all of the previous obstacles.
All this gave me a better understanding of the principles behind CUPED, and let me write something CUPED-like in a macro that “just works” for all our metrics.
CUPED:
The “official” method
How do we implement CUPED?
You can read the paper here. The part that people remember for implementation goes like this:
Define (eq. 3):
Y_cv = Y_bar - theta * (X_bar - E(X))
Where Y is the quantity you want to measure (for example, sessions), and
X is the pre-experiment value for that quantity (sessions 1 week before the experiment), and
theta is the OLS linear regression coefficient from regressing Y on X.
Then the effect size is the difference between the adjusted treatment and control means (eq. 7):
delta = Y_cv(treatment) - Y_cv(control)
How do we implement CUPED?
Example with actual data
For the sake of coding (and my convenience), we will write eq. 3 as:
Y_cv = Y_bar - theta * X_bar + theta * E(X)
Then this notebook shows how to implement CUPED line-by-line. Let’s take a look.
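For readers who prefer code to notation, here is a minimal sketch of that line-by-line computation on synthetic data (the variable names, distributions, and numbers here are illustrative stand-ins, not our actual metrics):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: x is a pre-experiment metric (e.g. sessions the
# week before), y is the in-experiment metric, correlated with x.
n = 10_000
x = rng.poisson(lam=10, size=n).astype(float)
y = x + rng.normal(0, 2, size=n)

# theta: OLS slope of y on x, i.e. cov(X, Y) / var(X)
theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Per-guv adjusted values: Y_cv = Y - theta * X + theta * E(X)
y_cv = y - theta * x + theta * x.mean()

print(y.mean(), y_cv.mean())  # means agree: the adjustment is mean-preserving
print(y.std(), y_cv.std())    # std of y_cv is noticeably smaller
```

Aggregating y_cv per cohort and running the usual t-test then gives the variance-reduced effect estimate.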
How do we implement CUPED?
A deeper method:
I, personally, don’t like the above method, because I’m bad at stats notation. Fortunately, there is a deeper, better way of doing CUPED, which the paper mentions. But few pay attention to it, because it’s only mentioned in passing, and the key formula can’t be used directly for implementation. The formula (eq. 6), written as before in our code-friendly notation, is:
Y_cv = Y_bar - f(X)_bar + E(f(X))
Where f(X) is ANY reasonable regression function for Y based on X.
We will next explore the meaning of this formula.
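As a sketch of what the general form buys us, here is the same formula with a deliberately non-linear f(X) - a crude stratification predictor - on synthetic data (the names, distributions, and bucket count are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with a non-linear X -> Y relationship, which a plain
# linear regression would capture poorly.
n = 10_000
x = rng.gamma(shape=2.0, scale=5.0, size=n)
y = 3 * np.sqrt(x) + rng.normal(0, 1, size=n)

# f(X): stratify X into deciles and predict the mean of Y per stratum.
edges = np.quantile(x, np.linspace(0.1, 0.9, 9))
buckets = np.digitize(x, edges)
stratum_means = np.array([y[buckets == b].mean() for b in range(10)])
f_x = stratum_means[buckets]

# General CUPED: Y_cv = Y - f(X) + E(f(X))
y_cv = y - f_x + f_x.mean()

print(np.std(y), np.std(y_cv))  # y_cv is considerably less noisy than y
```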
CUPED:
An intuitive understanding
Predicted uncertainty is not uncertainty
Predicted uncertainty is not uncertainty.
So the variance in your experiment metric should only take into account the unpredictable part of your measurement. Suppose you predict that two guvs will have this many sessions:
[5, 20]
And when you actually measure the sessions, you get:
[4, 21]
Then the variance, or the uncertainty, in your measurement can be calculated on the residuals of your predictions, rather than on the measurements themselves. This gives var([-1, +1]), rather than var([4, 21]), which is a great reduction.
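A quick numerical check of this toy example (the predicted values here are back-solved from the residuals and measurements above, so they are illustrative):

```python
import numpy as np

predicted = np.array([5.0, 20.0])   # predicted sessions per guv
measured = np.array([4.0, 21.0])    # actually measured sessions
residuals = measured - predicted    # [-1.0, +1.0]: the unpredictable part

print(np.var(measured))   # 72.25 - mostly predictable spread between guvs
print(np.var(residuals))  # 1.0   - only the genuine surprise remains
```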
Connecting with intuition: the general CUPED equation
Consider the general CUPED equation:
Y_cv = Y_bar - f(X)_bar + E(f(X))
First, let’s remove the cohort-wide aggregation. We’ll save it to the end as the last step, as we do in our metrics. Then, on a per-guv basis, we have:
Y_cv = Y - f(X) + E(f(X))
Remember, f(X) is the result of ANY regression for Y based on X. So we’ll re-label f(X) as Y_pred. Then Y - Y_pred is the residual of your prediction, the part you couldn’t predict. We’ll call it Y_residual.
Y_residual is expected to be zero on average. So we add back in E(f(X)), which is the average predicted value - a constant - for the whole experimental population. This simply makes it so that Y_cv has the same average value as Y.
Connecting with intuition: the linear regression equation
So we’ve rewritten: Y_cv = Y_bar - f(X)_bar + E(f(X))
To:
Y_cv = Y_residual + Y_avg_predicted_value.
Remember the linear regression version of CUPED. There the equation (after removing the bars) was:
Y_cv = Y - theta * X + theta * E(X).
The prediction of linear regression is Y_pred = theta * X - theta * E(X) + E(Y). Rearranging and substituting into the above equation, we get:
Y_cv = Y - Y_pred + E(Y)
= Y_residual + Y_avg_predicted_value
So the linear regression CUPED is just a special case of the more general CUPED equation. Remember, you can use ANY reasonable prediction model for CUPED. Linear regression is just one possibility.
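A quick numerical sanity check of this equivalence, on synthetic data (the names and distributions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data for checking that the two forms of linear CUPED agree.
n = 5_000
x = rng.normal(10, 3, size=n)
y = 2 * x + rng.normal(0, 1, size=n)

theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Form 1: Y_cv = Y - theta * X + theta * E(X)
y_cv_theta = y - theta * x + theta * x.mean()

# Form 2: Y_cv = Y - Y_pred + E(Y), with the OLS prediction
# Y_pred = theta * X - theta * E(X) + E(Y)
y_pred = theta * x - theta * x.mean() + y.mean()
y_cv_resid = y - y_pred + y.mean()

print(np.allclose(y_cv_theta, y_cv_resid))  # True: the two forms are identical
```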
Connecting with intuition: the overall program
So here’s how to implement: Y_cv = Y_residual + Y_avg_predicted_value.
Let’s see how it works, in this notebook:
The rules for a “reasonable” prediction model
But what about the objection that we’re “cheating” in our “prediction” by using all the data? It turns out that this is okay. In order for our program to work, we only need:
X to be completely independent of the cohorts.
This was the point of using pre-experiment values. It means that the model, and its predictions, DO NOT KNOW which cohort they are predicting for. This prevents the model from overfitting to Y at the cohort level, and therefore preserves the genuine “surprise” in the experimental treatment effect that shows up in Y_residual.
avg(f(X)) to be equal to avg(Y) for any set of X, Y.
The average of the predictions has to be the same as the average of the actual Y values used to train the model. This means that the model isn’t biased, and it will be true for virtually any ML model.
Let’s again see this demonstrated in the notebook.
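The two rules can also be sanity-checked directly in code. In this sketch (synthetic data, hypothetical names), X is fixed before cohorts are drawn, and the predictor is an OLS fit with intercept, which is mean-unbiased by construction:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 10_000
x = rng.poisson(8, size=n).astype(float)         # pre-experiment metric
cohort = rng.integers(0, 2, size=n)              # assigned after X is fixed
y = x + 0.5 * cohort + rng.normal(0, 1, size=n)  # true treatment effect = 0.5

# Rule 2: an OLS prediction with intercept satisfies avg(f(X)) == avg(Y).
theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
f_x = theta * (x - x.mean()) + y.mean()
print(abs(f_x.mean() - y.mean()))  # ~0: the model is unbiased on average

# Rule 1: the model never sees `cohort`, so the treatment effect survives
# in the residuals, and the CUPED-adjusted effect estimate is unchanged.
y_cv = y - f_x + f_x.mean()
print(y_cv[cohort == 1].mean() - y_cv[cohort == 0].mean())  # close to 0.5
```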
CUPED:
Conclusion, links, etc.
Everything is CUPED
With this understanding, we can think of all kinds of different experimental analysis techniques as CUPED, depending on how you make your predictions.
Prediction: Y_pred = 0.
Leads to: standard t-test (the vast majority of our metrics)
Prediction: Y_pred = X, where X is the same quantity measured in the pre period.
Leads to: “simplified diff-in-diff”, or post-minus-pre analysis
Prediction: Y_pred calculated by stratifying X and taking the mean of Y within each stratum
Leads to: stratification CUPED (current macro in bunsen_metrics)
Prediction: Y_pred calculated by linear regression on X
Leads to: standard CUPED
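To make the “everything is CUPED” framing concrete, here is one sketch that runs the same skeleton with each of these predictors on synthetic data (the names, bucket counts, and distributions are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

n = 10_000
x = rng.poisson(10, size=n).astype(float)  # pre-experiment metric
y = x + rng.normal(0, 2, size=n)           # in-experiment metric

def cuped(y, y_pred):
    """General CUPED: residual plus the average predicted value."""
    return y - y_pred + y_pred.mean()

theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
buckets = np.digitize(x, np.quantile(x, [0.2, 0.4, 0.6, 0.8]))
stratum_means = np.array([y[buckets == b].mean() for b in range(5)])

variants = [
    ("t-test (Y_pred = 0)",         np.zeros(n)),
    ("post-minus-pre (Y_pred = X)", x),
    ("stratification CUPED",        stratum_means[buckets]),
    ("linear-regression CUPED",     theta * (x - x.mean()) + y.mean()),
]
for name, y_pred in variants:
    print(f"{name}: std = {np.std(cuped(y, y_pred)):.3f}")
```

With Y_pred = 0 the adjusted values are just Y, so the printed std matches the plain t-test; the other three predictors all shrink it.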
Everything is CUPED
Prediction: Y_pred based on full-blown ML on multi-dimensional X
Leads to: general ML CUPED
Prediction: Y_pred = X, where X is the paired value in the control cohort. Basically, just stratification trained only on the control cohort.
Leads to: paired t-test
Prediction: Y_pred = AVG(X), where the average is taken over both values in the pair. Basically, just stratification trained on all cohorts.
Leads to: paired t-test (be careful about covariances!)
Next steps
Currently, the macro in bunsen_metrics implements the stratification method, and the SEO sessions metrics and the guv-based ad click inventory metrics use the macro.
The macro is quite flexible. Any other method of prediction can be added to it, provided that it can be implemented in SQL.
More metrics should start to use the macro. It’ll basically get you a ~5%-40% reduction in standard deviations, at no sacrifice in data integrity and only a minor increase in query complexity.
HOWEVER, there is currently an issue with previous experiment_runs in the same experiment interfering with the CUPED process: the “pre-experiment” values would then carry current cohort information, since cohorts are not re-randomized between experiment runs. We should discuss this.
Links
Notebook with CUPED demo, used in this deck:
Variance reduction demo notebook for ad click inventory:
AA p-value demo notebook for ad click inventory:
Variance reduction demo notebook for SEO organic sessions:
First CUPED PR with some discussion in the comments:
Earlier version of this deck with notebook summaries:
Questions? Comments? Contact me! (@dcjkwon)