What Is CUPED
You've set up a clean A/B test. The randomization is solid. The metric is well-defined. But two weeks in, you still don't have a statistically significant result — and your stakeholders are asking when you'll have an answer.
The problem might not be your experiment design. It might be noise.
CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique that reduces the noise in your experiment without collecting more data, effectively making your experiments faster and more sensitive. In this post, I'll explain what it is, why it works, and when to use it.
1. The Real Bottleneck in A/B Testing
To detect a small effect with confidence, you need a large sample. The standard formula for sample size looks roughly like this:
n ∝ σ² / Δ²
- σ² = variance of your metric
- Δ = the minimum effect size you want to detect
Most teams focus on Δ — they debate what a meaningful lift looks like. But σ² is equally important. If you can reduce the variance of your metric, the required sample size drops — and your experiment reaches significance faster.
CUPED is a method for doing exactly that.
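To make the proportionality concrete, here is a minimal sketch of a sample size calculation. It assumes a two-sided, two-sample z-test with equal group sizes, which gives the constant 2 · (z_{1-α/2} + z_{1-β})² in front of σ² / Δ². The function name and the example numbers (a standard deviation of 40 and a minimum detectable lift of 1) are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def sample_size_per_variant(sigma: float, delta: float,
                            alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate users per variant for a two-sided, two-sample z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2


# Hypothetical metric: standard deviation 40, minimum detectable lift 1.
baseline = sample_size_per_variant(sigma=40.0, delta=1.0)

# Halving the variance (sigma ** 2) halves the required sample size.
halved = sample_size_per_variant(sigma=40.0 / 2 ** 0.5, delta=1.0)

print(f"baseline: {baseline:,.0f} users per variant")
print(f"with half the variance: {halved:,.0f} users per variant")
```

The point of the sketch is the last comparison: Δ stays fixed, only σ² changes, and the required sample size scales with it one-for-one.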
2. The Core Idea: Use What You Already Know
The key insight behind CUPED is this:
Some of the variance in your metric during the experiment is predictable — because you've seen how users behave before the experiment started.
Users are not blank slates when they enter your experiment. A user who spent $200 last month is probably going to spend more this month too, regardless of which variant they're in. That pre-experiment behavior is a signal — and CUPED uses it to remove predictable variance from your outcome metric.
Here's the adjustment:
Y_adjusted = Y - θ · (X - E[X])
Where:
- Y = your outcome metric during the experiment (e.g., revenue per user)
- X = your pre-experiment covariate (e.g., revenue per user in the prior period)
- E[X] = the mean of X across all users
- θ = a coefficient that captures how much X predicts Y; the variance-minimizing choice is θ = Cov(X, Y) / Var(X), the slope from regressing Y on X
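The adjusted metric Y_adjusted has the same mean as Y, because the subtracted term θ · (X - E[X]) averages to zero. And since X is measured before the experiment, randomization gives it the same distribution in both variants, so your estimate of the treatment effect stays unbiased. But Y_adjusted has lower variance, because the part of Y that X could have predicted has been subtracted out.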
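The adjustment above can be sketched in a few lines of NumPy. This is an illustration on simulated data, not a production implementation: the helper name `cuped_adjust`, the gamma-distributed pre-period revenue, and the true lift of 2.0 are all assumptions made for the example. θ is estimated as Cov(X, Y) / Var(X) on the pooled data from both variants.

```python
import numpy as np


def cuped_adjust(y: np.ndarray, x: np.ndarray) -> tuple[np.ndarray, float]:
    """Return the CUPED-adjusted metric and the theta that was used."""
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean()), theta


# Simulated experiment (all numbers are hypothetical).
rng = np.random.default_rng(0)
n = 10_000
x = rng.gamma(shape=2.0, scale=50.0, size=n)        # pre-experiment revenue per user
treated = rng.integers(0, 2, size=n).astype(bool)   # random assignment
y = 0.8 * x + rng.normal(0.0, 40.0, size=n) + 2.0 * treated  # in-experiment revenue

y_adj, theta = cuped_adjust(y, x)

# Same mean, lower variance.
print(f"mean:     raw={y.mean():.2f}  adjusted={y_adj.mean():.2f}")
print(f"variance: raw={y.var():.1f}  adjusted={y_adj.var():.1f}")

# The treatment effect is estimated the usual way, on the adjusted metric.
raw_effect = y[treated].mean() - y[~treated].mean()
cuped_effect = y_adj[treated].mean() - y_adj[~treated].mean()
print(f"effect:   raw={raw_effect:.2f}  adjusted={cuped_effect:.2f}")
```

Note that θ and the mean of X are computed once across all users, not separately per variant, so the same correction is applied to treatment and control.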
3. Why Does Variance Actually Decrease?
Intuitively: if two users both had high revenue last month, their revenue during the experiment will also tend to be high — but for reasons that have nothing to do with your treatment. CUPED removes that shared "baseline noise," leaving only the variation that's actually relevant to detecting your effect.
Mathematically, when θ is set to its variance-minimizing value Cov(X, Y) / Var(X), the variance reduction is:
Var(Y_adjusted) = Var(Y) · (1 - ρ²)
Where ρ is the correlation between your covariate X and your outcome Y.
This means:
- If ρ = 0.7, variance drops by 49% (ρ² = 0.49) → the required sample size is cut nearly in half
- If ρ = 0.5, variance drops by 25% → meaningful but more modest gain
- If ρ = 0.0, no reduction at all — CUPED doesn't help
The stronger the correlation between past and present behavior, the bigger the benefit.
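For readers who want the step behind the (1 - ρ²) formula, here is a short derivation. It uses only the adjustment Y_adjusted = Y - θ · (X - E[X]) from Section 2 and treats E[X] as a constant.

```latex
% Expand the variance of the adjusted metric for an arbitrary theta:
\begin{aligned}
\operatorname{Var}(Y_{\text{adjusted}})
  &= \operatorname{Var}\bigl(Y - \theta\,(X - E[X])\bigr) \\
  &= \operatorname{Var}(Y) - 2\theta\,\operatorname{Cov}(X, Y)
     + \theta^{2}\,\operatorname{Var}(X).
\end{aligned}

% This is a quadratic in theta; setting its derivative to zero gives the minimizer:
\theta^{*} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}.

% Substituting theta^* back in:
\operatorname{Var}(Y_{\text{adjusted}})
  = \operatorname{Var}(Y) - \frac{\operatorname{Cov}(X, Y)^{2}}{\operatorname{Var}(X)}
  = \operatorname{Var}(Y)\,\bigl(1 - \rho^{2}\bigr),
\qquad
\rho = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}.
```

Any other choice of θ still leaves the mean unchanged, but removes less variance than θ*.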
4. A Concrete Example
Imagine you're running an experiment on an e-commerce platform to test a new checkout flow. Your metric is revenue per user over 14 days.
Without CUPED:
- High variance because users differ wildly in spending: some buy once, while others spend $500 regularly
- You need 50,000 users per variant to reach 80% power
With CUPED:
- You use each user's revenue from the prior 14 days as the covariate
- Past and present revenue are strongly correlated (ρ ≈ 0.65)
- Variance drops by ~42%
- You now need roughly 29,000 users per variant — a 42% reduction in required sample size
Same experiment. Same metric. Faster answer.
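The arithmetic behind those numbers is short enough to check directly. The sketch below uses only the figures from the example (50,000 users per variant and ρ ≈ 0.65) and the fact that required sample size scales with variance; the function name is made up for illustration.

```python
def sample_size_with_cuped(baseline_n: float, rho: float) -> float:
    """Scale a baseline sample size by the remaining variance fraction (1 - rho**2)."""
    return baseline_n * (1 - rho ** 2)


baseline_n = 50_000   # users per variant without CUPED (from the example)
rho = 0.65            # correlation between prior and current revenue (from the example)

variance_reduction = rho ** 2                       # about 0.42
adjusted_n = sample_size_with_cuped(baseline_n, rho)  # about 28,875, i.e. roughly 29,000

print(f"variance reduction: {variance_reduction:.0%}")
print(f"users per variant with CUPED: {adjusted_n:,.0f}")
```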
5. What Makes a Good Covariate?
Not every pre-experiment variable works equally well. A good covariate for CUPED should be:
- Strongly correlated with the outcome metric. The whole point is variance reduction via correlation. If your covariate has weak predictive power, CUPED won't help much.
- Measured before the experiment started. This is non-negotiable. If the covariate is measured during the experiment, it could be influenced by the treatment, which would bias your estimates.
- The same metric, from a prior period. In most cases, the best covariate is simply the same metric measured before the experiment window. Revenue predicts revenue. Session count predicts session count.
| Outcome metric | Good covariate |
|---|---|
| Revenue (experiment period) | Revenue (prior 30 days) |
| Click-through rate | CTR (prior 2 weeks) |
| Session length | Avg. session length (prior period) |
| Retention (Day 7) | Prior retention rate |
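One way to compare candidate covariates is to rank them by ρ², the fraction of variance each would remove. The sketch below does that on simulated data. The covariate names (`prior_revenue`, `prior_sessions`, `account_age_days`) and the way they are generated are hypothetical; in practice you would load them from your own pre-experiment tables.

```python
import numpy as np


def rank_covariates(y: np.ndarray, candidates: dict[str, np.ndarray]) -> list[tuple[str, float]]:
    """Rank candidate covariates by rho**2, the expected variance reduction."""
    scores = {name: float(np.corrcoef(x, y)[0, 1] ** 2) for name, x in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Simulated pre-experiment data (hypothetical).
rng = np.random.default_rng(1)
n = 5_000
prior_revenue = rng.gamma(shape=2.0, scale=50.0, size=n)
prior_sessions = rng.poisson(lam=3 + prior_revenue / 50)   # loosely tied to revenue
account_age_days = rng.integers(1, 1000, size=n)           # unrelated to revenue here

# Outcome metric during the experiment.
y = 0.8 * prior_revenue + rng.normal(0.0, 40.0, size=n)

ranking = rank_covariates(y, {
    "prior_revenue": prior_revenue,
    "prior_sessions": prior_sessions,
    "account_age_days": account_age_days,
})
for name, rho_squared in ranking:
    print(f"{name:>18}: expected variance reduction {rho_squared:.1%}")
```

In this simulation the same metric from the prior period comes out on top, which matches the rule of thumb in the table.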
6. CUPED vs. Just Stratifying by User Segment
A natural question: why not just stratify users by their past behavior (high/medium/low spenders) and analyze each group separately?
Stratified analysis works, but it's more rigid. CUPED is essentially a continuous version of stratification — it uses a regression adjustment rather than discrete buckets, which captures more of the variance structure and is easier to implement at scale.
Technically, CUPED amounts to running a simple linear regression of Y on X and using the residuals (re-centered on the mean of Y) as your adjusted outcome. If you've heard of the Frisch-Waugh-Lovell (FWL) theorem, this is what's happening under the hood, but you don't need to know that to use CUPED effectively.
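The regression connection can be checked numerically. The sketch below, on simulated data with hypothetical parameters, compares three things: the CUPED θ against the slope of a simple regression of Y on X, the CUPED-adjusted metric against re-centered regression residuals, and the CUPED effect estimate against the treatment coefficient from regressing Y on treatment and X. The first two should match to floating-point precision; the third should be very close but not identical, since CUPED estimates θ on pooled data.

```python
import numpy as np

# Simulated experiment (all numbers are hypothetical).
rng = np.random.default_rng(2)
n = 20_000
x = rng.gamma(shape=2.0, scale=50.0, size=n)   # pre-experiment covariate
t = rng.integers(0, 2, size=n)                 # 0 = control, 1 = treatment
y = 0.8 * x + rng.normal(0.0, 40.0, size=n) + 2.0 * t

# CUPED: theta from pooled data, then a difference in adjusted means.
theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
y_adj = y - theta * (x - x.mean())
cuped_effect = y_adj[t == 1].mean() - y_adj[t == 0].mean()

# Simple regression of Y on X: np.polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Regression of Y on an intercept, the treatment indicator, and X.
design = np.column_stack([np.ones(n), t, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
ols_effect = coef[1]

print(f"theta={theta:.4f}  regression slope={slope:.4f}")
print(f"CUPED effect={cuped_effect:.3f}  OLS treatment coefficient={ols_effect:.3f}")
```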
7. Limitations to Keep in Mind
CUPED is powerful, but it's not a cure-all.
- New users have no history. If your experiment includes users who are brand new to your platform, there's no pre-experiment covariate available for them. You can either exclude them from the CUPED adjustment or use a proxy covariate (e.g., acquisition channel).
- It doesn't fix bad randomization. CUPED reduces noise; it doesn't repair a broken assignment mechanism. If your treatment and control differ systematically on baseline characteristics, that's a randomization problem, not a variance problem.
- Diminishing returns with short lookback windows. If your pre-experiment window is too short (e.g., 3 days), the covariate may not be stable enough to be predictive. A 14–30 day lookback typically works well.
- Effect size doesn't change. CUPED makes it easier to detect an effect; it doesn't change the effect itself. If your treatment genuinely has a small impact, CUPED helps you confirm that sooner. It doesn't make a weak treatment look stronger.
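For the new-user limitation, one possible way to "exclude them from the CUPED adjustment" is to leave their outcome untouched and adjust only users who have history. The sketch below assumes missing history is encoded as NaN in the covariate; the helper name and the simulated data are hypothetical, and this is only one of several reasonable approaches.

```python
import numpy as np


def cuped_adjust_with_missing(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """CUPED adjustment where x is NaN for users without pre-experiment history."""
    has_history = ~np.isnan(x)
    x_known, y_known = x[has_history], y[has_history]

    # Estimate theta and the covariate mean only from users who have history.
    theta = np.cov(x_known, y_known, ddof=1)[0, 1] / np.var(x_known, ddof=1)
    x_mean = x_known.mean()

    # Filling NaN with the mean makes (x - mean) zero, so new users get no adjustment.
    x_filled = np.where(has_history, x, x_mean)
    return y - theta * (x_filled - x_mean)


# Simulated data (hypothetical): about 20% of users are new and have no history.
rng = np.random.default_rng(3)
n = 8_000
x = rng.gamma(shape=2.0, scale=50.0, size=n)
is_new = rng.random(n) < 0.2
y = np.where(is_new, 50.0, 0.8 * x) + rng.normal(0.0, 40.0, size=n)
x[is_new] = np.nan

y_adj = cuped_adjust_with_missing(y, x)
print(f"variance: raw={y.var():.1f}  adjusted={y_adj.var():.1f}")
```

The overall mean is preserved because the correction averages to zero over users with history and is exactly zero for everyone else. The variance reduction is smaller than it would be with full history, since new users contribute their full noise.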
Takeaway
CUPED is one of the highest-leverage tools in experimentation. It requires no changes to your randomization, no extra data collection, and no complex modeling — just a pre-experiment covariate and a simple adjustment.
The intuition in one sentence:
Remove the variance you could have predicted, so you can focus on the variance that actually matters.
If your platform has user history (and most do), there's almost no reason not to use CUPED.