
How Martingales Make Always-Valid Inference Work

by DataMarvin
3 hours ago

In the previous post, we introduced e-values as the key building block of Always-Valid Inference. An e-value is a non-negative random variable whose expected value under the null is at most 1. This property, combined with Markov's inequality, guarantees that rejecting when $E \ge 1/\alpha$ controls the false positive rate at α, at any stopping time.

But a natural question follows: why does the e-value maintain this property regardless of when you stop? What's the underlying mathematical structure that makes anytime validity possible?

The answer is martingales. This post explains what a martingale is, why it matters for sequential testing, and how it connects to the e-values and confidence sequences you'd actually use in practice.


1. Start with the Intuition: A Fair Game

The best place to start with martingales is gambling.

Imagine you're playing a fair coin-flip game. Each round, you bet $1. Heads: you win $1. Tails: you lose $1. The game is fair — no edge for either side.

Track your cumulative wealth over time: $W_0, W_1, W_2, W_3, \ldots$

At any point in the game, what do you expect your future wealth to be, given everything that's happened so far? Exactly what it is right now. If you have $50 after 30 rounds, your best prediction of wealth after 31 rounds is still $50 — because the next flip is equally likely to go either way.

This is the defining property of a martingale:

$$E[W_{t+1} \mid W_1, W_2, \ldots, W_t] = W_t$$

Your expected future value, given the past, equals your current value. The process has no drift — no systematic tendency to go up or down.

Three types worth knowing:

| Type | Property |
|---|---|
| Martingale | E[future \| past] = present |
| Submartingale | E[future \| past] ≥ present |
| Supermartingale | E[future \| past] ≤ present |

The relevant one for hypothesis testing is the supermartingale — a process that tends to drift downward or stay flat, but never systematically upward.
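To make the three types concrete, here is a minimal Python sketch of the coin-flip game. The function names, the `p_heads` parameter, and the four-round horizon are illustrative assumptions, not part of any library. It enumerates every possible flip history and compares the one-step conditional expectation of wealth with the current wealth.

```python
from itertools import product

def expected_next_wealth(wealth, p_heads=0.5, stake=1.0):
    """One-step conditional expectation of wealth, given current wealth."""
    return p_heads * (wealth + stake) + (1 - p_heads) * (wealth - stake)

def classify_game(p_heads, rounds=4, start=50.0):
    """Classify the wealth process by checking every possible flip history."""
    drifts = []
    for flips in product((+1, -1), repeat=rounds):
        wealth = start + sum(flips)  # wealth after this particular history
        drifts.append(expected_next_wealth(wealth, p_heads) - wealth)
    if all(abs(d) < 1e-12 for d in drifts):
        return "martingale"       # E[future | past] = present
    if all(d > 0 for d in drifts):
        return "submartingale"    # E[future | past] >= present
    # In this game the drift has the same sign after every history,
    # so the only remaining case is a negative drift.
    return "supermartingale"      # E[future | past] <= present

for p in (0.5, 0.6, 0.4):
    print(p, classify_game(p))
```

A fair coin gives a martingale, a coin biased toward heads gives a submartingale, and a coin biased toward tails gives a supermartingale.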


2. What Does This Have to Do with Hypothesis Testing?

Here's the key connection.

A standard p-value under the null hypothesis is uniformly distributed on $[0, 1]$ at any single, pre-specified sample size. When you monitor it over time, checking it at each new observation, the underlying test statistic behaves like a random walk, and a random walk will eventually wander past any fixed significance boundary. So a monitored p-value will eventually dip below 0.05 by chance, even if the null is true.
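The effect of monitoring can be checked with a small simulation. This is a sketch under stated assumptions: a two-sided z-test with known variance, 200 observations per experiment, a look after every observation, and hypothetical function and parameter names chosen for illustration.

```python
import math
import random

def peeking_false_positive_rates(n_experiments=2000, n_obs=200, z_crit=1.96, seed=0):
    """Simulate null experiments; compare 'peek every step' with 'look once at the end'."""
    rng = random.Random(seed)
    ever_significant = 0
    significant_at_end = 0
    for _ in range(n_experiments):
        total = 0.0
        crossed = False
        for n in range(1, n_obs + 1):
            total += rng.gauss(0.0, 1.0)   # the null is true: mean 0, sd 1
            z = total / math.sqrt(n)       # z-statistic after n observations
            if abs(z) > z_crit:            # same as a two-sided p-value below 0.05
                crossed = True
        ever_significant += crossed
        significant_at_end += abs(z) > z_crit
    return ever_significant / n_experiments, significant_at_end / n_experiments

peeking_rate, fixed_rate = peeking_false_positive_rates()
print(f"peeking: {peeking_rate:.3f}, single look: {fixed_rate:.3f}")
```

The single-look rate should sit near the nominal 5%, while the rate of ever seeing a significant result along the way should be several times larger.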

The e-value does something fundamentally different. Under the null hypothesis, the e-value process $E_1, E_2, E_3, \ldots$ is a non-negative supermartingale: its expected future value is always at most its current value.

Under $H_0$: $E[E_{t+1} \mid E_1, \ldots, E_t] \le E_t$

Why does this matter? Because of a remarkable theorem called Ville's inequality (1939), which is the sequential analog of Markov's inequality:

$$P(E_t \ge 1/\alpha \text{ for some } t) \le \alpha \cdot E[E_0] \le \alpha$$

Where $E_0 = 1$ (the process starts at 1, representing no evidence either way).

This is the mathematical guarantee that makes AVI work. Because the e-value process is a non-negative supermartingale, the probability that it ever crosses 1/α — at any point in time — is bounded by α. Not "at a specific time" — ever.
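Ville's bound can be verified exactly in a simple case. The sketch below assumes fair-coin data under the null, a likelihood-ratio e-value against a hypothetical alternative of heads probability 0.6, and a finite horizon of 500 flips. None of these choices come from the post itself. It uses dynamic programming over the number of heads to compute the exact probability that the process ever reaches $1/\alpha$.

```python
def prob_ever_crossing(alpha=0.05, p_alt=0.6, horizon=500):
    """Exact P_H0(E_t >= 1/alpha for some t <= horizon) for fair-coin data.

    E_t is the likelihood ratio of Bernoulli(p_alt) against Bernoulli(0.5).
    """
    up = p_alt / 0.5            # multiplier applied to E_t on heads
    down = (1 - p_alt) / 0.5    # multiplier applied to E_t on tails
    threshold = 1.0 / alpha
    alive = {0: 1.0}            # heads count -> probability, paths not yet crossed
    crossed = 0.0
    for t in range(1, horizon + 1):
        still_alive = {}
        for heads, prob in alive.items():
            # Under H0 the next flip is tails or heads with probability 0.5 each.
            for new_heads in (heads, heads + 1):
                e_value = up ** new_heads * down ** (t - new_heads)
                if e_value >= threshold:
                    crossed += 0.5 * prob
                else:
                    still_alive[new_heads] = still_alive.get(new_heads, 0.0) + 0.5 * prob
        alive = still_alive
    return crossed

crossing_probability = prob_ever_crossing()
print(f"P(ever crossing 1/alpha) = {crossing_probability:.4f}")
```

The result stays below α because paths that cross typically overshoot the threshold, which makes the bound slightly conservative.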


3. Why Supermartingales Can't Keep Rising

The intuition for why a non-negative supermartingale rarely crosses high thresholds is elegant.

A supermartingale under the null is like a gambler playing against a house that has a slight edge. The gambler's wealth tends to drift downward over time. It might spike occasionally — randomness allows temporary gains — but it can't sustain high levels indefinitely because there's always downward pressure.

More formally: a non-negative process that tends to stay flat or decrease can't reach a large value very often, because reaching a large value would require sustained upward movement — which the supermartingale property prohibits under the null.

This is exactly what you want from a test statistic. Under the null (no real effect), the e-value shouldn't climb. When there is a real effect (the alternative is true), the likelihood ratio tips in your favor — the process is no longer a supermartingale and tends to grow, eventually crossing the threshold and leading you to reject.

Under the null (H₀): E_t is a supermartingale → drifts flat or down
Under the alternative: E_t is a submartingale → tends to grow → eventually crosses 1/α

The test works because the two regimes — null and alternative — produce structurally different dynamics in the e-value process.


4. The Optional Stopping Theorem

Martingales come with a famous result called the Optional Stopping Theorem (OST), which is worth understanding because it's both illuminating and frequently misunderstood.

The OST says: for a martingale M_t and a stopping time τ (the time at which you decide to stop, based on what you've observed so far),

$$E[M_\tau] = E[M_0]$$

...under certain regularity conditions. The expected value of the process at the stopping time equals its initial value, regardless of how clever your stopping rule is.

Why this matters for testing

Suppose someone claims: "I'll run the experiment until I see a significant result, then stop." This sounds like it should inflate false positives — and with a standard p-value, it does. But the OST tells us something different for martingales.

If the e-value process is a martingale under the null, then no matter what stopping rule you use — stop when you see significance, stop after 1000 observations, stop on a Tuesday — the expected value of the process at stopping is the same as at the start. You can't manipulate a martingale into having a higher expected value by choosing when to stop.

This is the mathematical reason AVI is robust to optional stopping. The stopping rule doesn't corrupt the inference because the process's statistical properties are preserved regardless of when you stop.

The caveat

The OST requires certain conditions — most importantly, that the process is a true martingale (not just approximately so) and that regularity conditions on the stopping time are satisfied. In practice, e-value processes are often supermartingales rather than exact martingales under the null, but Ville's inequality still applies, giving the anytime validity guarantee.
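A bounded stopping time is one standard regularity condition under which the OST holds, and that case can be checked exactly. The sketch below reuses a coin-flip likelihood ratio (fair coin under the null, hypothetical alternative 0.6) and forces every rule to stop by a fixed horizon. The function name and the three stopping rules are illustrative assumptions, and the rules are restricted to ones that depend only on the time and the current e-value.

```python
def expected_e_value_at_stop(should_stop, p_alt=0.6, horizon=200):
    """Exact E_H0[E_tau] for a rule should_stop(t, e_value), capped at `horizon`."""
    up = p_alt / 0.5
    down = (1 - p_alt) / 0.5
    alive = {0: 1.0}     # heads count -> probability of still running
    total = 0.0          # accumulates probability * e-value over stopped paths
    for t in range(1, horizon + 1):
        still_alive = {}
        for heads, prob in alive.items():
            for new_heads in (heads, heads + 1):   # tails or heads, 0.5 each under H0
                e_value = up ** new_heads * down ** (t - new_heads)
                if t == horizon or should_stop(t, e_value):
                    total += 0.5 * prob * e_value
                else:
                    still_alive[new_heads] = still_alive.get(new_heads, 0.0) + 0.5 * prob
        alive = still_alive
    return total

rules = {
    "stop at the first crossing of 20": lambda t, e: e >= 20,
    "quit once the e-value falls below 0.5": lambda t, e: e < 0.5,
    "always stop at t = 7": lambda t, e: t == 7,
}
results = {name: expected_e_value_at_stop(rule) for name, rule in rules.items()}
for name, value in results.items():
    print(f"{name}: E[E_tau] = {value:.6f}")
```

Every rule gives an expected value of 1 up to floating-point error, including the rule that stops as soon as the result looks significant.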


5. From Martingales to E-Values: Connecting the Pieces

Now we can see how everything connects.

Step 1: Define a likelihood ratio

The likelihood ratio at time t is:

$$E_t = \frac{P(X_1, \ldots, X_t \mid H_1)}{P(X_1, \ldots, X_t \mid H_0)}$$

How much more probable is the observed data under the alternative than the null?

Step 2: Observe that the likelihood ratio is a martingale under H₀

Under the null hypothesis:

$$E\left[\frac{E_{t+1}}{E_t} \,\middle|\, X_1, \ldots, X_t\right] = E\left[\frac{P(X_{t+1} \mid H_1)}{P(X_{t+1} \mid H_0)}\right] = 1$$

The incremental likelihood ratio has expected value 1 under H₀, which means the process $E_t$ is an exact martingale under the null, not just a supermartingale.
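The reason the expectation is exactly 1 is that weighting the ratio by the null probabilities cancels the denominator, leaving the alternative probabilities, which sum to 1. A minimal sketch with a made-up three-outcome distribution (the outcome labels and probabilities are assumptions for illustration):

```python
def expected_lr_increment(p_null, p_alt, p_true):
    """E[p_alt(X) / p_null(X)] when X is drawn from p_true (dicts: outcome -> prob)."""
    return sum(p_true[x] * p_alt[x] / p_null[x] for x in p_true)

null = {"a": 0.5, "b": 0.3, "c": 0.2}   # hypothetical H0 distribution
alt = {"a": 0.4, "b": 0.3, "c": 0.3}    # hypothetical H1 distribution

under_null = expected_lr_increment(null, alt, p_true=null)
under_alt = expected_lr_increment(null, alt, p_true=alt)
print(f"expected increment under H0: {under_null:.4f}")
print(f"expected increment under H1: {under_alt:.4f}")
```

When the data really come from the alternative, the expected increment exceeds 1, which is the source of the upward drift described in Section 3.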

Step 3: Apply Ville's inequality

Since E_t is a non-negative martingale starting at E₀ = 1:

$$P(E_t \ge 1/\alpha \text{ for some } t) \le \alpha$$

The false positive rate is controlled at α, at any stopping time.

Step 4: The mSPRT extension

In practice, you don't know the exact alternative hypothesis — you don't know what effect size you're testing for. The mSPRT (mixture Sequential Probability Ratio Test) handles this by averaging the likelihood ratio over a prior distribution on the effect size:

$$E_t = \int \frac{P(\text{data} \mid \theta)}{P(\text{data} \mid H_0)} \, \pi(\theta) \, d\theta$$

This mixture is still a martingale (or supermartingale) under H₀ — the martingale property is preserved under mixture — so the anytime validity guarantee carries over. The mSPRT is the e-value construction used in most real-world AVI implementations.
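For one common special case the mixture integral has a closed form. The sketch below assumes i.i.d. observations from $N(\theta, \sigma^2)$ with known $\sigma$, a null of $\theta = 0$, and a $N(0, \tau^2)$ mixing prior. Under those assumptions $E_t = \sqrt{\sigma^2 / (\sigma^2 + t\tau^2)} \, \exp\left(\tau^2 S_t^2 / (2\sigma^2(\sigma^2 + t\tau^2))\right)$, where $S_t$ is the running sum. The function name and the sample data are hypothetical, and real implementations may parameterize this differently.

```python
import math

def msprt_e_values(xs, sigma=1.0, tau=1.0):
    """Running mSPRT e-values for H0: mean = 0 with a N(0, tau^2) mixing prior.

    Assumes observations are i.i.d. N(theta, sigma^2) with sigma known.
    """
    e_values = []
    running_sum = 0.0
    for t, x in enumerate(xs, start=1):
        running_sum += x
        v = sigma**2 + t * tau**2
        # Work on the log scale, then exponentiate.
        log_e = 0.5 * math.log(sigma**2 / v) + tau**2 * running_sum**2 / (2 * sigma**2 * v)
        e_values.append(math.exp(log_e))
    return e_values

observations = [0.3, -0.1, 0.8, 0.4, 0.6]   # made-up data, for illustration only
for t, e in enumerate(msprt_e_values(observations), start=1):
    print(t, round(e, 4))
```

The prior scale `tau` is a tuning choice: it encodes which effect sizes the test is most sensitive to, and the validity guarantee does not depend on it.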


6. Confidence Sequences as Inverted Martingale Tests

Recall from the previous post that confidence sequences — sequences of intervals that are simultaneously valid at all sample sizes — are the interval analog of anytime-valid tests.

The connection to martingales is direct. A confidence sequence $[l_t, u_t]$ can be constructed by inverting a family of martingale tests: for each candidate value $\theta_0$, ask "is the martingale test for $H_0: \theta = \theta_0$ rejected at time t?" The confidence sequence at time t is the set of all $\theta_0$ values for which the test is not rejected.

Because each individual test is anytime-valid (using Ville's inequality), the resulting confidence sequence inherits the same property: it simultaneously covers the true parameter at all time points with probability at least 1 − α.
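The inversion can be done in closed form for a Gaussian mixture e-value. This sketch assumes i.i.d. $N(\theta, \sigma^2)$ data with known $\sigma$ and, for each candidate $\theta_0$, a $N(\theta_0, \tau^2)$ mixing prior. Both function names and the sample data are hypothetical. Setting the e-value equal to $1/\alpha$ and solving for the distance between the running mean and $\theta_0$ gives the interval radius.

```python
import math

def mixture_e_value(theta0, running_sum, t, sigma, tau):
    """Gaussian-mixture e-value at time t for H0: mean = theta0."""
    v = sigma**2 + t * tau**2
    centered = running_sum - t * theta0
    return math.sqrt(sigma**2 / v) * math.exp(tau**2 * centered**2 / (2 * sigma**2 * v))

def confidence_sequence(xs, sigma=1.0, tau=1.0, alpha=0.05):
    """At each t, the interval of theta0 values whose e-value is below 1/alpha."""
    intervals = []
    running_sum = 0.0
    for t, x in enumerate(xs, start=1):
        running_sum += x
        v = sigma**2 + t * tau**2
        # Solve mixture_e_value(theta0, ...) = 1/alpha for |mean - theta0|.
        log_term = math.log(math.sqrt(v / sigma**2) / alpha)
        radius = math.sqrt(2 * sigma**2 * v / tau**2 * log_term) / t
        mean = running_sum / t
        intervals.append((mean - radius, mean + radius))
    return intervals

observations = [0.3, -0.1, 0.8, 0.4, 0.6]   # made-up data, for illustration only
for t, (lo, hi) in enumerate(confidence_sequence(observations), start=1):
    print(t, round(lo, 3), round(hi, 3))
```

The intervals here are the sets not rejected at each individual time. Taking the running intersection of the intervals seen so far is a common refinement that preserves the coverage guarantee.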


7. A Concrete Picture: What the E-Value Process Looks Like

To make this tangible, consider what the e-value process looks like in an A/B test with a small real effect.

[Figure: the e-value process E_t (vertical axis) over time t (horizontal axis). The path starts at 1, stays low through a "null effect zone", then climbs and crosses the horizontal threshold at 1/α, at which point the test rejects.]

Under the null (no real effect), the e-value process meanders around 1, occasionally spiking, but the supermartingale property prevents it from sustaining growth toward 1/α. Ville's inequality bounds the probability of ever crossing.

Under the alternative (real effect exists), the process has an upward drift. Given enough data, it will eventually cross 1/α — and because the process is robust to optional stopping, you can stop the moment it crosses and the inference remains valid.
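Both regimes can be seen side by side in a small simulation. This is a sketch with assumed settings: coin-flip data, a likelihood-ratio e-value for heads probability 0.6 against 0.5, α = 0.05, a horizon of 3000 flips, and 200 simulated paths per regime. All names are hypothetical.

```python
import random
import statistics

def first_crossing_time(p_true, rng, p_alt=0.6, alpha=0.05, horizon=3000):
    """Return the first t with E_t >= 1/alpha, or None if none occurs by `horizon`."""
    e_value = 1.0
    for t in range(1, horizon + 1):
        heads = rng.random() < p_true
        e_value *= (p_alt / 0.5) if heads else ((1 - p_alt) / 0.5)
        if e_value >= 1 / alpha:
            return t          # stop the moment the threshold is crossed
    return None

def summarize(p_true, n_paths=200, seed=1):
    """Fraction of paths that cross, and the median crossing time among those."""
    rng = random.Random(seed)
    times = [first_crossing_time(p_true, rng) for _ in range(n_paths)]
    crossed = [t for t in times if t is not None]
    median = statistics.median(crossed) if crossed else None
    return len(crossed) / n_paths, median

null_rate, _ = summarize(p_true=0.5)           # no real effect
alt_rate, alt_median = summarize(p_true=0.6)   # real effect
print(f"null: {null_rate:.3f} of paths ever cross")
print(f"alternative: {alt_rate:.3f} of paths cross, median stopping time {alt_median}")
```

Under the null only a small fraction of paths ever reach the threshold, while under the alternative essentially all of them do, at a stopping time that varies from path to path.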


8. The Deeper Intuition: Fairness and Information

There's a deeper way to understand why martingales are the right structure for hypothesis testing.

A martingale represents fair uncertainty — a process that incorporates information as it arrives, without being distorted by the observer's choices. The Optional Stopping Theorem formalizes this: you can't gain an informational advantage over a martingale by choosing when to stop observing it.

In hypothesis testing, what you want is exactly this: a test statistic that reflects the accumulated evidence about the hypothesis, without being inflatable by strategic observation. The martingale structure guarantees that the evidence measure — the e-value — behaves the way evidence should: it grows when the data favor the alternative, stays flat when the data are uninformative, and can't be gamed by clever stopping.

This is the deeper reason AVI is valid. It's not just a mathematical trick — it's using the right kind of process for what evidence accumulation should look like.


Takeaway

The martingale structure is what gives Always-Valid Inference its statistical foundation. The key chain of logic:

Likelihood ratio under H₀
→ (is a) non-negative martingale / supermartingale
→ (satisfies) Ville's inequality: P(ever crossing 1/α) ≤ α
→ (gives) anytime-valid false positive control
→ (robust to) optional stopping, no matter when you stop

One sentence summary:

Martingales make AVI work because the likelihood ratio under the null is a non-negative martingale, and Ville's inequality guarantees that a non-negative martingale starting at 1 ever crosses the threshold 1/α with probability at most α, regardless of when you stop watching.
