Statsig Feature Flags
Feature flags are conceptually simple: wrap a code change in a conditional, and decide at runtime who sees it. But the gap between "conceptually simple" and "production-ready at scale" is where most of the interesting engineering lives.
Statsig's feature flag product — called Feature Gates — is built around a specific philosophy: feature flags and experiments are fundamentally different tools, and conflating them creates problems for both engineering velocity and statistical rigor. This post explains how Statsig Feature Gates work, what makes them distinctive, and when to use them versus other tools in the experimentation stack.
1. What Is a Feature Gate?
A Feature Gate is Statsig's implementation of a feature flag. At its core, it's a boolean check — pass or fail — evaluated at runtime based on user attributes, environment data, and rules you define.
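    from statsig import statsig

    # Server-side check. Assumes statsig.initialize(...) has already run
    # and `user` is a StatsigUser.
    passed = statsig.check_gate(user, "new_checkout_flow")
    if passed:
        show_new_checkout()  # your application's own functions
    else:
        show_old_checkout()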
The gate evaluates to true or false based on the targeting rules you've set up in the Statsig console. The evaluation happens locally in the SDK — no network request at check-time — which keeps latency negligible even when you have hundreds of flag checks per page.
2. The Anatomy of a Feature Gate
Every Feature Gate evaluation involves three components working together.
The ruleset
When you configure a gate in the Statsig console, you're building a set of rules: what percentage of users see true, which user attributes gate access, which environments the flag applies to. These rules are compiled into a JSON spec that gets distributed to your SDKs.
The SDK evaluation
Rather than making a network request each time you check a gate, the SDK downloads the full ruleset at initialization and evaluates rules locally against the current user object. The result: gate checks that run in under 1ms, with no added latency to your request path.
The user object
The user object is how the SDK knows who it's dealing with. At minimum, it needs a user ID. But you can pass additional attributes (email, country, device type, app version, subscription tier) and the gate rules can target on any of them.
    from statsig.statsig_user import StatsigUser

    user = StatsigUser(
        user_id="user_123",
        email="user@example.com",
        country="KR",
        custom={"subscription": "premium", "app_version": "3.2.1"},
    )
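To make the interplay of ruleset, local evaluation, and user object concrete, here is a minimal sketch of an in-memory evaluator. The ruleset shape, the operator names (`eq`, `in`, `version_gte`), and the plain-dict user are all assumptions made for illustration; Statsig's actual spec format and evaluation logic are not shown in this post and may differ considerably.

```python
from typing import Any, Dict, Tuple

# Hypothetical ruleset shape, for illustration only; not Statsig's real spec format.
RULESET: Dict[str, Any] = {
    "new_checkout_flow": {
        "enabled": True,
        "rules": [
            # Every condition in a rule must match for the rule to pass.
            {"conditions": [
                {"field": "country", "op": "in", "value": ["KR", "JP"]},
                {"field": "app_version", "op": "version_gte", "value": "3.2.0"},
            ]},
            {"conditions": [
                {"field": "subscription", "op": "eq", "value": "premium"},
            ]},
        ],
    }
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "3.2.1" into (3, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def condition_matches(condition: Dict[str, Any], user: Dict[str, Any]) -> bool:
    actual = user.get(condition["field"])
    if actual is None:
        return False  # a missing attribute never matches
    op, expected = condition["op"], condition["value"]
    if op == "eq":
        return actual == expected
    if op == "in":
        return actual in expected
    if op == "version_gte":
        return parse_version(actual) >= parse_version(expected)
    raise ValueError(f"unknown operator: {op}")


def check_gate(ruleset: Dict[str, Any], user: Dict[str, Any], gate_name: str) -> bool:
    """Evaluate a gate entirely in memory: no network call is made here."""
    gate = ruleset.get(gate_name)
    if gate is None or not gate["enabled"]:
        return False  # unknown or disabled gates fail closed
    # The gate passes if any rule has all of its conditions satisfied.
    return any(
        all(condition_matches(c, user) for c in rule["conditions"])
        for rule in gate["rules"]
    )
```

In this sketch a gate passes when any one rule matches, and unknown gates fail closed. Both are design choices of the example, not documented Statsig behavior.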
3. What You Can Target On
Statsig Feature Gates support a wide range of targeting conditions out of the box:
| Condition type | Examples |
|---|---|
| User attributes | email, user ID, custom fields |
| Geographic | country (can be resolved from IP), locale |
| Device/browser | device type, browser name, OS version |
| App version | semantic versioning operators (≥, ≤, etc.) |
| Environment | production, staging, development |
| Percentage rollout | random X% of users by ID hash |
| Segments | reusable rule groups (e.g. "all beta users", "EU users") |
The percentage rollout uses consistent hashing on the user ID, so a given user always gets the same gate result — they won't flip between true and false across sessions.
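A minimal sketch of how this kind of stable bucketing can work, assuming a SHA-256 hash over a per-gate salt plus the user ID and 10,000 buckets. The function names, the salt scheme, and the bucket count are illustrative assumptions, not a description of Statsig's internals.

```python
import hashlib


def bucket(user_id: str, salt: str, buckets: int = 10_000) -> int:
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{salt}.{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce to a bucket.
    return int.from_bytes(digest[:8], "big") % buckets


def passes_rollout(user_id: str, salt: str, percent: float) -> bool:
    """True if the user falls inside the first `percent` of buckets."""
    return bucket(user_id, salt) < percent * 100  # 10,000 buckets -> 0.01% steps
```

Because the bucket depends only on the salt and the user ID, the same user gets the same answer on every check. Raising the percentage only adds users: anyone who passed at 5% still passes at 25%.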
4. When to Use Feature Gates (and When Not To)
Statsig is explicit about the distinction between feature flags and experiments. This isn't just product positioning — it reflects a real architectural difference.
Use a Feature Gate when you need:
- A gradual rollout to safely deploy new code to an increasing percentage of users
- A kill switch that can immediately disable a feature in production without a deploy
- Environment-specific behavior (dogfooding in staging, restricting to internal users first)
- Access control based on user attributes (premium features, regional availability)
- A lightweight A/B test where you just want to measure whether the new feature helps or hurts, without a formal hypothesis
Don't use a Feature Gate when:
- You need to return structured or multi-value data based on targeting rules → use Dynamic Config instead
- You want to test complex hypotheses, measure impact on specific metrics with statistical rigor, or run multi-variant tests → set up an Experiment instead
The core principle: Feature Gates are for shipping decisively. Experiments are for seeking understanding. A gate answers "who sees this feature?" An experiment answers "what does this feature do to user behavior?"
5. The Flag Lifecycle
One of the less visible but genuinely painful problems with feature flags at scale is lifecycle management — flags that were supposed to be temporary accumulate as technical debt, cluttering codebases and creating unpredictable behavior.
Statsig addresses this with explicit lifecycle states:
| Status | Meaning |
|---|---|
| In Progress | Being rolled out and tested |
| Launched | Rolled out to everyone — candidate for code cleanup |
| Disabled | Rolled back from everyone |
| Archived | No longer in use — safe to remove from code |
Separately from its status, every gate has a type. New gates are "Temporary" by default, signaling that they're expected to have a finite life. "Permanent" gates, such as kill switches and permission controls, are explicitly marked as such.
The distinction matters because it creates a workflow for flag cleanup. Once a gate reaches "Launched," the next step is removing the conditional from the code entirely — not leaving it in indefinitely. Statsig also surfaces which flags haven't been evaluated recently, making it easier to identify candidates for archival.
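The recency check behind that kind of cleanup list is simple to sketch. The version below assumes you already have a mapping from gate name to last-evaluation time. Both that mapping and the 30-day threshold are hypothetical, and this is not how Statsig computes staleness internally.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_gates(
    last_evaluated: Dict[str, Optional[datetime]],
    now: datetime,
    max_idle: timedelta = timedelta(days=30),
) -> List[str]:
    """Return gate names with no evaluation in the last `max_idle`, sorted by name."""
    cutoff = now - max_idle
    return sorted(
        name
        for name, seen in last_evaluated.items()
        if seen is None or seen < cutoff  # None means "never evaluated"
    )


# Example with made-up gate names and timestamps.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
candidates = stale_gates(
    {
        "new_checkout_flow": now - timedelta(days=2),
        "old_banner": now - timedelta(days=45),
        "never_shipped": None,
    },
    now,
)
```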
6. Built-in A/B Testing via Pulse
Feature Gates have a lightweight A/B testing capability built in, called Pulse. When a gate is live, Statsig automatically treats the users who pass the gate (see the new feature) as the treatment group, and users who fail it as the control group.
This lets you answer a basic question — "did this feature help or hurt key metrics?" — without setting up a formal experiment. You get metric movement data, exposure counts by variant, and statistical significance indicators, all without any additional configuration.
The tradeoff: Pulse is designed for simple rollout monitoring, not rigorous hypothesis testing. It doesn't support custom primary metrics defined upfront, multi-variant testing, or the CUPED-style variance reduction that Statsig's Experiments product applies. For those use cases, the right tool is an Experiment — not a gate.
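For intuition about what a pass-versus-fail comparison involves, here is a textbook two-proportion z-test on a conversion metric. It is a generic statistical sketch with illustrative function and parameter names. It is not Statsig's stats engine, which this post does not describe in detail.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    control_conversions: int,
    control_users: int,
    treatment_conversions: int,
    treatment_users: int,
) -> Tuple[float, float]:
    """Return (absolute lift, two-sided p-value) for treatment vs. control."""
    p_control = control_conversions / control_users
    p_treatment = treatment_conversions / treatment_users
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (control_conversions + treatment_conversions) / (control_users + treatment_users)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_users + 1 / treatment_users))
    if std_err == 0:
        return p_treatment - p_control, 1.0  # no variation, so no evidence of a difference
    z = (p_treatment - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_treatment - p_control, p_value
```

Here "control" would be the users who fail the gate and "treatment" the users who pass it. A small p-value suggests the difference is unlikely to be noise, but without a metric chosen upfront it is easy to over-read, which is the tradeoff described above.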
7. Scheduled Rollouts and Automated Controls
A common rollout pattern: start at 1%, watch metrics for a few hours, advance to 5%, then 25%, then 100%. Statsig supports this with Scheduled Rollouts — you define the progression timeline upfront and the platform handles the percentage increases automatically.
Combined with Pulse monitoring, this gives you a data-driven rollout loop:
Start: 1% rollout
↓
Monitor: Pulse shows no metric regressions
↓
Advance: 5% → 25% → 100% on schedule
↓
Gate status: Launched → cleanup conditional from code
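The schedule half of that loop can be sketched as a lookup from elapsed time to rollout percentage. The step offsets below are made up to mirror the 1% → 5% → 25% → 100% progression. How Statsig stores or applies schedules is not covered here, so treat the data shape as an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical schedule: (time since rollout start, percentage of users passing).
SCHEDULE: List[Tuple[timedelta, float]] = [
    (timedelta(hours=0), 1.0),
    (timedelta(hours=4), 5.0),
    (timedelta(hours=24), 25.0),
    (timedelta(hours=72), 100.0),
]


def rollout_percent(
    schedule: List[Tuple[timedelta, float]], start: datetime, now: datetime
) -> float:
    """Return the percentage in effect at `now` for a schedule sorted by offset."""
    elapsed = now - start
    percent = 0.0  # before the first step, nobody passes
    for offset, step_percent in schedule:
        if elapsed >= offset:
            percent = step_percent
        else:
            break
    return percent
```

A real rollout loop would also need a way to pause the schedule when monitoring shows a regression. That part is left out of this sketch.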
When specific users need a fixed result no matter where the schedule stands, you can also use Overrides: a bypass list that forces specific users (internal testers, beta users, specific accounts) into true or false regardless of the general rollout percentage.
8. SDK Infrastructure: Why Performance Matters
A non-obvious constraint in feature flagging infrastructure: if checking a flag adds meaningful latency to your request path, the feature flag system defeats its own purpose. If you're A/B testing a new checkout flow, but the flag check itself slows the page by 50ms, you've introduced a confound — you're no longer measuring whether the new checkout is better, you're measuring whether the slower page is worse.
Statsig addresses this with a local evaluation model. SDKs download the full ruleset at initialization and evaluate rules in memory, with no per-check network requests. For server SDKs, the ruleset is polled in the background and kept up to date — typically within seconds of any console change — without blocking request processing.
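The pattern described here (download once, serve checks from memory, refresh in the background) can be sketched in a few lines. The `fetch` callable, the 10-second poll interval, and the flat gate-name-to-boolean ruleset are all simplifying assumptions for illustration, not the Statsig SDK's actual design.

```python
import threading
from typing import Any, Callable, Dict


class RulesetStore:
    """Holds the latest ruleset in memory and refreshes it off the request path."""

    def __init__(self, fetch: Callable[[], Dict[str, Any]], poll_seconds: float = 10.0):
        self._fetch = fetch  # hypothetical callable that downloads the ruleset
        self._poll_seconds = poll_seconds
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._ruleset = fetch()  # blocking download once, at initialization

    def refresh_once(self) -> None:
        try:
            fresh = self._fetch()
        except Exception:
            return  # keep serving the last good ruleset if a poll fails
        with self._lock:
            self._ruleset = fresh

    def start(self) -> None:
        threading.Thread(target=self._poll_loop, daemon=True).start()

    def stop(self) -> None:
        self._stop.set()

    def _poll_loop(self) -> None:
        # Event.wait returns False on timeout, True once stop() has been called.
        while not self._stop.wait(self._poll_seconds):
            self.refresh_once()

    def gate_enabled(self, gate_name: str) -> bool:
        """Read from memory only; never triggers a network request."""
        with self._lock:
            return bool(self._ruleset.get(gate_name, False))
```

The key property is that `gate_enabled` never waits on the network: a slow or failed poll delays how quickly a console change propagates, but it does not slow down or break a check.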
Statsig reports flag check latency well under 1ms in server environments, and provides bootstrapping support for client-side SDKs to eliminate the initialization wait on first page load.
9. Statsig Feature Gates vs. Datadog Feature Flags
Since both tools were covered in this series, a brief comparison:
| | Statsig Feature Gates | Datadog Feature Flags |
|---|---|---|
| Core strength | Tight integration with A/B testing and analytics | Native observability correlation (APM, RUM) |
| Automated rollbacks | Alert-based, manual or via experiment results | Automated, triggered by live telemetry (SLOs, error rates) |
| Statistical A/B testing | Built-in Pulse + full Experiments product | Planned (via Eppo integration) |
| Flag lifecycle management | Explicit status system + recency-based cleanup | Bits AI-assisted stale flag detection |
| Warehouse native | Yes (Warehouse Native option) | Not natively |
| Best for | Teams wanting flag management + experimentation in one platform | Teams already on Datadog wanting observability-native rollout control |
The honest summary: Statsig is the stronger choice if experimentation depth matters — its stats engine, metric framework, and the distinction between flags and experiments are more mature. Datadog Feature Flags is the stronger choice if you're already deep in the Datadog stack and want automated, telemetry-driven rollback without switching tools.
Takeaway
Statsig Feature Gates are designed around a clear philosophy: feature flags are for shipping, not for experimenting. By keeping the two tools conceptually separate — and technically distinct — Statsig avoids the common failure mode where "we'll just add a flag and measure it" becomes a replacement for rigorous experimentation, and where the codebase slowly fills up with permanent temporary flags.
One-sentence summary:
A Statsig Feature Gate is the fastest way to get new code in front of real users safely — with a built-in lifecycle system that forces you to eventually remove it, and a direct path to formal experimentation when you need more than a binary on/off.