Feature Store vs. Embedding
If you've spent time around ML systems, you've probably heard both terms in the same conversation — sometimes used interchangeably. They're not the same thing. They operate at different levels, solve different problems, and confusing them leads to muddled system design.
This post untangles the two concepts and explains how they relate.
1. The One-Line Difference
| | What it is |
|---|---|
| Feature Store | A system (infrastructure) for storing and serving ML features |
| Embedding | A way of representing data as a dense numeric vector |
The key insight: these are different categories of things. A Feature Store is a system. An embedding is a data representation. One is a library. The other is a type of book you might find inside it.
2. What Is a Feature?
Before separating the two concepts, it helps to be precise about what a "feature" is.
A feature is any variable that gets fed into a machine learning model. That's a broad definition — and intentionally so.
Examples from a fashion e-commerce context:
- Number of purchases in the last 7 days → numeric feature
- User's most browsed category → categorical feature
- Whether the user is on mobile → binary feature
- A 128-dimensional vector representing the user's behavioral patterns → embedding feature
That last one is key. An embedding is a feature. It's a specific type of feature — one that compresses high-dimensional or unstructured data into a dense vector. But it's still a feature, and it can be stored and served alongside all the others.
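To make "an embedding is a feature" concrete, here is a minimal sketch of one user's feature row. It assumes a plain Python dataclass as the record type; the field names, the user ID, and the 4-dimensional vector are hypothetical (a real behavioral embedding would have the 128 dimensions mentioned above). The point is only that the embedding lives in the same record as the scalar features.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserFeatures:
    """One user's feature row: every field is an input a model can consume."""

    user_id: str
    purchases_7d: int  # numeric feature
    top_category: str  # categorical feature
    is_mobile: bool  # binary feature
    # Embedding feature: a dense vector stored next to the scalar features.
    behavior_embedding: List[float] = field(default_factory=list)


row = UserFeatures(
    user_id="u42",  # hypothetical ID
    purchases_7d=3,
    top_category="tops",
    is_mobile=True,
    behavior_embedding=[0.82, 0.14, -0.33, 0.67],  # truncated to 4 dims for display
)
print(row.top_category, len(row.behavior_embedding))
```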
3. What Is a Feature Store?
A Feature Store is the infrastructure layer that manages features across their lifecycle — from computation to storage to serving.
Without a Feature Store, teams compute the same features independently:
- Recommendation team → computes "user's 7-day click count"
- Search team → computes "user's 7-day click count" (again)
- Ads team → computes "user's 7-day click count" (again, slightly differently)
With a Feature Store, the feature is computed once and shared:
[Feature pipeline]
↓
[Feature Store]
├── Offline Store (e.g., Hive, S3) → model training, historical lookups
└── Online Store (e.g., Redis, DynamoDB) → low-latency serving at inference time
        ↓                  ↓                  ↓
    Rec model        Search model         Ads model
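The two halves of the diagram can be sketched as a toy, in-memory class. This assumes Python dicts stand in for the offline store (Hive/S3) and the online store (Redis/DynamoDB); the class and method names are hypothetical, not the API of any real feature store. `get_as_of` is the point-in-time lookup a training job would use, so each training example only sees values that existed at that moment.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (entity_id, feature_name)


class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a feature store's offline and online halves."""

    def __init__(self) -> None:
        # Offline store: full, time-ordered history per key (role of Hive/S3).
        self.offline: DefaultDict[Key, List[Tuple[int, Any]]] = defaultdict(list)
        # Online store: only the latest value per key (role of Redis/DynamoDB).
        self.online: Dict[Key, Any] = {}

    def write(self, entity: str, feature: str, ts: int, value: Any) -> None:
        """Single write path: the pipeline computes a value once and both stores get it."""
        history = self.offline[(entity, feature)]
        history.append((ts, value))
        history.sort(key=lambda tv: tv[0])  # tolerate out-of-order writes
        self.online[(entity, feature)] = history[-1][1]  # newest value wins

    def get_online(self, entity: str, feature: str) -> Optional[Any]:
        """Inference-time read: a single key lookup returning the latest value."""
        return self.online.get((entity, feature))

    def get_as_of(self, entity: str, feature: str, ts: int) -> Optional[Any]:
        """Training-time read: the value as it was known at time `ts` (point-in-time)."""
        value = None
        for t, v in self.offline.get((entity, feature), []):
            if t > ts:
                break
            value = v
        return value


store = ToyFeatureStore()
store.write("u42", "clicks_7d", ts=100, value=5)
store.write("u42", "clicks_7d", ts=200, value=9)
print(store.get_online("u42", "clicks_7d"))  # → 9
print(store.get_as_of("u42", "clicks_7d", ts=150))  # → 5
```

Because both reads come from data written by the same `write` call, training and serving see the same values; a real system adds persistence, TTLs, and batch backfills on top of this idea.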
The critical function of a Feature Store is keeping offline (training) and online (serving) features consistent. When the feature your model trained on differs from the feature it receives at inference time, you get training-serving skew — and your model performs worse in production than it did in evaluation.
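Skew often starts with two hand-written versions of the "same" feature. In the hypothetical sketch below, both functions claim to compute the 7-day click count, but one uses a rolling 168-hour window and the other counts from midnight seven days back. The dates are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import List


def clicks_7d_pipeline_a(clicks: List[datetime], as_of: datetime) -> int:
    """Rolling window: clicks in the 168 hours before `as_of`."""
    start = as_of - timedelta(days=7)
    return sum(1 for t in clicks if start < t <= as_of)


def clicks_7d_pipeline_b(clicks: List[datetime], as_of: datetime) -> int:
    """Calendar window: clicks since midnight seven days before `as_of`."""
    start = (as_of - timedelta(days=7)).replace(hour=0, minute=0, second=0, microsecond=0)
    return sum(1 for t in clicks if start <= t <= as_of)


as_of = datetime(2024, 6, 8, 12, 0)
clicks = [datetime(2024, 6, 1, 9, 0), datetime(2024, 6, 7, 18, 0)]
# Same user, same moment, same feature name, different values.
print(clicks_7d_pipeline_a(clicks, as_of), clicks_7d_pipeline_b(clicks, as_of))  # → 1 2
```

If training data came from one version and serving from the other, the model would see systematically different inputs in production. Registering one definition in the Feature Store and reading it from both paths removes this source of skew.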
4. What Is an Embedding?
An embedding is a learned representation that maps high-dimensional or unstructured data into a dense, fixed-size vector — in a way that preserves semantic meaning.
The core property: similar things end up close together in vector space.
"black linen shirt" → [0.82, 0.14, -0.33, 0.67, ...]
"navy linen shirt" → [0.79, 0.11, -0.30, 0.71, ...] ← similar vector
"running shoes" → [0.12, 0.88, 0.55, 0.03, ...] ← very different vector
This geometric property is what makes embeddings useful. Rather than hand-crafting rules about which products are similar, the model learns a space where similarity is encoded in distance.
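"Close together" is commonly measured with cosine similarity (or a plain dot product when vectors are normalized). The sketch below uses only the four dimensions shown in the example above, so the numbers are illustrative rather than real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of vector lengths; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


black_linen = [0.82, 0.14, -0.33, 0.67]
navy_linen = [0.79, 0.11, -0.30, 0.71]
running_shoes = [0.12, 0.88, 0.55, 0.03]

print(round(cosine_similarity(black_linen, navy_linen), 3))
print(round(cosine_similarity(black_linen, running_shoes), 3))
```

The two linen shirts score roughly 0.998, while the shirt and the running shoes score roughly 0.05: the similarity judgement falls out of the geometry, with no hand-written rule.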
Embeddings are used for:
- Recommendation: match user embeddings to item embeddings via dot product similarity
- Search: embed a search query and retrieve the closest product embeddings
- Personalization: encode a user's behavior history into a single vector that downstream models can use
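The recommendation and personalization uses can be sketched together. This assumes the user vector is simply the mean of the embeddings of items the user clicked, which is one simple pooling choice; production systems typically learn the user vector with a model instead. The catalog and its 3-dimensional vectors are hypothetical, and every item is scored exactly, which only works for a small catalog.

```python
import heapq
from typing import Dict, Iterable, List, Sequence, Tuple


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average several item vectors into one user vector (one simple pooling choice)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def recommend(
    user_vec: Sequence[float],
    items: Dict[str, List[float]],
    k: int,
    exclude: Iterable[str] = (),
) -> List[Tuple[str, float]]:
    """Score every item by dot product with the user vector and keep the k best."""
    skip = set(exclude)
    scored = ((item_id, dot(user_vec, vec)) for item_id, vec in items.items() if item_id not in skip)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical 3-dimensional item embeddings.
items = {
    "black_linen_shirt": [0.9, 0.1, 0.0],
    "navy_linen_shirt": [0.85, 0.15, 0.05],
    "linen_shorts": [0.8, 0.1, 0.2],
    "white_linen_pants": [0.7, 0.2, 0.1],
    "running_shoes": [0.05, 0.9, 0.4],
}
history = ["black_linen_shirt", "navy_linen_shirt"]  # items the user clicked
user_vec = mean_pool([items[i] for i in history])
recs = recommend(user_vec, items, k=2, exclude=history)
print(recs)
```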
5. Where They Overlap — and Where They Don't
The source of confusion is that embeddings are features, so they naturally end up in Feature Stores. But the way they're stored and served is often different from regular features.
Regular features (numeric, categorical):
- Stored as key-value pairs: user_id → [purchase_count: 12, category: "tops", ...]
- Retrieved with a point lookup: "give me all features for user X"
- Works well with standard key-value stores (DynamoDB, Redis)
Embedding features:
- Stored as vectors: user_id → [0.82, 0.14, -0.33, ...]
- Often need a different kind of retrieval: "give me the top-K most similar vectors to this query vector"
- At scale, usually requires Approximate Nearest Neighbor (ANN) search, a fundamentally different operation from a key-value lookup
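The two access patterns look like this side by side. This is a sketch under stated assumptions: dicts stand in for a key-value table and for a vector collection, the `user:{id}` key scheme is hypothetical, and the vectors are the truncated ones from the embedding example earlier.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

# Regular features: key -> record (a dict standing in for a Redis/DynamoDB table).
kv_store: Dict[str, Dict[str, object]] = {
    "user:u42": {"purchase_count": 12, "category": "tops"},
}

# Embedding features: id -> vector (truncated to 4 dims).
vectors: Dict[str, List[float]] = {
    "black_linen_shirt": [0.82, 0.14, -0.33, 0.67],
    "navy_linen_shirt": [0.79, 0.11, -0.30, 0.71],
    "running_shoes": [0.12, 0.88, 0.55, 0.03],
}


def point_lookup(user_id: str) -> Dict[str, object]:
    """'All features for user X': one key, one hash lookup, independent of table size."""
    return kv_store.get(f"user:{user_id}", {})


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_exact(query: Sequence[float], k: int) -> List[Tuple[str, float]]:
    """'Top-K most similar to this vector': there is no key, so compare against every vector.

    This brute-force scan costs O(N * d) per query; ANN indexes exist to avoid it.
    """
    scored = ((vid, cosine(query, v)) for vid, v in vectors.items())
    return heapq.nlargest(k, scored, key=lambda p: p[1])


print(point_lookup("u42"))
print(top_k_exact([0.80, 0.12, -0.31, 0.70], k=2))
```

The point lookup stays fast no matter how many users exist, while the exact top-K grows linearly with the number of vectors; that difference is what pushes embedding retrieval toward a different kind of index.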
This is why many systems end up with a separate vector store (OpenSearch, Pinecone, Weaviate) or an ANN library such as Faiss alongside their main Feature Store. The Feature Store handles regular features via key-value lookup; the vector store handles embedding retrieval via ANN search.
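To show what "approximate" means, here is a toy ANN index using random-hyperplane hashing for cosine similarity. This is a swapped-in illustrative technique, not a description of how any of the stores named above work internally; production vector stores typically use graph- or cluster-based indexes such as HNSW or IVF. The class name, plane count, and random catalog are hypothetical.

```python
import heapq
import math
import random
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple

Signature = Tuple[bool, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class RandomHyperplaneLSH:
    """Toy ANN index: vectors on the same side of every random hyperplane share a bucket."""

    def __init__(self, dim: int, n_planes: int = 6, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each hyperplane is represented by a random Gaussian normal vector.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets: DefaultDict[Signature, List[Tuple[str, List[float]]]] = defaultdict(list)

    def _signature(self, vec: Sequence[float]) -> Signature:
        # One bit per hyperplane: which side of that plane the vector lies on.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in self.planes)

    def add(self, item_id: str, vec: List[float]) -> None:
        self.buckets[self._signature(vec)].append((item_id, vec))

    def query(self, vec: Sequence[float], k: int) -> List[Tuple[str, float]]:
        # Score only the query's own bucket. Neighbours that hashed elsewhere are
        # missed, which is the accuracy-for-speed trade that makes this "approximate".
        candidates = self.buckets.get(self._signature(vec), [])
        scored = ((item_id, cosine(vec, v)) for item_id, v in candidates)
        return heapq.nlargest(k, scored, key=lambda p: p[1])


rng = random.Random(42)
index = RandomHyperplaneLSH(dim=16)
catalog = {f"item_{i}": [rng.gauss(0.0, 1.0) for _ in range(16)] for i in range(1000)}
for item_id, vec in catalog.items():
    index.add(item_id, vec)

result = index.query(catalog["item_7"], k=3)
print(result)
```

Each query scores one bucket (on the order of 1000 / 2^6 items here) instead of all 1000. Real systems recover lost recall with multiple hash tables or smarter index structures.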
Lyft's architecture is a good example: their main Feature Store uses DynamoDB + Valkey for regular features, but they added OpenSearch specifically to support embedding-based retrieval.
6. A Practical Mental Model
Think of it this way:
Feature Store = a library
Embedding = one type of book in that library (a very dense, compressed one)
The library (Feature Store) organizes and serves everything — checkouts, renewals, discovery. The dense book (embedding) is one of many items it holds, but it needs a special reading room (vector store) because you don't just want to find a specific book — you want to find books similar to this one.
7. When Do You Need Each?
You need a Feature Store when:
- Multiple models or teams share the same features
- You're seeing training-serving skew (model performance degrades in production)
- Feature computation is duplicated across pipelines
- You need low-latency feature serving at inference time
You need embeddings when:
- Your data is unstructured (text, images, behavioral sequences)
- Similarity search is part of the task (recommendation, semantic search, retrieval)
- You want a compact, generalizable representation of complex objects
You need both when:
- You're running a recommendation or search system at scale
- You need to serve both traditional features (user demographics, recent activity counts) and embedding-based similarity in the same inference pipeline
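A pipeline that needs both often runs in two stages: retrieve candidates by embedding similarity, then fetch regular features for those candidates and rank them. The sketch below is a hypothetical simplification: plain dicts stand in for the vector store and the Feature Store's online store, retrieval is an exact scan, and the ranking rule (drop out-of-stock items, add a fixed 0.2 for the user's top category) is made up where a real system would use a trained ranking model.

```python
import heapq
from typing import Dict, List, Sequence

# Vector store role: item embeddings (hypothetical 3-dim values).
item_vectors: Dict[str, List[float]] = {
    "black_linen_shirt": [0.9, 0.1, 0.0],
    "navy_linen_shirt": [0.85, 0.15, 0.05],
    "linen_shorts": [0.8, 0.1, 0.2],
    "running_shoes": [0.05, 0.9, 0.4],
}
# Feature Store (online) role: regular features keyed by item id.
item_features: Dict[str, Dict[str, object]] = {
    "black_linen_shirt": {"category": "tops", "in_stock": False},
    "navy_linen_shirt": {"category": "tops", "in_stock": True},
    "linen_shorts": {"category": "bottoms", "in_stock": True},
    "running_shoes": {"category": "shoes", "in_stock": True},
}


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec: Sequence[float], k: int) -> List[str]:
    """Stage 1: candidate generation by embedding similarity (exact here, ANN in practice)."""
    return heapq.nlargest(k, item_vectors, key=lambda item_id: dot(user_vec, item_vectors[item_id]))


def rank(user_vec: Sequence[float], candidates: List[str], user_feats: Dict[str, object]) -> List[str]:
    """Stage 2: point-look-up regular features for each candidate, then score."""
    scored = []
    for item_id in candidates:
        feats = item_features[item_id]  # key-value lookup, one per candidate
        if not feats["in_stock"]:
            continue  # business filter driven by a regular feature
        score = dot(user_vec, item_vectors[item_id])
        if feats["category"] == user_feats["top_category"]:
            score += 0.2  # arbitrary boost standing in for a learned ranker
        scored.append((score, item_id))
    return [item_id for _, item_id in sorted(scored, reverse=True)]


user_vec = [0.9, 0.1, 0.1]
user_feats = {"top_category": "tops"}  # would itself be read from the Feature Store
candidates = retrieve(user_vec, k=3)
print(rank(user_vec, candidates, user_feats))
```

The most similar item never reaches the user because a regular feature (`in_stock`) filters it out, which is exactly why both kinds of store show up in the same request path.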
Takeaway
A Feature Store and an embedding are not competing concepts — they operate at different levels and often work together.
One sentence summary:
An embedding is a type of feature. A Feature Store is the system that manages features. The confusion arises because embeddings, unlike regular features, also need vector similarity search — which often requires a separate store.