Priya Nair
Two-stage retrieve-then-rank rec engine that actually hits p99 under 80ms with FAISS and a GBDT
Designs a personalized recommendation system with feature engineering, model serving, candidate generation, and real-time ranking infrastructure.
Recommendation Engine Architecture
You are a Machine Learning Architect who built recommendation systems at Netflix and Spotify. Design a recommendation engine architecture.
**Recommendation Domain**: {{recommendation_domain}} (products, content/media, jobs, people/social, travel destinations)
**User Base**: {{user_base}} (number of users, item catalog size, interaction types: click/purchase/rate/share)
**Latency Requirements**: {{latency_requirements}} (p99 serving latency for recommendations, real-time vs batch acceptable)
**Personalization Depth**: {{personalization_depth}} (fully personalized, segment-based, popularity fallback, cold start handling)
Design the complete recommendation system:
1. **System Architecture** - Retrieval + Ranking two-stage architecture with candidate generation and real-time scoring
2. **Feature Store** - Real-time features (user context, session), batch features (user profile, item attributes), feature engineering pipeline
3. **Candidate Generation** - Collaborative filtering (ALS, matrix factorization), content-based filtering, two-tower neural networks, approximate retrieval (FAISS/ScaNN/Annoy)
4. **Ranking Model** - Ranking architecture: logistic regression → GBDT → deep learning (wide & deep, DCN), feature crosses, contextual features
5. **Training Pipeline** - Data collection, negative sampling, feature engineering, model training (batch), hyperparameter tuning, A/B test framework
6. **Model Serving** - TensorFlow Serving, TorchServe, Triton Inference Server, model versioning, canary deployment, shadow testing
7. **Real-Time Personalization** - Session-based recommendations using RNN/Transformer, real-time feature updates, online learning
8. **Cold Start Handling** - Content-based signals for new users, popularity + exploration (epsilon-greedy/UCB/Thompson sampling), onboarding flows
9. **Diversity & Exploration** - MMR (Maximal Marginal Relevance), category diversification, serendipity injection, explore-exploit balance
10. **A/B Testing Framework** - Experiment assignment (user splitting), metrics (CTR, conversion, dwell time, diversity), statistical significance
11. **Evaluation Metrics** - Offline: Precision@K, NDCG, MAP, MRR. Online: CTR, conversion rate, engagement lift, long-term value
12. **Data Pipeline** - Event collection (impressions, clicks, conversions), label generation, training data preparation (hourly/daily)
13. **Scaling Strategy** - Horizontal scaling of retrieval, caching frequent queries, model sharding, pre-computation for heavy users
Include the system architecture diagram and pseudocode for the candidate generation and ranking stages.
Results
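Recommendation engine for a **media/content** platform (~5M users, ~1M items, p99 < 80ms serving). It uses a two-stage retrieval + ranking design: scoring all ~1M items with the ranker on every request would not fit the latency budget, so a cheap retrieval stage first cuts the pool to a few hundred candidates, and only those get the expensive ranking pass.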
**Architecture**
```
user ─▶ feature store ─▶ ┌─ candidate gen (two-tower ANN) ─▶ ~500 items ─┐
                         └─ collaborative filtering (ALS) ─▶ ~200 items ─┤
                         ┌───────────────────────────────────────────────┘
                         ▼
                         ranker (GBDT) ─▶ diversify (MMR) ─▶ top 20 ─▶ feed
```
**Stage 1 — retrieval (cheap, recall-oriented):**
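```python
import numpy as np

# Two-tower: item vectors precomputed nightly into a FAISS index; only the user vector is built online
user_vec = user_tower(session_features)                       # ~5ms
query = np.asarray(user_vec, dtype="float32").reshape(1, -1)  # FAISS expects a float32 (n_queries, d) array
_, ann_ids = faiss_index.search(query, k=500)                 # approximate NN, ~10ms; returns (distances, ids)
als_ids = als_model.recommend(user_id, n=200)                 # collaborative signal; assumed to return item ids
# Order-preserving union of both sources; FAISS pads short result rows with id -1
candidates = list(dict.fromkeys([int(i) for i in ann_ids[0] if i >= 0] + [int(i) for i in als_ids]))
```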
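As a reference point for what the FAISS call approximates, here is exact maximum-inner-product retrieval in pure Python, plus the order-preserving union with the ALS list. The toy embeddings and the `retrieve` / `merge_candidates` helpers are hypothetical; at ~1M items this linear scan is exactly the per-request cost the ANN index exists to avoid.

```python
import heapq
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product, the similarity a two-tower model is typically trained to score."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec: Sequence[float], item_vecs: Dict[int, Sequence[float]], k: int) -> List[int]:
    """Exact top-k items by inner product: an O(n_items) scan that an ANN index approximates."""
    top = heapq.nlargest(k, item_vecs.items(), key=lambda kv: dot(user_vec, kv[1]))
    return [item_id for item_id, _ in top]


def merge_candidates(*sources: List[int]) -> List[int]:
    """Union of candidate lists, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys(i for source in sources for i in source))


# Toy 2-d item embeddings (hypothetical); the real catalog holds ~1M items.
items = {1: [0.9, 0.1], 2: [0.2, 0.8], 3: [0.7, 0.7], 4: [-0.5, 0.3]}
user = [1.0, 0.5]
ann = retrieve(user, items, k=2)
als = [3, 4]  # stand-in for the ALS recommender's output
print(ann, merge_candidates(ann, als))
```

FAISS's `IndexFlatIP` performs this same exact scan in vectorized form; approximate index types such as IVF or HNSW variants give up some recall in exchange for far less work per query.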
**Stage 2 — ranking (expensive, precision-oriented):**
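```python
# One feature row per candidate, joining real-time (session) and batch (profile, item) features
features = feature_store.batch_get(user_id, candidates)
scores = gbdt.predict(features)  # click-probability score (CTR proxy); sklearn classifiers need predict_proba(features)[:, 1]
# MMR over the whole scored pool, keeping 20, so the feed is not 20 near-dupes;
# item_vecs (item embeddings used for similarity) is an assumed extra argument
ranked = mmr_diversify(candidates, scores, item_vecs, k=20, lambda_=0.7)
```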
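`mmr_diversify` is not defined in the pipeline above. One plausible implementation is greedy Maximal Marginal Relevance, sketched here in pure Python under two assumptions: every candidate has an embedding (the hypothetical `item_vecs` argument), and cosine similarity between embeddings measures redundancy.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def mmr_diversify(candidates: List[int], scores: List[float],
                  item_vecs: Dict[int, Sequence[float]], k: int = 20,
                  lambda_: float = 0.7) -> List[int]:
    """Greedy MMR: each pick maximizes
    lambda_ * relevance - (1 - lambda_) * (max similarity to items already picked)."""
    relevance = dict(zip(candidates, scores))
    remaining = list(dict.fromkeys(candidates))
    selected: List[int] = []

    def gain(i: int) -> float:
        # Redundancy is 0 for the first pick, so it is simply the most relevant item
        redundancy = max((cosine(item_vecs[i], item_vecs[j]) for j in selected), default=0.0)
        return lambda_ * relevance[i] - (1 - lambda_) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Item 2 is a near-duplicate of item 1; item 3 is less relevant but different.
vecs = {1: [1.0, 0.0], 2: [0.99, 0.1], 3: [0.0, 1.0]}
print(mmr_diversify([1, 2, 3], [0.9, 0.85, 0.6], vecs, k=2, lambda_=0.7))
print(mmr_diversify([1, 2, 3], [0.9, 0.85, 0.6], vecs, k=2, lambda_=1.0))
```

With `lambda_=1.0` MMR reduces to plain relevance order and returns the near-duplicate pair; at 0.7 the redundancy penalty lets the dissimilar item 3 take the second slot.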
**Cold start:** new users get popularity + content-based picks with epsilon-greedy exploration until they accumulate enough interactions for the personalized models to take over.

**Serving:** item vectors are precomputed nightly, so only the lightweight user tower and the ranker run online; that is what keeps p99 within budget.

**Eval:** offline NDCG@10 and MAP gate model promotion; online CTR and dwell-time lift in an A/B test make the final call.

**Caching:** feeds for frequent users are precomputed and cached to cut tail latency.
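The cold-start exploration can be sketched as per-slot epsilon-greedy: each slot in a new user's slate takes the best remaining item from the popularity + content-based ordering, except that with probability `epsilon` it goes to a uniformly random catalog item instead. The function name, its arguments and the per-slot formulation are assumptions; the design above does not specify how exploration is applied to a slate.

```python
import random
from typing import List, Sequence


def epsilon_greedy_slate(ranked: Sequence[int], catalog: Sequence[int], k: int,
                         epsilon: float, rng: random.Random) -> List[int]:
    """Build a k-item slate for a cold-start user (hypothetical helper).

    Each slot exploits (best remaining item from `ranked`, e.g. a popularity +
    content-based ordering) with probability 1 - epsilon, and otherwise explores
    (a uniformly random catalog item not already in the slate).
    """
    slate: List[int] = []
    while len(slate) < k:
        unseen = [i for i in catalog if i not in slate]
        if not unseen:
            break  # catalog exhausted
        exploit = [i for i in ranked if i not in slate]
        if exploit and rng.random() >= epsilon:
            slate.append(exploit[0])          # exploit: best remaining popular pick
        else:
            slate.append(rng.choice(unseen))  # explore: random catalog item
    return slate


popular = [5, 3, 9, 1]  # hypothetical popularity/content-based order
print(epsilon_greedy_slate(popular, list(range(20)), k=4, epsilon=0.2, rng=random.Random(42)))
```

Passing an explicit `random.Random` keeps slates reproducible in tests; UCB or Thompson sampling would replace the fixed `epsilon` with per-item uncertainty estimates.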
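NDCG@10, one of the offline promotion gates, can be computed per user with a short pure-Python sketch. It uses linear gains (some implementations use 2^rel - 1 instead), and `relevance` is a hypothetical map from item id to graded relevance taken from held-out interactions; the reported metric would be the mean over users.

```python
import math
from typing import Dict, Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG@k with linear gains: sum of gain_i / log2(i + 1) over 1-based ranks i <= k."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))  # rank is 0-based here


def ndcg_at_k(ranked_items: Sequence[int], relevance: Dict[int, float], k: int = 10) -> float:
    """NDCG@k for one user: DCG of the model's order divided by DCG of the ideal order."""
    actual = [relevance.get(item, 0.0) for item in ranked_items]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(actual, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical held-out data: item 7 fully watched (1.0), item 2 partially (0.5)
print(ndcg_at_k([7, 4, 2], {7: 1.0, 2: 0.5}))
```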
Model: Claude Opus 4
4 Comments
Tobias Keller
The system design details here are spot on.
Daniel Cohen
Nice. The schema validation up front prevents a whole class of bugs.
Luca Brunner
This is going in our internal wiki, thanks for sharing.
Emily Chen
Solid. The typing is tight and it actually compiles, unlike half the snippets I find.