Who wrote this post?

This post was published by Priya Nair on prompt2love.

Priya Nair·24.5.2026

Two-stage retrieve-then-rank rec engine that actually hits p99 under 80ms with FAISS and a GBDT

Designs a personalized recommendation system with feature engineering, model serving, candidate generation, and real-time ranking infrastructure.

Recommendation Engine Architecture

You are a Machine Learning Architect who built recommendation systems at Netflix and Spotify. Design a recommendation engine architecture.

**Recommendation Domain**: {{recommendation_domain}} (products, content/media, jobs, people/social, travel destinations)
**User Base**: {{user_base}} (number of users, item catalog size, interaction types: click/purchase/rate/share)
**Latency Requirements**: {{latency_requirements}} (p99 serving latency for recommendations, real-time vs batch acceptable)
**Personalization Depth**: {{personalization_depth}} (fully personalized, segment-based, popularity fallback, cold start handling)

Design the complete recommendation system:

1. **System Architecture** - Retrieval + Ranking two-stage architecture with candidate generation and real-time scoring
2. **Feature Store** - Real-time features (user context, session), batch features (user profile, item attributes), feature engineering pipeline
3. **Candidate Generation** - Collaborative filtering (ALS, matrix factorization), content-based filtering, two-tower neural networks, approximate retrieval (FAISS/ScaNN/Annoy)
4. **Ranking Model** - Ranking architecture: logistic regression → GBDT → deep learning (wide & deep, DCN), feature crosses, contextual features
5. **Training Pipeline** - Data collection, negative sampling, feature engineering, model training (batch), hyperparameter tuning, A/B test framework
6. **Model Serving** - TensorFlow Serving, TorchServe, Triton Inference Server, model versioning, canary deployment, shadow testing
7. **Real-Time Personalization** - Session-based recommendations using RNN/Transformer, real-time feature updates, online learning
8. **Cold Start Handling** - Content-based signals for new users, popularity + exploration (epsilon-greedy/UCB/Thompson sampling), onboarding flows
9. **Diversity & Exploration** - MMR (Maximal Marginal Relevance), category diversification, serendipity injection, explore-exploit balance
10. **A/B Testing Framework** - Experiment assignment (user splitting), metrics (CTR, conversion, dwell time, diversity), statistical significance
11. **Evaluation Metrics** - Offline: Precision@K, NDCG, MAP, MRR. Online: CTR, conversion rate, engagement lift, long-term value
12. **Data Pipeline** - Event collection (impressions, clicks, conversions), label generation, training data preparation (hourly/daily)
13. **Scaling Strategy** - Horizontal scaling of retrieval, caching frequent queries, model sharding, pre-computation for heavy users

Include the system architecture diagram and pseudocode for the candidate generation and ranking stages.

Results

Recommendation engine for a **media/content** platform (~5M users, ~1M items, p99 < 80ms serving). Two-stage retrieval + ranking: the practical way to hit that latency at this catalog size.

**Architecture**

```
user ─▶ feature store ─▶ ┌─ candidate gen (two-tower ANN) ─▶ ~500 items ─┐
                         └─ collaborative filtering (ALS) ───────────────┘
                                                 │
                         ranker (GBDT) ─▶ top 20 ─▶ diversify ─▶ feed
```

**Stage 1: retrieval (cheap, recall-oriented):**

```python
# Two-tower: precomputed item vectors in FAISS, user vector built online
user_vec = user_tower(session_features)          # ~5ms; assumed 2-D float32, shape (1, d)
_, ann_ids = faiss_index.search(user_vec, 500)   # approximate NN, ~10ms; returns (distances, ids)
als_ids = als_model.recommend(user_id, n=200)    # collaborative signal; assumed to return item ids
# Merge both sources, dropping duplicates while keeping order
candidate_ids = list(dict.fromkeys([*ann_ids[0].tolist(), *als_ids]))
```

**Stage 2: ranking (expensive, precision-oriented):**

```python
features = feature_store.batch_get(user_id, candidate_ids)  # real-time + batch
scores = gbdt.predict(features)                              # CTR proxy
ranked = mmr_diversify(candidate_ids, scores, lambda_=0.7)   # avoid 20 near-dupes
```

**Cold start:** new users get popularity + content-based picks with epsilon-greedy exploration until enough interactions accrue.

**Serving:** item vectors precomputed nightly; only the lightweight user tower + ranker run online, which is what buys the p99.

**Eval:** offline NDCG@10 and MAP gate model promotion; online CTR and dwell-time lift in an A/B test make the final call. Frequent-user feeds are pre-computed and cached to shave tail latency.
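The `mmr_diversify` call in the ranking stage is left undefined in the output. A minimal sketch of what it could look like, using greedy Maximal Marginal Relevance with cosine similarity. Note the assumptions: this version takes candidate embeddings (`item_vecs`) rather than bare ids, returns indices into the candidate list, and adds a `top_n` parameter; none of these are specified in the post. Negative similarities are treated as 0 here for simplicity.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_diversify(item_vecs, scores, top_n=20, lambda_=0.7):
    """Greedy MMR re-ranking (hypothetical signature, see note above).

    item_vecs: one embedding per candidate (assumed available from the item tower).
    scores:    ranker relevance score per candidate.
    lambda_:   1.0 = pure relevance, 0.0 = pure diversity.
    Returns candidate indices in the order they were selected.
    """
    selected = []
    remaining = list(range(len(scores)))
    # Highest similarity of each candidate to anything already selected.
    max_sim = [0.0] * len(scores)

    while remaining and len(selected) < top_n:
        # MMR objective: reward relevance, penalize redundancy with the picks so far.
        best = max(
            remaining,
            key=lambda i: lambda_ * scores[i] - (1.0 - lambda_) * max_sim[i],
        )
        selected.append(best)
        remaining.remove(best)
        for i in remaining:
            max_sim[i] = max(max_sim[i], cosine(item_vecs[i], item_vecs[best]))
    return selected


# Items 0 and 1 are near-duplicates; item 2 is different but scores lower.
vecs = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
scores = [0.9, 0.85, 0.6]
print(mmr_diversify(vecs, scores, top_n=2, lambda_=0.7))
```

With `lambda_=0.7` the near-duplicate loses its slot to the lower-scored but dissimilar item; at `lambda_=1.0` the function falls back to plain score order. The loop is O(top_n × candidates), which should be negligible for ~500 candidates and a top 20.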

Model: Claude Opus 4

45 Likes · 14 Saves · Score: 24

4 Comments

Tobias Keller·25.5.2026

The system design details here are spot on.

Daniel Cohen·25.5.2026

Nice. The schema validation up front prevents a whole class of bugs.

Luca Brunner·25.5.2026

This is going in our internal wiki, thanks for sharing.

Emily Chen·25.5.2026

Solid. The typing is tight and it actually compiles, unlike half the snippets I find.