Browse by use case
Not sure where to start? Jump straight to the chapters most relevant to your product area. Each section maps a real PM problem to the concepts you'll need.
Search & Ranking
Improving search relevance, ranking results by predicted click or conversion, and evaluating ranking quality.
Ch 7
Ch 9
Ch 11
Ch 5
Supervised Learning
Ranking is a supervised problem — the model learns which result is most relevant from labeled data. Learn classification vs regression and how gradient boosting powers most ranking systems.
Evaluating ML Models
Precision@K, NDCG, and MRR are ranking-specific metrics built on the precision/recall intuition covered here. Learn to translate model metrics into product decisions.
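To make Precision@K concrete, here is a minimal sketch (the function name and toy data are ours, not the book's): of the top K results a query returns, what fraction are actually relevant?

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked results that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k

# A query returns five results; judges marked d1 and d3 as relevant.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3"}
print(precision_at_k(ranked, relevant, 3))  # 2 of the top 3 are relevant -> 0.666...
```

NDCG and MRR refine the same idea by also rewarding results for appearing higher in the list, not just for appearing in the top K at all.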
NLP & Large Language Models
Semantic search uses embeddings to match intent, not just keywords. Covers tokenization, embedding similarity, and how RAG retrieval works at a product level.
Metrics Design
Optimizing for clicks can hurt satisfaction. Learn how to design north star, input, and guardrail metrics that don't create perverse incentives in your ranking system.
Recommendations
Personalizing feeds, surfacing relevant content or products, and balancing exploration vs exploitation.
Ch 8
Ch 11
Ch 7
Ch 4
Unsupervised Learning
Collaborative filtering and user segmentation are the backbone of rec systems. Learn how clustering finds user archetypes and how patterns emerge without labels.
NLP & Large Language Models
Embedding models turn items and users into vectors — enabling "find similar" recommendations. Covers how embeddings encode meaning and enable similarity search.
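"Find similar" usually means cosine similarity between embedding vectors. A minimal sketch, with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy item embeddings: two similar movies and one very different one.
movie_a = [0.9, 0.1, 0.3]
movie_b = [0.8, 0.2, 0.4]
movie_c = [0.1, 0.9, 0.1]
print(cosine_similarity(movie_a, movie_b) > cosine_similarity(movie_a, movie_c))  # True
```

The same comparison powers both "users like you" and "items like this": everything becomes a vector, and recommendation becomes nearest-neighbor search.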
Supervised Learning
Two-tower neural networks train on implicit behavioral signals like clicks and purchases: supervised learning applied to recommendations. Learn how models generalize from past behavior to new items.
A/B Testing & Experimentation
Rec systems are hard to A/B test — network effects, novelty bias, and position effects all distort results. Learn how to run experiments that actually tell you if a change is working.
Fraud Detection, Trust & Safety
Flagging suspicious transactions, detecting fake accounts, and building classifiers that survive adversarial behavior.
Ch 7
Ch 9
Ch 8
Ch 12
Supervised Learning
Fraud classifiers are supervised models trained on labeled fraud examples. Covers classification, decision thresholds, and why gradient boosting dominates tabular fraud data.
Evaluating ML Models
In fraud, false negatives (missed fraud) and false positives (blocking real users) have very different costs. Covers precision/recall tradeoffs and how to set decision thresholds when costs are asymmetric.
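One way to handle asymmetric costs is to price each error type and pick the threshold that minimizes total expected cost. A toy sketch (the scores, labels, and costs below are invented for illustration):

```python
def expected_cost(scores, labels, threshold, cost_fn, cost_fp):
    """Total cost of decisions at a given threshold.
    cost_fn: cost of missed fraud (false negative)
    cost_fp: cost of blocking a legitimate user (false positive)
    """
    cost = 0.0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if is_fraud and not flagged:
            cost += cost_fn
        elif flagged and not is_fraud:
            cost += cost_fp
    return cost

# Toy model scores (higher = more fraud-like); a miss costs 100x a false block.
scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.05]
labels = [True, True, False, True, False, False]
best = min((t / 100 for t in range(0, 101)),
           key=lambda t: expected_cost(scores, labels, t, cost_fn=100.0, cost_fp=1.0))
print(best)  # 0.21: catches every fraud while blocking one legitimate user
```

Change the cost ratio and the optimal threshold moves; that ratio is a product decision, not a modeling one.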
Unsupervised Learning
Anomaly detection (isolation forests, autoencoders) catches novel fraud patterns before you have labels for them. Learn how unsupervised models flag unusual behavior.
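Real systems use models like isolation forests; as a much simpler stand-in that shows the core idea of "flag what sits far from normal," here is a z-score check on transaction amounts (data and function invented for illustration):

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean.
    A crude stand-in for real anomaly detectors like isolation forests."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Typical transaction amounts, plus one wildly unusual one.
amounts = [20, 35, 18, 42, 25, 30, 22, 5000]
print(zscore_outliers(amounts, threshold=2.0))  # [5000]
```

Note that no fraud labels were needed: the outlier is flagged purely because it looks unlike everything else, which is exactly why unsupervised detection catches novel fraud patterns first.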
AI Product Decisions
Covers AI security risks including adversarial inputs and prompt injection — relevant if your trust & safety system uses LLMs for moderation or policy enforcement.
Forecasting & Prediction
Predicting demand, churn, revenue, or user behavior — and knowing when your forecast is trustworthy.
Ch 7
Ch 2
Ch 3
Ch 6
Supervised Learning
Forecasting is regression — predicting a number from features. Covers linear regression, gradient boosting for tabular data, and overfitting traps that make forecasts look better than they are.
Stats Foundations
A forecast is only as useful as your uncertainty estimate. Distributions, variance, and why averages mislead — the foundation for knowing what your prediction interval actually means.
Probability & Uncertainty
Confidence intervals, p-values, and thinking in distributions instead of point estimates. Essential for presenting forecasts honestly to stakeholders.
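A point forecast plus an interval is more honest than the point alone. A minimal sketch using the normal approximation (the signup numbers are invented):

```python
import math

def forecast_interval(samples, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    return mean - half_width, mean + half_width

# Daily signups over two weeks: report a range, not just the average.
signups = [120, 135, 110, 140, 125, 130, 118, 122, 138, 127, 133, 115, 129, 124]
low, high = forecast_interval(signups)
print(f"mean daily signups likely between {low:.0f} and {high:.0f}")
```

"About 126, likely between 122 and 131" sets stakeholder expectations very differently than "126" alone, and that difference is the point of this chapter.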
How ML Actually Works
Overfitting is the silent killer of forecasting models — a model that memorizes past data fails on new data. Learn training vs test splits, overfitting, and what makes a model generalize.
Content Moderation
Classifying harmful content, managing human review queues, and balancing false positives against platform safety.
Ch 7
Ch 9
Ch 11
Ch 12
Supervised Learning
Content moderation classifiers are trained on labeled examples of violating vs non-violating content. Learn classification, decision thresholds, and what "accuracy" hides when classes are heavily imbalanced.
Evaluating ML Models
The cost of a miss (harmful content stays up) vs a false alarm (real content removed) varies by policy. Covers precision/recall, ROC curves, and how to set thresholds for different risk tolerances.
NLP & Large Language Models
LLM-based moderation can catch nuanced policy violations that keyword filters miss — but introduces new attack surfaces. Covers prompting, fine-tuning, and evaluation metrics for text classifiers.
AI Product Decisions
Prompt injection and jailbreaking are active threats to LLM-based moderators. Covers AI-specific security risks and the AI vs rules vs human decision framework.
Generative AI Features
Building LLM-powered features like copilots, summarization, Q&A, and AI assistants into your product.
Ch 11
Ch 12
Ch 10
Ch 9
NLP & Large Language Models
The foundation: tokenization, embeddings, attention, and the real trade-offs between prompting, RAG, and fine-tuning. What each approach costs and when to use which.
AI Product Decisions
Build vs buy, model selection, latency/cost/quality tradeoffs, and prompt injection risks — everything you need to make smart architectural decisions for an AI feature.
Neural Networks & Deep Learning
LLMs are large neural networks. Understanding layers, weights, and why deep learning needs so much data gives you the intuition to have better conversations with your ML team.
Evaluating ML Models
Evaluating LLM outputs is hard — there's no single accuracy number. Covers how to think about evaluation when ground truth is subjective, including BLEU, ROUGE, and human eval.
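To see what a metric like ROUGE actually measures, here is a deliberately simplified ROUGE-1 recall: the fraction of reference words that show up in the candidate summary (no stemming, no clipping of repeats; the example sentences are invented):

```python
def rouge1_recall(reference, candidate):
    """Fraction of reference words appearing in the candidate (simplified ROUGE-1 recall)."""
    ref = reference.lower().split()
    cand = set(candidate.lower().split())
    return sum(1 for w in ref if w in cand) / len(ref)

reference = "the model summarizes the report in two sentences"
candidate = "the model gives a two sentence summary of the report"
print(rouge1_recall(reference, candidate))  # 0.625
```

Notice the failure mode: "sentence" vs "sentences" counts as a miss, and a fluent summary in different words scores poorly. This is why word-overlap metrics are a weak proxy and human evaluation still matters.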
Churn & Retention
Predicting which users are at risk of churning, segmenting by engagement, and measuring the impact of retention interventions.
Ch 7
Ch 8
Ch 4
Ch 2
Supervised Learning
Churn prediction is a binary classification problem — will this user churn in the next 30 days? Covers logistic regression, gradient boosting, and setting probability thresholds for interventions.
Unsupervised Learning
User segmentation reveals which cohorts are at risk before you have churn labels. Learn how k-means and hierarchical clustering surface user archetypes with different retention profiles.
A/B Testing & Experimentation
Testing retention interventions (onboarding flows, nudges, paywalls) requires understanding long time horizons, selection bias, and when to stop an experiment early.
Stats Foundations
Churn rates are averages that hide bimodal distributions — power users and lurkers behave very differently. Learn why the mean misleads and how to think about the shape of your user population.
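A tiny illustration of the "mean misleads" point, with invented cohort data:

```python
import statistics

# Two cohorts: daily power users and occasional lurkers (sessions per month).
power_users = [28, 30, 25, 29, 27]
lurkers = [1, 2, 0, 1, 2]
everyone = power_users + lurkers

avg = statistics.mean(everyone)
print(avg)  # 14.5, yet not a single user has between 3 and 24 sessions
```

The average describes a user who does not exist. Retention interventions targeted at the "average user" miss both cohorts, which is why looking at the distribution's shape comes before any churn model.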
Want to start from the beginning?
View all 14 chapters →