Chapter 02

Stats Foundations

Why averages lie, what distributions actually tell you, and the numbers that matter most in product decisions.

⏱ 15 min read 📊 Includes interactives

The problem with "average"

Say you're a PM at a music app. Your analyst reports: "Users listen to an average of 27 minutes of music per day." Sounds healthy. So you ship a feature targeting that "average user."

But look closer. Maybe 60% of users listen for 2–5 minutes (background listening while commuting), and 15% of power users listen for 2–3 hours. The "27-minute user" barely exists — it's a mathematical artifact of averaging two very different groups.

You just designed for a ghost.
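The blended-average trap is easy to simulate. Everything below is made up to match the story: three hypothetical user groups whose combined mean lands near 27 minutes even though almost nobody actually listens that much.

```python
import random

random.seed(7)

# Hypothetical population: 60% light listeners (2-5 min/day),
# 15% power users (2-3 hours/day), and a 25% middle group.
light  = [random.uniform(2, 5) for _ in range(600)]
power  = [random.uniform(120, 180) for _ in range(150)]
middle = [random.uniform(5, 20) for _ in range(250)]
minutes = light + power + middle

mean_minutes = sum(minutes) / len(minutes)
near_average = sum(1 for m in minutes if 25 <= m <= 30)

print(f"mean listening: {mean_minutes:.0f} min/day")   # lands near the reported 27
print(f"users actually near the mean: {near_average}")  # 0 -- nobody *is* the average
```

The power users contribute most of the total minutes, so the mean settles in a range where no individual user sits.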

PM Insight

Whenever you see an average, ask: what does the distribution look like? Is everyone clustered near the average, or are there distinct groups pulling it in different directions? The answer changes what you build.


What a distribution actually is

A distribution is just a picture of how often each value appears in your data. Instead of collapsing everything into one number, it shows the full shape of behavior.

The most common shape is the normal distribution — the classic bell curve. Most values cluster near the middle, with fewer values at the extremes. Heights, daily step counts, and measurement errors all tend to look roughly normal.

But many things in product data don't. Revenue, virality, content engagement — these are often skewed, with a long tail of outliers that drag the average far from where most users actually are.

Interactive — Distribution Explorer

Drag the sliders to shift the mean and change the spread. Watch how the shape changes — and notice when the average stops being useful.


Mean, median, and mode — when each matters

Mean (average)

Add everything up, divide by count. Fast and familiar — but sensitive to extreme values. One power user spending $10,000 can make an average revenue metric look very different from what most users actually do.

Use it when: the distribution is roughly symmetric and outliers aren't distorting it.

Median

The middle value when everything is sorted. Half of users are above it, half below. Much more robust to outliers and skewed distributions.

Use it when: you care about the "typical" user, not the mathematical average. Revenue, session length, and load times are often better reported as medians.

PM Insight

You'll often see load times or error rates reported as P50, P75, P95, P99 — these are percentiles. P50 is the median (50% of values are below it). P95 means 95% of values are below that number — so if your app's P95 load time is 8 seconds, the slowest 5% of users wait 8 seconds or more. For a large product, that's millions of people. Averages hide this completely.

Mode

The most frequently occurring value. Useful for categorical data (most common OS, most popular plan tier) but rarely the right tool for continuous metrics.
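A tiny, made-up example shows how the three statistics diverge on the same skewed data:

```python
from statistics import mean, median, mode

# Hypothetical monthly spend ($) for 10 users: one whale skews the mean.
spend = [0, 0, 0, 5, 5, 8, 9, 12, 15, 10_000]

print(f"mean:   ${mean(spend):,.2f}")   # $1,005.40 -- dragged up by the whale
print(f"median: ${median(spend):.2f}")  # $6.50 -- what the typical user spends
print(f"mode:   ${mode(spend)}")        # $0 -- the most common single value
```

Three honest numbers, three very different stories about the "typical" user.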


Spread: the number that changes everything

Two products can have the same average satisfaction score of 32 on a 0–100 scale — one where most users score it 30–35, and one where half score it 10 and half score it 55. Same mean, completely different products. The difference is variance — how spread out the values are.

Standard deviation is the most common measure of spread. Think of it as the "typical distance" a value sits from the average. You'll sometimes see it written as the Greek letter σ (sigma) — "3 sigma" just means 3 standard deviations away from the mean. In a normal distribution:

Roughly 68% of values fall within 1 standard deviation of the mean.
Roughly 95% fall within 2 standard deviations.
Roughly 99.7% fall within 3 standard deviations.

But most product metrics aren't normally distributed, so apply this rule with care.
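How many values land within 1, 2, and 3 standard deviations is easy to check empirically. This sketch draws synthetic data from a normal distribution (not real product data) and counts:

```python
import random
from statistics import mean, stdev

random.seed(42)
# 100,000 simulated draws from a normal distribution (mean 50, sigma 10)
xs = [random.gauss(mu=50, sigma=10) for _ in range(100_000)]

m, s = mean(xs), stdev(xs)
shares = {k: sum(1 for x in xs if abs(x - m) <= k * s) / len(xs)
          for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} sigma: {share:.1%}")  # roughly 68%, 95%, 99.7%
```

Run the same count on a skewed metric like revenue and the percentages come out quite different, which is exactly why the rule needs care.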

PM Insight

When your data team reports a metric changed, ask about the standard deviation too. A 5% increase in a metric with very high variance might be pure noise. The same 5% increase in a stable, low-variance metric is worth paying attention to.


Skewed distributions and long tails

Many product metrics are right-skewed — most values are small, but a small number of very large values pull the mean to the right. Revenue, viral shares, support tickets — all tend to look like this.

In a right-skewed distribution, the mean is pulled above the median. The gap between them tells you how much outliers are distorting the average.

When you hear "our average revenue per user is $45," ask: what's the median? If the median is $8, your "average user" is mostly a fiction invented by a small number of heavy spenders.

Rule of thumb

Mean > Median → right-skewed, outliers pulling average up. Report median.
Mean ≈ Median → roughly symmetric. Mean is fine.
Mean < Median → left-skewed, unusually low values dragging average down. Report median.
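The rule of thumb can be wrapped in a quick check. The 5% tolerance band for "roughly equal" and the simulated revenue data are both hypothetical choices:

```python
import random
from statistics import mean, median

def skew_check(values, tolerance=0.05):
    """Compare mean and median; tolerance is a hypothetical band
    for calling the two 'roughly equal'."""
    m, med = mean(values), median(values)
    if m > med * (1 + tolerance):
        return "right-skewed: outliers pulling the average up -- report the median"
    if m < med * (1 - tolerance):
        return "left-skewed: low values dragging the average down -- report the median"
    return "roughly symmetric: mean is fine"

random.seed(1)
# Simulated revenue per user: log-normal, so a long right tail
revenue = [random.lognormvariate(1.5, 1.2) for _ in range(10_000)]
result = skew_check(revenue)
print(result)
```

On the simulated revenue the mean sits roughly double the median, so the check flags it as right-skewed.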


Concrete example: why median holds up under skew

Here are the load times (in seconds) for 10 users opening your app:

User 1–8:   1.1   1.2   1.2   1.3   1.3   1.4   1.4   1.5   seconds
User 9:     4.8   seconds  (slow connection)
User 10:    38.1 seconds  (something went very wrong)
Mean: 5.3s (suggests a slow app)
Median (P50): 1.35s (typical user is fine)
P90: 38.1s (worst 10% in pain)

The mean (5.3s) makes your app look sluggish for everyone — two outliers dragged it up from 1.3s. If you reported this in a review, you'd likely kick off a performance sprint that 80% of your users don't need.

The median (1.35s) tells the truth: the typical user loads fast. But it hides a real problem — User 10's 38-second experience is genuinely broken. Median alone would let that slip past.

The P90 (38.1s) catches it. You need the median and a high percentile together to get the full picture: the median for the typical experience, P90 for the tail pain. The mean tells you neither, and you can safely ignore it.
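The three numbers above can be reproduced in a few lines. One caveat: percentile definitions vary between tools (numpy's default linear interpolation reports a different P90 for a sample this small), so this sketch rounds the fractional rank up, which always returns an actually observed value:

```python
import math
from statistics import mean, median

load_times = [1.1, 1.2, 1.2, 1.3, 1.3, 1.4, 1.4, 1.5, 4.8, 38.1]

def percentile(values, p):
    """Percentile with the 'round the rank up' rule: the result is
    always one of the observed values (sensible for small samples)."""
    s = sorted(values)
    return s[math.ceil(p / 100 * (len(s) - 1))]

print(f"mean:   {mean(load_times):.1f}s")        # 5.3s -- dragged up by two outliers
print(f"median: {median(load_times):.2f}s")      # 1.35s -- typical user is fine
print(f"P90:    {percentile(load_times, 90)}s")  # 38.1s -- the tail pain
```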

Interactive — Outlier impact on mean vs percentiles

Each dot is a user's load time. Add slow users and watch how the mean, median, P75, and P90 respond. Notice which ones move and which ones hold steady.

Which percentile should you use?

Median (P50) is the right default for skewed data — but it answers only one question: what does the typical user experience? Different product questions need different percentiles.

Percentile | What it answers | Use for
P25 | What does the bottom quarter experience? | Spotting disengaged users, catching low-end performance issues
P50 (median) | What does the typical user experience? | Revenue per user, session length, any metric where "average" is misleading
P75 | Where does the top quartile start? | Understanding power-user behavior, setting realistic targets
P90 / P95 | How bad does it get for the worst 10% / 5%? | Latency, load times, errors — anything where tail pain matters
P99 | What's the worst-case experience? | Infrastructure SLAs, catching catastrophic failures, high-volume systems
IQR (P75−P25) | How spread out is the middle of the data? | Measuring variability in skewed data — more robust than standard deviation

The three-number summary for performance metrics

For any latency or load time metric, ask for these three together:
P50 — is the typical experience acceptable?
P90 or P95 — how bad is it for the worst-off users?
P99 — are there catastrophic outliers we need to fix?

A fast P50 alongside a broken P99 means most users are happy while some are having a terrible time. You'd never see that from a mean.
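As a sketch, the three-number summary could be a small helper. The traffic data here is fabricated to show exactly the pattern described: a fast P50 with a broken tail.

```python
import math
from statistics import median

def latency_summary(samples_ms):
    """Three-number summary sketch: p50 / p95 / p99."""
    s = sorted(samples_ms)

    def pct(p):
        # Round the fractional rank up so small samples report observed values.
        return s[math.ceil(p / 100 * (len(s) - 1))]

    return {"p50": median(s), "p95": pct(95), "p99": pct(99)}

# Fabricated traffic: mostly fast, a few slow, a handful catastrophic.
samples = [100] * 975 + [900] * 10 + [30_000] * 15
summary = latency_summary(samples)
print(summary)  # fast p50 and p95, broken p99 -- the mean (~560 ms) shows none of this
```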

PM Insight

The IQR (interquartile range = P75 minus P25) is a much better measure of spread than standard deviation when your data is skewed. Standard deviation is distorted by outliers for the same reason the mean is. IQR ignores the tails entirely and tells you how much the middle 50% of your users vary — which is usually what you actually want to know.
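A quick comparison of IQR and standard deviation on made-up session data with one extreme outlier:

```python
from statistics import quantiles, stdev

# Hypothetical session lengths in minutes; one crawler-like 600-minute outlier.
sessions = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 600]

q1, _, q3 = quantiles(sessions, n=4)  # quartile cut points (P25, P50, P75)
iqr = q3 - q1

print(f"IQR:   {iqr:.1f} min")              # spread of the middle 50%: barely notices the outlier
print(f"stdev: {stdev(sessions):.1f} min")  # blown up by the single 600-minute session
```

The IQR stays in single digits while the standard deviation balloons past 100 minutes, even though ten of the eleven sessions are between 3 and 8 minutes.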


PM Playbook — Questions to ask

