Chapter 06
Why Is It Slow?
Latency, throughput, bottlenecks: reading performance through a PM lens.
"Slow" is two different questions
Picture this: a Slack message lands at 9:14am — "the app is slow." An engineer asks "slow how — for one user clicking a button, or for the system under load?" The reporter says "I don't know, just slow." Forty-five minutes of confusion follow.
When a PM says "the app is slow," they usually mean one specific thing: a user clicked a button and waited too long for something to happen. That's latency — meaning how long a single request takes from start to finish. When an engineer asks "can we handle the Black Friday spike?", they're asking a different question: throughput — meaning how many requests the system can process per second before it starts to fall over.
These two questions have different causes and different fixes. A system can be plenty fast for one user and collapse under a thousand. Another system can chug along reliably under load while every individual request feels sluggish. Knowing which question you're actually asking is the first move in any performance conversation.
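To make the distinction concrete, here is a minimal sketch of a toy system with a single worker that handles one request at a time. The 50 ms service time and the arrival rates are made-up numbers for illustration, not measurements from any real service. The work per request never changes, yet latency balloons once requests arrive faster than the worker can finish them.

```python
def simulate_latencies(arrivals_per_second: float,
                       service_seconds: float,
                       n_requests: int) -> list[float]:
    """Per-request latency for one worker that handles a single request at a time."""
    latencies = []
    worker_free_at = 0.0
    for i in range(n_requests):
        arrived = i / arrivals_per_second
        started = max(arrived, worker_free_at)  # wait in line if the worker is busy
        finished = started + service_seconds
        worker_free_at = finished
        latencies.append(finished - arrived)
    return latencies


SERVICE_SECONDS = 0.05  # assumed: every request needs 50 ms of work (capacity: 20 req/s)

light_load = simulate_latencies(10, SERVICE_SECONDS, 100)  # below capacity
heavy_load = simulate_latencies(25, SERVICE_SECONDS, 100)  # above capacity

print(f"light load, slowest request: {max(light_load) * 1000:.0f} ms")
print(f"heavy load, first request:   {heavy_load[0] * 1000:.0f} ms")
print(f"heavy load, 100th request:   {heavy_load[-1] * 1000:.0f} ms")
```

Under light load every request takes the same 50 ms. Under heavy load the first request is just as fast, but the 100th waits roughly a second, almost all of it spent in line. That is a throughput problem showing up as latency.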
Where time actually goes in a request
Picture what happens when a user clicks a button: their browser sends a request over the network to your server, your server runs some code, that code probably talks to a database, possibly to one or more internal services, possibly to an external API, then assembles a response and sends it back. Each of those steps takes time.
Most of the time is waiting, not computing
Servers sit idle waiting for the database. The database sits idle waiting for disk. The app sits idle waiting for a downstream API. When something is slow, the question "what is it waiting for?" is almost always more useful than "what is it doing?"
The slow part is rarely the code people are reading
Engineers will instinctively look at the function that returns the page. The actual culprit is often three layers down — a database query, a remote call, a cache miss — and may take a few rounds to find. When an engineer says "we'll need to profile this first," they're not stalling. They're acknowledging that gut feel is usually wrong, and they want real data before spending engineering time on a fix.
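One simple way to get that real data is to time each step of a request separately. The sketch below is illustrative only: the step names are hypothetical, and the `time.sleep` calls stand in for real waits on a database and an external API. Production systems usually get this breakdown from a tracing tool rather than hand-rolled timers.

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}


@contextmanager
def span(name: str):
    """Record how long the wrapped step takes, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


def handle_request() -> None:
    with span("app code"):
        sum(range(10_000))   # actual computation: very fast
    with span("database query"):
        time.sleep(0.05)     # stand-in for waiting on a database
    with span("external API"):
        time.sleep(0.01)     # stand-in for waiting on a remote call


handle_request()
slowest = max(timings, key=lambda name: timings[name])

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {seconds * 1000:7.1f} ms")
print(f"mostly waiting on: {slowest}")
```

The code that runs on the server barely registers. Nearly all of the elapsed time is spent waiting on something else.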
PM Insight
"What is it waiting for?" is the single most useful PM question in a perf review. It points the conversation at the thing that's actually causing the delay, not at the code that happens to be on screen.
The percentile vocabulary: p50, p95, p99
Performance numbers are reported as percentiles, not averages. If you only learn one piece of vocabulary from this chapter, learn this one:
1. p50 (median) — typical experience
Half of users see this latency or faster; half see it or slower. Think of this as the typical user, which is not the same thing as the arithmetic average. Useful as a baseline, useless on its own for understanding tail behavior.
2. p95 — worse than typical, still common
95% of users see this latency or faster; the other 5% see something worse. If your p95 is 3 seconds, one in twenty users is having a degraded experience right now.
3. p99 — the bad day
99% of users see this or faster; 1% see something worse. The slow tail. These are the users who churn, complain on Twitter, or silently never come back.
Why this matters: averages hide outliers. A page with a 200ms average load time can still have a p99 of 8 seconds, which means roughly 1 in 100 users waited 8 seconds or longer. Tail latency is where users decide your product is broken, even when "the numbers look fine."
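A small worked example shows how this happens. The latencies below are invented for illustration (1,000 requests, most of them fast, a few very slow), and the `percentile` helper uses the simple nearest-rank definition; monitoring tools may interpolate slightly differently.

```python
import statistics


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # ceiling of p% of the sample count
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds for 1,000 requests.
latencies_ms = [90.0] * 940 + [500.0] * 49 + [8000.0] * 11

mean_ms = statistics.mean(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(f"average: {mean_ms:.1f} ms")
print(f"p50:     {p50:.0f} ms")
print(f"p95:     {p95:.0f} ms")
print(f"p99:     {p99:.0f} ms")
```

With these made-up numbers the average comes out just under 200 ms and the median is 90 ms, while the p95 is 500 ms and the p99 is 8 seconds. A dashboard showing only the average would look healthy.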
The single best perf-review question
"What does the p95 look like, not just the average?" Engineers who already think this way will appreciate the question. Engineers who don't will start once they're asked.
The usual suspects
A short list of patterns that account for most "why is this slow?" investigations. None require a CS degree to recognize once you've seen them named:
1. The N+1 query
Loading a list of 100 items, then making 100 separate database calls, one per item, to get their details. Should have been one or two queries; instead it's 101. Probably the most common database-related perf bug you will hear named.
2. Missing index
A query that should jump straight to the matching rows is instead scanning the entire table, because the column it filters on has no index. Performs fine in development with 100 rows; collapses in production with 10 million.
3. Slow downstream
Your service is fast; the third-party API it calls is slow. You're now only as fast as them. Especially common in checkout flows, address lookups, and payment confirmations.
4. Cold start / cache miss
The first request after a deploy or a quiet period is dramatically slower than the next one because nothing's warmed up yet. Especially common with serverless and container-based architectures.
5. Lock contention or queue wait
Your request is sitting in line behind other requests because some shared resource only handles one thing at a time. The CPU is bored; users are waiting.
An experienced engineer recognizes one of these patterns within minutes of seeing a slow trace. Knowing the names lets you follow along — and ask "is this an N+1?" when the symptoms fit.
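For readers who want to see the first pattern in code, here is a minimal sketch of an N+1 using Python's built-in `sqlite3` module and an in-memory database. The `orders` and `customers` tables and the `run` helper are hypothetical, made up for this example. Both functions return the same data; only the number of database round trips differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"customer-{i}") for i in range(100)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i) for i in range(100)])

query_count = 0


def run(sql: str, params: tuple = ()) -> list:
    """Execute a query and count it, so the two approaches can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def n_plus_one() -> list:
    # One query for the list, then one more query per item.
    orders = run("SELECT id, customer_id FROM orders ORDER BY id")
    return [(order_id, run("SELECT name FROM customers WHERE id = ?", (cust_id,))[0][0])
            for order_id, cust_id in orders]


def single_join() -> list:
    # The same data fetched in a single query.
    return run("SELECT o.id, c.name FROM orders o "
               "JOIN customers c ON c.id = o.customer_id ORDER BY o.id")


query_count = 0
slow_result = n_plus_one()
slow_queries = query_count

query_count = 0
fast_result = single_join()
fast_queries = query_count

print(f"N+1 version:  {slow_queries} queries")
print(f"JOIN version: {fast_queries} query")
```

Against an in-memory database the difference is invisible. Against a real database over a network, each extra query adds a round trip, and 101 of them add up quickly.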
How a perf investigation actually works
Real performance work has a predictable rhythm — a four-step loop, repeated until the numbers look right:
1. Measure
Get real data on where time is going. Without measurement, the next step is guessing, and gut feel is often wrong.
2. Find the bottleneck
Almost always one specific step dominates. Frequently surprising. The slowest thing is rarely the thing the engineer assumed it would be before the data arrived.
3. Fix the bottleneck
The fix is often very small once you know where to point — a missing index, a single query change, a cache, a config tweak. Big rewrites are usually the wrong response.
4. Re-measure
The bottleneck has moved. Performance work is whack-a-mole; the next slowest thing is now the slowest thing. Stop when the numbers are good enough — not when there's nothing left to optimize.
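As a rough illustration of the "measure" step, the sketch below uses Python's standard `cProfile` and `pstats` modules to profile a toy request handler. The functions `slow_lookup`, `render`, and `handle` are hypothetical stand-ins, and the `time.sleep` call simulates a slow dependency. Real teams often use tracing or APM tools for this, but the idea is the same: let the data point at the bottleneck.

```python
import cProfile
import io
import pstats
import time


def slow_lookup() -> None:
    time.sleep(0.05)  # stand-in for a slow query or remote call


def render() -> int:
    return sum(range(1000))  # cheap computation


def handle() -> None:
    slow_lookup()
    render()


profiler = cProfile.Profile()
profiler.enable()
handle()
profiler.disable()

# Sort by cumulative time so the functions where time accumulates come first.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

In the printed report, `slow_lookup` and the sleep inside it account for nearly all of the elapsed time, while `render` is close to zero. After a fix, running the same profile again is the "re-measure" step.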
PM Insight
"We should rewrite the service" is almost never the right response to a perf problem. The fix is usually small once the bottleneck is found — and the rewrite would just move the problem somewhere new.
AI assistants are useful at the edges of this loop — they can scan a long log or trace and surface obvious anomalies, draft hypotheses about likely causes, and speed up the read-the-data step. They don't change the structure of perf work: you still need real measurement, the bottleneck still lives where the data says it does, and the fix still has to be made and verified by someone who understands the system. Performance debugging is one of the areas where AI helps a little, not a lot.
How this changes by stage
At finding fit: fix the slow path that blocks learning; do not turn every rough edge into a platform project.
At operating at scale: performance work should tie back to SLOs, error budgets, and the customer segments harmed by the tail.
PM Playbook — Questions to ask
The next time the team is debugging a slow page or perf incident, try these:
- Is this a latency problem (one slow request) or a throughput problem (many requests)? — disambiguates the actual question
- What does the p95 look like, not just the average? — catches the tail that's killing users
- Where is the time actually going — do we have a profile or trace? — real data over gut feel
- What's it waiting for? — the single most useful framing in a perf conversation
- Could this be an N+1 query? — names the most common database perf bug
- Is this cold-start behavior or steady-state? — distinguishes deploy-related from chronic
- Is this getting worse over time, or has it always been this way? — trend vs baseline
- What's the smallest change that would move the p95? — avoids unnecessary rewrites
These eight questions alone will make you the PM that engineers actually want in the room during a perf review.