Chapter 05
Metrics Design
How to build a metrics framework that drives real decisions — and stop optimizing for numbers that don't matter.
The metric that almost killed a product
A growth team at a consumer app was measured on daily active users (DAU). They hit their targets consistently — but the product was slowly dying. Retention was falling. Revenue per user was flat. The team had found many creative ways to drive DAU: aggressive push notifications, dark-pattern re-engagement flows, even counting a user as "active" if they opened a notification without touching the app.
The metric was going up. The business was going down. They had optimized for the measure, not the thing the measure was supposed to represent.
This is Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." It's not a corner case — it's the default outcome of poorly designed metrics systems.
PM Insight
Every metric is a proxy for something you actually care about. The job of metrics design is to choose proxies that are hard to game, move when the real thing moves, and stay stable when the real thing doesn't.
The three-layer metrics framework
Most strong product teams organize their metrics into three layers that answer three different questions:
1. North star metric
The single number that best captures whether your product is delivering value to users and to the business. It's your long-term health indicator. Moving it should feel meaningful, and it shouldn't be easily gamed.
Good north star metrics sit at the intersection of user value and business value. Spotify's is time spent listening. Airbnb's is nights booked. Notion's is weekly active editors (not just viewers). Each captures the core value exchange.
The north star test
Ask: "If this number went up consistently for 6 months, would the business definitely be healthier?" If the answer is "yes, but only if it goes up for the right reasons" — you've found a metric that's gameable and you need guardrails. If yes with no caveats, you have a strong north star.
2. Input metrics
Also called leading indicators or driver metrics. These are the upstream behaviors that, when they move, tend to cause the north star to move. They're more actionable than the north star because teams can directly influence them.
If Spotify's north star is listening time, input metrics might include: new playlist creates per user, podcast starts in first week, search-to-play conversion, discovery feature engagement. Each of these, when it improves, tends to cause listening time to grow.
3. Guardrail metrics
Metrics you're not trying to move, but that you'd care deeply about if they moved against you. They protect against teams optimizing input metrics in harmful ways.
If the podcast team is growing listening time by making it harder to skip episodes, the guardrail is user satisfaction (measured via rating, cancellation rate, or NPS). If the notification team is growing DAU by spamming users, the guardrail is unsubscribe rate.
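The three layers can be sketched as a small data structure. This is an illustrative sketch, not any real dashboard's schema; the metric names borrow the Spotify-style examples above:

```python
from dataclasses import dataclass, field

@dataclass
class MetricsTree:
    """Three-layer metrics framework: one north star, the input
    metrics that drive it, and the guardrails that catch harmful
    ways of optimizing those inputs."""
    north_star: str
    inputs: list[str] = field(default_factory=list)
    guardrails: list[str] = field(default_factory=list)

    def review_deck(self) -> list[str]:
        # Order matters: lead with the north star, then drivers, then guardrails.
        return (
            [f"NORTH STAR: {self.north_star}"]
            + [f"INPUT: {m}" for m in self.inputs]
            + [f"GUARDRAIL: {m}" for m in self.guardrails]
        )

# Hypothetical Spotify-style tree, using the examples from the text.
tree = MetricsTree(
    north_star="weekly listening time per user",
    inputs=["playlist creates per user", "podcast starts in first week",
            "search-to-play conversion"],
    guardrails=["skip rate", "cancellation rate"],
)
for line in tree.review_deck():
    print(line)
```

The value of writing it down this way is that a metric can't silently live outside the tree: everything in the review deck is either the north star, a driver, or a guardrail.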
Leading vs lagging indicators
Lagging indicators measure outcomes — they confirm what already happened. Revenue, churn, NPS. They're important but slow. By the time they move, the cause is often weeks or months in the past.
Leading indicators predict future outcomes — they're correlated with where the lagging metric is heading. They move earlier and give you time to act.
Example: SaaS retention
Lagging: Monthly churn rate — you only know it churned after it churned.
Leading: Feature adoption in week 2, support ticket volume, login frequency in the 30 days before renewal — these predict churn before it happens and give you time to intervene.
Example: Consumer engagement
Lagging: 30-day retention — you know the user left, not why or when.
Leading: Number of core actions in days 1–7, return visits in week 1, connection count (for social products). Users who hit certain leading thresholds in the first week retain at dramatically higher rates.
PM Insight
The most valuable thing you can do with your data team is find your product's leading indicators for retention. Ask: "What do users who are still active at 90 days do differently in their first two weeks?" That analysis will surface your real input metrics — and tell you exactly what new user experience needs to drive.
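A minimal sketch of that analysis, run here on synthetic data (a real version would query your event warehouse; the retention numbers and the link between early actions and retention below are simulated, not benchmarks):

```python
import random
random.seed(0)

# Synthetic stand-in for an event log: for each user, the number of core
# actions in their first two weeks, and whether they were still active
# at day 90. In practice these come from your data warehouse.
users = []
for _ in range(10_000):
    early_actions = random.randint(0, 10)
    # Simulated ground truth: more early core actions -> higher retention odds.
    retained = random.random() < min(0.9, 0.1 + 0.08 * early_actions)
    users.append((early_actions, retained))

def retention_rate(population):
    return sum(retained for _, retained in population) / len(population)

# The analysis from the text: compare 90-day retention above and below
# candidate week-one thresholds to find where the curve bends. The
# threshold with the sharpest split is a candidate input metric.
for threshold in (1, 3, 5, 7):
    above = [u for u in users if u[0] >= threshold]
    below = [u for u in users if u[0] < threshold]
    print(f">= {threshold} early actions: "
          f"{retention_rate(above):.0%} retained vs {retention_rate(below):.0%} below")
```

The output of an analysis like this is exactly the input-metric candidate the section describes: "users who do X at least N times in week one retain at a much higher rate."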
Vanity metrics: the ones that feel good but don't decide anything
Vanity metrics are numbers that go up and make people feel good, but don't change any decision you'd make. They're usually easy to move, hard to connect to outcomes, and beloved by stakeholders who want to report wins.
| Vanity metric | Actionable alternative |
|---|---|
| Total registered users | Active users (defined precisely) |
| Page views | Pages per session, task completion |
| App downloads | D7 / D30 retention after install |
| Social media followers | Engagement rate, DM conversions |
| Emails sent | Email open → action rate |
| Features shipped | Features used, adoption rate |
The test for vanity: "If this metric doubled tomorrow, what would we do differently?" If the answer is nothing, it's a vanity metric. Cut it from your review deck.
How teams game metrics — and how to prevent it
Goodhart's Law is automatic. The moment a metric becomes a team's goal, people find ways to hit it that don't require doing the underlying thing. This isn't malice — it's how incentives work. Your job as a PM is to design metrics that are hard to game without doing the real work.
PM Insight
The antidote to gaming is pairing every primary metric with at least one guardrail that would catch the gaming behavior. DAU + notification opt-out rate. Feature adoption + 30-day repeat usage. Resolution time + reopen rate. Each pairing makes it harder to move one without the other noticing.
The HEART framework
Google's HEART framework is a structured way to think about user experience quality at scale across five categories: Happiness, Engagement, Adoption, Retention, and Task success. It's not a replacement for a metrics tree, but it's a useful lens for making sure you're not missing entire categories of user health.
HEART is most useful early in planning. Running through the five categories will often surface a dimension of user health you'd otherwise ignore — particularly Happiness and Task success, which product teams frequently skip in favor of purely quantitative metrics.
Counter metrics: the PM's safety net
A counter metric is a metric that moves against you when your primary metric is being gamed rather than genuinely earned. The idea is simple: pair every metric you're optimizing with one that catches the failure mode.
| Optimizing for | Gaming risk | Counter metric |
|---|---|---|
| DAU | Notification spam, dark patterns | Notification opt-out rate, D30 retention |
| Activation rate | Forced onboarding, mandatory steps | D7 retention of activated users |
| Revenue per user | Aggressive upsells, dark pricing patterns | Churn rate, NPS, refund rate |
| Session length | Friction-heavy flows, hard to exit | Task completion rate, frustration signals |
| Click-through rate | Clickbait, misleading previews | Post-click engagement, bounce rate |
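One way to make these pairings operational is a ship/no-ship check that refuses a primary-metric win if any counter metric regresses past an agreed limit. The function, metric names, and thresholds below are hypothetical, a sketch of the pattern rather than a standard:

```python
def ship_decision(primary_lift, guardrails):
    """Decide whether an experiment ships.

    primary_lift: relative change in the primary metric vs. control
                  (e.g. +0.06 means +6%).
    guardrails:   name -> (observed_change, limit, bad_direction), where
                  bad_direction "up" means increases are harmful (e.g.
                  opt-out rate) and "down" means decreases are harmful
                  (e.g. retention); limit is the allowed magnitude.
    """
    if primary_lift <= 0:
        return "no ship: primary metric did not improve"
    for name, (change, limit, bad) in guardrails.items():
        regressed = change > limit if bad == "up" else change < -limit
        if regressed:
            return f"no ship: guardrail '{name}' regressed"
    return "ship"

# Hypothetical DAU experiment paired with counter metrics from the table:
# opt-out rate spiked past its limit, so the DAU win doesn't ship.
print(ship_decision(0.06, {
    "notification_opt_out": (0.03, 0.02, "up"),
    "d30_retention": (-0.01, 0.02, "down"),
}))

# Same DAU lift with guardrails inside their limits: ships.
print(ship_decision(0.06, {
    "notification_opt_out": (0.005, 0.02, "up"),
    "d30_retention": (-0.01, 0.02, "down"),
}))
```

The key design choice is that the limits are agreed before the experiment runs, so "how much retention are we willing to trade for DAU?" is a deliberate decision, not a post-hoc negotiation.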
Defining "active" — the decision most teams skip
"Active user" is the most-cited and least-defined metric in product. Before you can track it, you need to answer:
- What counts as active? Opening the app? Completing a core action? Using a specific feature?
- Over what time window? Daily, weekly, monthly? The right window depends on your product's natural usage frequency.
- Which users are included? All users? Paying only? Above a certain age or region?
A project management tool with one login per week is highly engaged. A news app with one login per week is churning. "Active" means different things for different products — and using the wrong definition produces metrics that look healthy and aren't.
Rule of thumb: usage frequency dictates window
Daily utility products (chat, email, calendar): DAU / WAU
Weekly workflow tools (project management, docs): WAU
Monthly or event-driven (tax software, travel): MAU or cohort analysis
The window should match how often a healthy user would naturally return.
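The three definition decisions above — qualifying action, time window, included population — can be made explicit in code rather than buried in a dashboard query. A hypothetical sketch, not any particular analytics tool's API:

```python
from datetime import date, timedelta

def active_users(events, as_of, window_days, qualifying_actions):
    """Count 'active' users under an explicit definition:
    - only the listed qualifying actions count (opening the app
      or a notification is not enough),
    - over a trailing window matched to the product's natural usage
      frequency (7 days for a weekly workflow tool, 30 for monthly).
    events: iterable of (user_id, action, date) tuples."""
    cutoff = as_of - timedelta(days=window_days)
    return {
        user for (user, action, day) in events
        if action in qualifying_actions and cutoff < day <= as_of
    }

# Illustrative event log for a docs-style product.
events = [
    ("a", "edit_doc", date(2024, 6, 10)),
    ("b", "open_app", date(2024, 6, 12)),   # opened, but no core action
    ("c", "edit_doc", date(2024, 5, 1)),    # core action, outside window
]
wau = active_users(events, as_of=date(2024, 6, 14), window_days=7,
                   qualifying_actions={"edit_doc", "create_doc"})
print(sorted(wau))  # only user "a" qualifies under this definition
```

Note how each definition choice changes the answer: user "b" is excluded by the qualifying-action rule and user "c" by the window. A looser definition would count all three, which is exactly how "active user" numbers end up looking healthy when they aren't.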
PM Playbook — Questions to ask
- What's our north star? — one number that captures user value and business health simultaneously
- What are the input metrics that cause the north star to move? — run the analysis; don't assume
- What's the counter metric for each thing we're optimizing? — pair every driver metric with a guardrail
- If this metric doubled, what would we do differently? — if the answer is nothing, cut it from the deck
- How is "active" defined? — never accept an active user number without knowing the definition
- Are we measuring leading or lagging? — your review deck should have both
- What are the 5 ways someone could hit this target without improving the product? — name the gaming scenarios before you set the goal