Chapter 10

What Does It Cost to Run This?

Cloud bills, infra cost literacy, unit economics — running cost is a function of architecture, usage, and time, with architecture as the multiplier you sign off on.


The $75M/year decision

Picture Dropbox in 2014. Roughly two hundred engineers, hundreds of millions of users, and an AWS bill walking steadily toward seventy-five million dollars a year — most of it S3, holding the user files that are, in any meaningful sense, the product. Every uploaded photo and synced folder lands in someone else's storage and pays rent there forever. The line on the chart isn't bending. It's getting steeper, because the company is getting bigger, and the storage footprint grows with every new user who signs up.

Rewind to 2007. Drew Houston and Arash Ferdowsi have about twenty engineers and a product to ship. They pick S3 for object storage, and they pick it for the reason any sane founder picks it: building exabyte-class storage from scratch would have cost them eighteen months they didn't have, against a problem AWS had already solved. The decision was correct. It bought Dropbox the runway to become Dropbox.

Seven years later, that same decision is the line item eating the P&L. The architecture that saved eighteen months in 2007 is locking in seventy-five million dollars a year in 2014, and the curve keeps climbing. So Dropbox does something most companies its size never seriously consider: it decides to build its own storage infrastructure. The project is called Magic Pocket. It takes roughly two and a half years, hundreds of engineers across multiple phases, and something on the order of a hundred million dollars of engineering effort. By 2016, more than ninety percent of user data lives on Dropbox-owned hardware. The run-rate savings are real enough that the 2018 S-1 calls them out as material. The project that cost a hundred million dollars once pays back seventy-five million dollars every year, forever.

Every cloud bill is the present-value bill for an architecture decision made years ago. The PM signs off on the decision; the bill arrives later.

The three dimensions of cost

Dropbox's seven-year arc isn't a one-off horror story. It traces the three dimensions every cloud bill obeys: architecture, usage, and time. Name them now, because the rest of this chapter is just learning to read a bill through these three lenses.

1. Architecture

Architecture is the multiplier. It sets the shape of the cost curve before a single user shows up, and different shapes behave nothing alike. A serverless stack gives you a near-zero baseline and a relatively high per-request cost — you pay almost nothing when idle, and you pay a premium every time someone actually uses the thing. A single-region monolith inverts the deal — a high fixed baseline for the servers running whether anyone's home or not, and cheap marginal cost on every additional request once the box is up. A multi-region microservices fleet is the expensive both-and — high baseline and complex marginal economics, redeemed only by amortization at very large scale, and paid for in engineer-hours forever. You don't pick a price when you pick an architecture. You pick a curve.
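One way to make "you pick a curve" concrete is a two-parameter model: a fixed monthly baseline plus a marginal cost per request. The sketch below is illustrative only. The dollar figures and the names `CostCurve`, `monthly_cost`, and `crossover_requests` are assumptions made for this example, not prices from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostCurve:
    """A deliberately simple cost model: fixed baseline plus marginal cost."""
    name: str
    baseline_usd: float      # paid every month, even with zero traffic
    per_request_usd: float   # paid for each additional request

    def monthly_cost(self, requests: int) -> float:
        return self.baseline_usd + self.per_request_usd * requests


def crossover_requests(a: CostCurve, b: CostCurve) -> float:
    """Request volume at which the two curves cost the same.

    Assumes the curves have different per-request costs.
    """
    return (b.baseline_usd - a.baseline_usd) / (a.per_request_usd - b.per_request_usd)


# Hypothetical figures, chosen only to show the two shapes.
serverless = CostCurve("serverless", baseline_usd=50.0, per_request_usd=0.0005)
monolith = CostCurve("monolith", baseline_usd=4_000.0, per_request_usd=0.0001)

for requests in (1_000_000, 10_000_000, 100_000_000):
    print(requests, serverless.monthly_cost(requests), monolith.monthly_cost(requests))

print("crossover at about", round(crossover_requests(serverless, monolith)), "requests/month")
```

Below the crossover volume the near-zero baseline wins; above it the cheap marginal cost wins. Under these made-up numbers the same product gets a very different bill on either side of that point.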

2. Usage

Usage is the variable that scales along the curve you've already chosen. It's roughly DAU × requests-per-user × data-per-user, and it tells you where on the curve you currently sit, but it can't change the curve's shape. That distinction matters more than it sounds. If your architecture is linear, 10× user growth is 10× cost; the bill tracks the user count one-for-one. If your architecture is sub-linear (caching well, amortizing fixed costs, sharing work across requests), that same 10× growth comes in at 4–6×. Same users, same product, very different invoice. Usage is what you're measured on. The curve is what you signed up for.
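The linear versus sub-linear distinction can be sketched with a power law, cost ∝ users^exponent. That functional form is an assumption chosen for illustration, not something the bill guarantees. Under it, an exponent of 1.0 gives the linear case, and exponents of roughly 0.60 to 0.78 reproduce the 4–6× range for 10× growth.

```python
import math


def cost_multiple(user_growth: float, exponent: float) -> float:
    """Cost growth implied by user growth, assuming cost ∝ users ** exponent."""
    return user_growth ** exponent


def implied_exponent(user_growth: float, cost_growth: float) -> float:
    """Back out the exponent from two observed growth multiples."""
    return math.log(cost_growth) / math.log(user_growth)


for label, exponent in [("linear", 1.0), ("sub-linear, upper", 0.78), ("sub-linear, lower", 0.60)]:
    print(f"{label}: 10x users -> {cost_multiple(10, exponent):.1f}x cost")

# If users grew 10x and the bill grew 5x, the implied exponent is about 0.7.
print(round(implied_exponent(10, 5), 2))
```

`implied_exponent` is the useful direction in practice: feed it two growth multiples from your own history and it gives a rough reading of which curve you are on.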

3. Time

Time is the accumulator. It doesn't multiply cost on its own — it compounds the decisions already made. A choice you make in year one becomes a constraint in year three and a forced migration in year five, and by then the cost of undoing it is an order of magnitude past the cost of choosing differently up front. That's the Dropbox arc in one sentence. Dropbox's S3 architecture in 2007 cost almost nothing. By 2014 it cost $75M a year. The usage scaled; the architecture didn't change; time did its work.

COST = ARCHITECTURE × USAGE × TIME

The architecture decision is the multiplier that compounds.

Where the cost actually goes

Section 2 named the dimensions. This one names where the money goes. Four categories carry almost every cloud bill, and knowing which one scales which way is half the literacy.

Compute

CPU and memory for app servers, background workers, batch jobs — anything that spins up to answer a request or chew through a queue. For most architectures compute scales sub-linearly with users, because a single box amortizes its cost across thousands of requests. A typical request-response API runs somewhere around $0.0001 to $0.001 per request once it's at scale, and that number isn't a guess you need to make — engineering can tell you within an order of magnitude in an afternoon. The PM-shaped read is two questions: what is our cost per request today, and which direction is it trending? If it's drifting up, something architectural is leaking.
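The two questions reduce to a division and a comparison. A minimal sketch, assuming you can get monthly compute spend and request counts from engineering; the figures and the 5% tolerance are hypothetical.

```python
def cost_per_request(compute_spend_usd: float, requests: int) -> float:
    """Average compute cost per request for one billing period."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return compute_spend_usd / requests


def trend(series: list[float], tolerance: float = 0.05) -> str:
    """Classify the first-to-last change as 'up', 'down', or 'flat'."""
    if len(series) < 2:
        raise ValueError("need at least two data points")
    change = series[-1] / series[0] - 1
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return "flat"


# Hypothetical months: (compute spend in USD, requests served)
months = [(12_000, 40_000_000), (13_500, 42_000_000), (15_500, 44_000_000)]
per_request = [cost_per_request(spend, reqs) for spend, reqs in months]

print([f"${c:.6f}" for c in per_request], trend(per_request))
```

In this made-up series requests grew 10% while spend grew about 29%, so cost per request drifts up. That is the pattern worth asking about.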

Storage and egress

Disk space — object stores, databases, backups — plus network egress, which is data leaving the cloud. Egress is the silent killer. The per-GB price looks small on the pricing page and compounds into something genuinely alarming at scale. AWS charges roughly $0.09 per GB to send data out and roughly nothing to take it in, so at one petabyte a month outbound you're paying $90,000 before any of your servers have done a thing. The PM-shaped read is to ask what fraction of the bill is egress. If it's north of 20 percent, an architectural rethink — CDN, regional placement, response shape — is on the table, and the engineers will already know it.
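The petabyte arithmetic and the 20 percent check are easy to reproduce. This sketch uses the flat $0.09 per GB figure quoted above and a decimal petabyte; real pricing is typically tiered, and the $400,000 total bill is a hypothetical number.

```python
GB_PER_PB = 1_000_000        # decimal petabyte, matching the arithmetic above
EGRESS_USD_PER_GB = 0.09     # rough flat rate from the text; real pricing is typically tiered


def egress_cost_usd(gb_out: float, usd_per_gb: float = EGRESS_USD_PER_GB) -> float:
    """Monthly egress charge for a given outbound volume."""
    return gb_out * usd_per_gb


def egress_share(egress_usd: float, total_bill_usd: float) -> float:
    """Fraction of the total bill spent on egress."""
    if total_bill_usd <= 0:
        raise ValueError("total bill must be positive")
    return egress_usd / total_bill_usd


def needs_egress_review(egress_usd: float, total_bill_usd: float, threshold: float = 0.20) -> bool:
    """True when egress exceeds the share that warrants an architectural rethink."""
    return egress_share(egress_usd, total_bill_usd) > threshold


monthly_egress = egress_cost_usd(1 * GB_PER_PB)
print(round(monthly_egress))                                         # about 90,000 USD
print(needs_egress_review(monthly_egress, total_bill_usd=400_000))   # hypothetical bill total
```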

Vendor fees

Third-party SaaS — observability (Datadog), error tracking (Sentry), email (SendGrid), payments (Stripe), AI APIs (OpenAI, Anthropic). Vendor fees are priced per seat or per event, which means they scale linearly with usage and creep up over time as the company adopts more of them. By the time a product is a few years old, vendor fees are the largest non-compute line on the bill, and nobody set out to make them that — they accumulated one renewal at a time. The AI subsection later in this chapter returns to this category, because AI APIs have their own scaling dynamics that don't behave like traditional SaaS.

People

The cost dimension you forget. Microservices save compute and cost engineers: every new system needs an owner, an on-call rotation, observability, runbooks, and at least one person who understands it well enough to fix it at 3 AM. People cost is the single largest line item once compute has been amortized down, and it's the one that scales with architecture complexity rather than user count. Doubling your users doesn't double your engineering team. Doubling your services does. The PM-shaped read is two questions again: what's the on-call burden per system today, and how many engineers does it take to keep the architecture you're choosing running for the next three years?

Try it: cost per user calculator

Pick a scenario, then tweak architecture, usage, and AI parameters. The simulator shows cost per user and what happens at 10× scale.

Inputs: daily active users; architecture; compute cost per request (¢); storage GB per user; egress GB per user per month; requests per user per day; AI usage; model tier (if AI enabled).

Outputs: cost per user; monthly total; cost at 10× users; AI share of cost.
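The calculator's inputs suggest a simple model you can run yourself. What follows is a sketch, not the simulator's actual formula. The storage price, the per-call AI prices, the scaling exponents, and every name in the code are assumptions. It also treats daily active users as the user count for storage and egress.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30
EGRESS_USD_PER_GB = 0.09            # the chapter's rough AWS figure
STORAGE_USD_PER_GB_MONTH = 0.023    # assumed placeholder price
AI_USD_PER_CALL = {"cheap": 0.0005, "mid": 0.005, "flagship": 0.05}            # assumed
SCALING_EXPONENT = {"serverless": 1.0, "monolith": 0.8, "microservices": 0.9}  # assumed


@dataclass(frozen=True)
class Scenario:
    daily_active_users: int
    architecture: str
    compute_cents_per_request: float
    storage_gb_per_user: float
    egress_gb_per_user_month: float
    requests_per_user_per_day: float
    ai_calls_per_user_per_day: float = 0.0   # 0 means AI is disabled
    model_tier: str = "mid"


def monthly_costs(s: Scenario) -> dict[str, float]:
    """Monthly spend in USD, split by category."""
    users = s.daily_active_users
    requests = users * s.requests_per_user_per_day * DAYS_PER_MONTH
    ai_calls = users * s.ai_calls_per_user_per_day * DAYS_PER_MONTH
    return {
        "compute": requests * s.compute_cents_per_request / 100,
        "storage": users * s.storage_gb_per_user * STORAGE_USD_PER_GB_MONTH,
        "egress": users * s.egress_gb_per_user_month * EGRESS_USD_PER_GB,
        "ai": ai_calls * AI_USD_PER_CALL[s.model_tier],
    }


def summarize(s: Scenario) -> dict[str, float]:
    costs = monthly_costs(s)
    total = sum(costs.values())
    non_ai = total - costs["ai"]
    # Non-AI spend follows the architecture's curve; AI spend stays linear.
    at_10x = non_ai * 10 ** SCALING_EXPONENT[s.architecture] + costs["ai"] * 10
    return {
        "cost_per_user": total / s.daily_active_users,
        "monthly_total": total,
        "at_10x_users": at_10x,
        "ai_share": costs["ai"] / total if total else 0.0,
    }


example = Scenario(10_000, "monolith", 0.01, 2.0, 1.0, 50, ai_calls_per_user_per_day=2)
print(summarize(example))
```

Swap the architecture or the model tier and rerun it. The useful output is less the absolute number than how the 10× figure and the AI share move when one assumption changes.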

Reading a cloud bill

The decomposition in Section 3 is only useful if you can apply it to a real invoice. Three reading moves turn the bill into a decision document.

The line items

A cloud bill is a list of services consumed — EC2, S3, RDS, CloudWatch, a dozen others depending on what the team has wired in. Ask to see it broken down by service, not aggregated into one round number. The format alone tells you whether the team is paying attention; an aggregated bill is a team that hasn't asked the question yet. The broken-down version is where every other conversation in this chapter starts — you can't reason about architecture, usage, or time without knowing which services carry the spend.

The growth curve

One month of bill is noise; six months is signal. Pull the last six invoices and look at the growth rate per service, not the absolute number. A service growing 5% month over month costs about 80% more inside a year (1.05^12 ≈ 1.8), and if the rest of the bill is flat it keeps taking a bigger share. That's the line you want to find before it finds you. A flat curve that suddenly spikes means a feature shipped and changed the shape of demand. The question to land on: which service has the steepest curve, and what shipped six weeks ago that bent it?
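Compounding is what makes a small monthly rate matter: twelve months at 5% multiplies a line item by about 1.8. The sketch below annualizes a monthly rate and picks the steepest service out of six invoices. The invoice figures are hypothetical, and the function names are made up for this example.

```python
def annualized(monthly_rate: float) -> float:
    """Growth over twelve months of compounding at a constant monthly rate."""
    return (1 + monthly_rate) ** 12 - 1


def monthly_growth_rate(invoices: list[float]) -> float:
    """Compound monthly growth rate between the first and last invoice."""
    if len(invoices) < 2 or invoices[0] <= 0:
        raise ValueError("need at least two invoices and a positive starting value")
    periods = len(invoices) - 1
    return (invoices[-1] / invoices[0]) ** (1 / periods) - 1


def steepest_service(history: dict[str, list[float]]) -> str:
    """Service with the highest compound monthly growth rate."""
    return max(history, key=lambda service: monthly_growth_rate(history[service]))


print(f"5% per month is {annualized(0.05):.0%} per year")

# Hypothetical spend per service over six invoices, in thousands of USD.
history = {
    "EC2": [100, 102, 104, 106, 108, 110],
    "S3": [40, 44, 48, 53, 58, 64],
    "RDS": [30, 30, 30, 30, 31, 31],
}
for service, invoices in history.items():
    print(service, f"{monthly_growth_rate(invoices):.1%} per month")
print("steepest:", steepest_service(history))
```

In this example EC2 is the biggest line but S3 has the steepest curve, which is why the growth rate and the absolute number need to be read separately.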

The 80/20 cost

Three services typically account for around 80% of a mature cloud bill, and one of those three usually carries preventable spend: over-provisioned instances, a leaked log stream, a misconfigured replication, a forgotten dev cluster running at production size. The PM-visible move isn't to ask whether to negotiate with AWS; it's to ask which three services dominate the bill and which one is the misconfig. The negotiation question is real, but it's downstream of the preventable-misconfig question. Fix the leak first; negotiate the rate second.
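Checking the concentration claim against your own bill takes a sort and a sum. A minimal sketch over a hypothetical per-service breakdown; the service names are real AWS products, but the amounts are invented.

```python
def ranked_shares(bill: dict[str, float]) -> list[tuple[str, float]]:
    """Services sorted by spend, largest first, with each one's share of the total."""
    total = sum(bill.values())
    if total <= 0:
        raise ValueError("bill total must be positive")
    ranked = sorted(bill.items(), key=lambda item: item[1], reverse=True)
    return [(service, spend / total) for service, spend in ranked]


def top_n_share(bill: dict[str, float], n: int = 3) -> float:
    """Combined share of the n largest services."""
    return sum(share for _, share in ranked_shares(bill)[:n])


# Hypothetical monthly bill in USD.
bill = {
    "EC2": 42_000,
    "S3": 23_000,
    "RDS": 16_000,
    "CloudWatch": 7_000,
    "Data transfer": 6_000,
    "Other": 6_000,
}

for service, share in ranked_shares(bill)[:3]:
    print(f"{service}: {share:.0%}")
print(f"top three: {top_n_share(bill):.0%}")
```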

PM Insight

Ask to see the cloud bill broken down by service. Three services usually account for about 80% of the cost, and one of those is often preventable.

How this changes by stage

At finding fit: Cost is irrelevant. Burn through credits. Optimizing infra spend before product-market fit is yak-shaving the multiplier that doesn't yet matter.

At operating at scale: The cloud bill is a P&L line item the CFO tracks weekly. COGS as % of revenue gets reported to the board. The architecture decisions from earlier stages are the cost curve the company now lives with.

When the architecture has to change

There's a moment when "spin up another server" stops working. Catching that moment early is the difference between a 6-month migration and a 30-month one.

The trigger signals

The tells are concrete. Cost growing faster than users. A single vendor's share of the bill crossing 50%. New features blocked because the infra to support them costs more than the feature earns. On-call burden eating the roadmap, sprint after sprint. Each one's a leading indicator that the architecture multiplier has tipped. The PM-shaped move is to track the ratio of cost growth to user growth quarterly. When cost outruns users by 2× for two quarters running, the architecture's the next conversation.
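The quarterly ratio check can be automated once cost and user counts are tracked side by side. This sketch reads "outruns users by 2×" as "cost growth rate is at least twice the user growth rate", which is one interpretation. The data and function names are hypothetical.

```python
def outrun_flags(costs: list[float], users: list[float], ratio_threshold: float = 2.0) -> list[bool]:
    """For each quarter-over-quarter step, did cost growth outrun user growth?"""
    flags = []
    for i in range(1, len(costs)):
        cost_growth = costs[i] / costs[i - 1] - 1
        user_growth = users[i] / users[i - 1] - 1
        if user_growth <= 0:
            # Users flat or shrinking: any cost growth counts as outrunning.
            flags.append(cost_growth > 0)
        else:
            flags.append(cost_growth / user_growth >= ratio_threshold)
    return flags


def migration_trigger(costs: list[float], users: list[float],
                      ratio_threshold: float = 2.0, quarters_required: int = 2) -> bool:
    """True when the most recent quarters_required steps were all flagged."""
    if len(costs) != len(users):
        raise ValueError("costs and users must cover the same quarters")
    flags = outrun_flags(costs, users, ratio_threshold)
    return len(flags) >= quarters_required and all(flags[-quarters_required:])


# Hypothetical quarterly totals: cloud cost in thousands of USD, and active users.
print(migration_trigger([100, 110, 132, 165], [1000, 1050, 1100, 1150]))  # cost pulling away
print(migration_trigger([100, 105, 110, 116], [1000, 1050, 1100, 1160]))  # cost tracking users
```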

The cost of waiting

Migration is expensive. Waiting is more expensive. Dropbox's 2.5-year move off S3 cost roughly $100M of engineering time and recovered around $75M a year, forever. Waiting another year would have meant another $75M unrecoverable, plus a harder migration off a bigger pile of S3 data. The math at scale is unforgiving: the cost of waiting compounds, and the cost of migrating never gets smaller while you wait. The decision isn't whether to migrate. It's whether to migrate this year or pay another year of rent on the wrong architecture.
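The payback arithmetic behind those figures is short enough to write down. This sketch uses the chapter's rough numbers ($100M one-time, $75M a year) and deliberately ignores discounting, the running cost of owned hardware, and the savings ramp during the migration. Treat it as back-of-envelope math, not a model of Dropbox's actual finances.

```python
def payback_years(one_time_cost: float, annual_savings: float) -> float:
    """Years of savings needed to recover the one-time migration cost."""
    if annual_savings <= 0:
        raise ValueError("annual savings must be positive")
    return one_time_cost / annual_savings


def net_position(one_time_cost: float, annual_savings: float, years_after_completion: float) -> float:
    """Cumulative savings minus the migration cost, undiscounted."""
    return annual_savings * years_after_completion - one_time_cost


def cost_of_delay(annual_savings: float, years_delayed: float) -> float:
    """Savings forgone by starting later, before counting a harder migration."""
    return annual_savings * years_delayed


MIGRATION_COST_M = 100   # USD millions, the chapter's rough figure
ANNUAL_SAVINGS_M = 75    # USD millions per year, the chapter's rough figure

print(f"payback: {payback_years(MIGRATION_COST_M, ANNUAL_SAVINGS_M):.2f} years")
print(f"net after 3 years: ${net_position(MIGRATION_COST_M, ANNUAL_SAVINGS_M, 3):.0f}M")
print(f"cost of a one-year delay: ${cost_of_delay(ANNUAL_SAVINGS_M, 1):.0f}M")
```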

The migration playbook

Migrate the heaviest cost line item first — the one service that owns most of the bill. Run old and new in parallel long enough to verify parity, not a day longer. Move one customer cohort at a time so a bad week is contained instead of catastrophic. The PM-shaped read is that the migration plan is a roadmap document, not an engineering task. Stakeholders span finance, customer success, and the CTO, and the cross-functional plan needs an owner. That owner's usually you.

The cheapest server is the one your architecture didn't make you spin up.

What changes when AI is in the loop

The question: What does AI change about the cost of running this?

What's changed. Four things are concretely different:

A new cost shape: per-call, per-token, per-context-window. Traditional compute pricing is per-CPU-second or per-request: bounded, predictable, sub-linear with caching. AI inference pricing is per-token in + per-token out + per-context-window, and the variance is enormous. A 5K-token context call to a GPT-4-class flagship can cost 10–50× a 500-token call to a mid-tier model.
Model tier is a new lever — 10–100× cost difference for the same surface. Picking flagship vs mid-tier vs cheap is the highest-leverage cost decision in any AI-feature architecture. The PM-shaped read: which surfaces deserve flagship, which can run on cheap, and which can be cached entirely?
AI inference scales linearly with usage. The sub-linear advantages of cloud compute don't apply: each call costs roughly the same, regardless of how many calls you make. The architecture multiplier from §2 is muted; usage becomes a near-direct multiplier on cost.
Latency-vs-cost is a UX-shaped tradeoff. Faster models cost more per token. Streaming responses look fast but bill at the same rate. The PM owns the calibration: which surfaces tolerate 8-second responses for half the cost, and which surfaces require sub-second for the full price?
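The per-token cost shape is easy to sketch. The prices below are hypothetical placeholders, not any provider's actual rates, and the tier names are just labels. What matters is how the gap between a large flagship call and a small mid-tier call falls out of the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    usd_per_million_input: float
    usd_per_million_output: float


# Hypothetical prices, for illustration only.
PRICES = {
    "flagship": TokenPrice(usd_per_million_input=5.0, usd_per_million_output=15.0),
    "mid": TokenPrice(usd_per_million_input=1.0, usd_per_million_output=3.0),
}


def call_cost_usd(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: input and output tokens are priced separately."""
    price = PRICES[tier]
    return (
        input_tokens * price.usd_per_million_input
        + output_tokens * price.usd_per_million_output
    ) / 1_000_000


big_flagship = call_cost_usd("flagship", input_tokens=5_000, output_tokens=500)
small_mid = call_cost_usd("mid", input_tokens=500, output_tokens=500)

print(f"flagship, 5K context: ${big_flagship:.4f}")
print(f"mid-tier, 500 tokens: ${small_mid:.4f}")
print(f"ratio: {big_flagship / small_mid:.1f}x")

# Linear scaling: a million calls cost a million times one call.
print(f"1M flagship calls: ${big_flagship * 1_000_000:,.0f}")
```

Because the cost is a straight multiplication, context size and tier choice are the only levers inside a single call. Volume discounts aside, nothing amortizes.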

What hasn't changed. The chapter's whole argument: the architecture × usage × time frame still applies. AI is a new cost shape inside the architecture dimension, not a replacement for the frame. Cost decisions made at design time still compound; the CFO still wants COGS as a % of revenue; the cost curve is still locked in by the architecture choice.

A second thing that hasn't changed: who signs off. The model-tier decision, the per-call rate decision, the caching strategy decision — all PM-shaped. The engineering team can build any of them; the question of which one matches the product's economics is the PM's.

What to watch for.

AI-as-feature vs AI-as-everything cost shapes. AI-as-feature (one tool inside a larger product) has a bounded cost line that scales with feature usage. AI-as-everything (chat is the product) has a cost line that scales with session length and is the primary COGS. Different businesses entirely; recognize which one you're in.
Caching strategies as cost compression. Per-call costs become per-user storage costs when you cache. The trade: storage is sub-linear and predictable; per-call is linear and unbounded. For surfaces with repeated queries, caching is the highest-leverage cost lever after model tier.
The model-tier strategy. Which surfaces deserve flagship? Usually: the ones that are the product's differentiator. Which can run on cheap? Usually: classification, summarization, ranking, auto-complete. Defaulting to flagship for everything is the AI-cost equivalent of running a single-region monolith on global infrastructure.
Reserved capacity vs spot pricing for inference. Some major AI providers offer reserved or provisioned capacity at a discount for predictable workloads; spot-style pricing mostly applies when you host models on your own cloud instances. The PM-shaped trade: what fraction of inference is steady-state (reserve it) vs spiky (use spot or on-demand)?
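Model tier and caching combine into a single blended cost per request. A minimal sketch, assuming a fixed traffic split across tiers and a cache that answers some fraction of requests without calling a model. The per-call prices, hit rate, and function name are hypothetical.

```python
def blended_cost_per_request(
    tier_mix: dict[str, float],
    cost_per_call: dict[str, float],
    cache_hit_rate: float = 0.0,
    cache_cost_per_lookup: float = 0.0,
) -> float:
    """Average cost of one request given a tier split and a cache in front.

    tier_mix maps tier name to its share of uncached traffic (shares sum to 1).
    Every request pays for the cache lookup; only misses pay for a model call.
    """
    if abs(sum(tier_mix.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache hit rate must be between 0 and 1")
    model_cost = sum(share * cost_per_call[tier] for tier, share in tier_mix.items())
    return cache_cost_per_lookup + (1.0 - cache_hit_rate) * model_cost


# Hypothetical per-call costs in USD.
COST_PER_CALL = {"flagship": 0.03, "cheap": 0.001}

everything_flagship = blended_cost_per_request({"flagship": 1.0}, COST_PER_CALL)
tiered = blended_cost_per_request({"flagship": 0.2, "cheap": 0.8}, COST_PER_CALL)
tiered_and_cached = blended_cost_per_request(
    {"flagship": 0.2, "cheap": 0.8}, COST_PER_CALL, cache_hit_rate=0.5
)

print(f"flagship for everything: ${everything_flagship:.4f}")
print(f"tiered:                  ${tiered:.4f}")
print(f"tiered plus 50% cache:   ${tiered_and_cached:.4f}")
```

With these made-up numbers, tiering cuts the blended cost by more than three quarters and a 50% hit rate halves what is left. That ordering matches the claim that caching is the next lever after model tier.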

PM Playbook — Questions to ask

The next time a cloud bill, architecture, or AI-feature decision lands on your desk, try these:

"What does this cost per user today, and at 10× users?" — Usage is the variable; the curve is the architecture.
"Which three services account for 80% of our cloud bill?" — Three services usually carry it; one is almost always preventable.
"What's the architecture decision that's locking in this cost shape?" — Names the multiplier. Surfaces the load-bearing decision.
"If we have an AI feature, what % of total COGS is it — and what's our model-tier strategy?" — Forces explicit tiering instead of defaulting to flagship for everything.
"What's the trigger signal that says it's time to migrate?" — Avoids "should have migrated two years ago."
"Which costs are sub-linear with users, and which are linear?" — Compute/storage often sub-linear; AI per-call is linear; vendor floor is fixed.
"What's the people-cost of this architecture — on-call, ops, observability headcount?" — The often-invisible cost dimension.
