
Governing AI spend

Real governance is more than a dashboard. Four layers — see, decide, enforce, resolve — applied to the fastest-growing line item on the P&L.

Your AI bill grew 40% last quarter. Your engineering org didn’t. Somewhere between an enthusiastic adopter on the Platform team and a chatbot in production that’s calling Opus when Sonnet would do, money is leaking. You can see the leak in aggregate. You can’t see where it’s coming from, you can’t set rules to stop it, and even if you could write the rules, you can’t enforce them. That’s not a budget problem — that’s a governance gap. This play is about closing it.

The job

AI governance is the discipline of giving your organisation the visibility, policy, enforcement, and process to control AI spend the way you already control every other line item.

It’s not the same thing as AI safety, AI ethics, or AI risk management — though it overlaps with all three at the edges. Governance, in this play, is specifically about the financial and operational controls. Who’s spending. On what. Within what limits. With what consequences if those limits get crossed.

Real governance has four layers. Skip any of them and the system breaks down.

Why it matters

Lots of vendors call themselves “AI governance” tools. Most of them are dashboards. A dashboard tells you what already happened. It can’t stop anything. It can’t even reliably tell you who’s responsible.

Meanwhile, the line item that needs governing the most is the one that's growing fastest, with the lowest visibility and the loosest controls. AI spend doesn't sit cleanly in any of the existing buckets: it's not headcount, it's not pure SaaS, it's not pure cloud. The processes you already have for the other buckets don't apply.

The cost of getting this wrong compounds. Every quarter you don’t have governance, the spend grows, the attribution drift gets harder to clean up, and the wasted dollars become baked into your run rate.

The four layers

1. See

Visibility is the first layer. Every other layer depends on it. You can’t decide policy on something you can’t see, you can’t enforce a rule against an event you didn’t capture, and you can’t resolve a violation you didn’t notice.

Good visibility means every token tied to a person or a service, a project, a cost centre and a capitalisation state — across all your providers, across both developer AI and production AI. That last bit matters. Most teams who try to roll their own visibility cover one face of AI spend and ignore the other.
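To make "every token tied to a person or a service, a project, a cost centre and a capitalisation state" concrete, here is a minimal sketch of what a single usage record might carry. The field names are illustrative assumptions, not any provider's actual schema.

```python
from dataclasses import dataclass

@dataclass
class UsageEvent:
    # Illustrative shape for one captured AI call; all field names are assumptions.
    provider: str        # e.g. "anthropic"
    model: str           # e.g. "sonnet"
    tokens_in: int
    tokens_out: int
    principal: str       # the person or service account behind the call
    project: str
    cost_centre: str
    capitalisable: bool  # capitalisation state at the time of use
    surface: str         # "developer" or "production" -- cover both faces

def attribution_gaps(events):
    """Events you cannot yet govern because attribution is missing."""
    return [e for e in events if not (e.principal and e.cost_centre)]
```

If a record can't answer "who, for what project, against which cost centre", the layers above visibility have nothing to act on.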

If you’re at this layer only, you have a dashboard. That’s a start. Don’t pretend it’s governance.

2. Decide

Policy is the second layer. Visibility tells you what’s happening; policy says what should be happening.

What lives in policy: approved providers, approved models, per-team caps, allowed providers per service, capitalisation rules per project stage, expense thresholds that trigger different review processes. What doesn’t live in policy: anything you can’t actually enforce. Don’t write policy you can’t apply.

The hard part isn’t writing the policy — it’s owning it. Each policy needs an owner who can say yes or no to exceptions, and a versioned history so you can show an auditor what the rule was on the day a particular decision was made.
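A sketch of what "an owner plus a versioned history" might look like as a data structure, so the audit question "what was the rule on this date?" has a direct answer. The names and shape are illustrative, not a real policy engine.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class PolicyVersion:
    effective_from: date
    rule: dict  # e.g. {"approved_models": [...], "monthly_cap_usd": 5000}

@dataclass
class Policy:
    name: str
    owner: str  # the one person who can say yes or no to exceptions
    history: list = field(default_factory=list)  # append-only versions

    def rule_on(self, day: date) -> dict:
        """Return the rule in force on a given day, for audit questions."""
        applicable = [v for v in self.history if v.effective_from <= day]
        if not applicable:
            raise LookupError(f"no '{self.name}' policy in force on {day}")
        return max(applicable, key=lambda v: v.effective_from).rule
```

The append-only history is the point: you never overwrite a rule, you supersede it, so the record of what applied on any past day survives.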

3. Enforce

Enforcement is the third layer, and the one most “governance” tools collapse on. An alert that an engineer has breached policy is not enforcement. It’s an email. Enforcement means the request that violates policy doesn’t go through — or goes through with conditions.

There are three places to enforce, in increasing operational invasiveness:

  • Provider edge. When your provider supports IP allowlisting (most do, on enterprise tiers), the provider itself rejects requests that don’t come from approved sources. There’s no workaround.
  • Key provisioning. When keys are issued with budget caps, scope, and expiry baked in, misuse fails the first time, not the hundredth. This is boring credential hygiene applied to the highest-spend credential in your stack.
  • Inline. For the services and teams where policy actually has to bite mid-request, controls applied at request time — rate limits, model substitution, mid-session caps. Most teams don’t need this for everything. Reach for it where the leak is real.

Pick the lightest-touch enforcement that actually works for each policy. Don’t run inline enforcement on every developer’s machine if a budget cap on the key would do it.
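The key-provisioning layer can be sketched in a few lines: a hypothetical internal key broker that bakes scope, a budget cap and an expiry into the credential, so misuse fails on the first request. This is an illustration of the idea, not any provider's real key API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class IssuedKey:
    # A credential with its limits baked in at issue time (illustrative).
    key_id: str
    scope: set            # models this key may call
    budget_cap_usd: float
    expires: datetime
    spent_usd: float = 0.0

def authorize(key: IssuedKey, model: str, est_cost_usd: float, now: datetime) -> bool:
    """Fail the first misuse, not the hundredth: check expiry, scope, then cap."""
    if now >= key.expires:
        return False
    if model not in key.scope:
        return False
    if key.spent_usd + est_cost_usd > key.budget_cap_usd:
        return False
    key.spent_usd += est_cost_usd
    return True
```

Note that a denied request never increments the spend counter, so one oversized call doesn't burn the remaining budget.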

4. Resolve

The fourth layer is what happens after a violation. This is the one that gets skipped most often, and it’s the one that turns governance from “rules” into “process.”

A graduated response works better than a blanket one. The first time a developer drifts, it should be a private nudge — usually they self-correct. The second time, the manager surfaces. Only when something escalates does it become a formal evidence pack with a paperwork trail.

The reason graduated response matters: most violations are accidents. A quiet engineer doesn't realise they've been running Opus on side experiments for three weeks. The right response is a heads-up, not a meeting with HR. Treat people like adults the first time, and they overwhelmingly behave like adults.
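The ladder above reduces to a few lines: the response depends only on how many prior incidents the same person has. Tier names are illustrative placeholders for whatever your process actually calls them.

```python
# Graduated response: first drift gets a nudge, repeats escalate,
# and the ladder caps at formal escalation. Tier names are illustrative.
RESPONSES = ["private_nudge", "manager_heads_up", "formal_evidence_pack"]

def next_response(prior_incidents: int) -> str:
    """Map a count of prior incidents to the next response tier."""
    return RESPONSES[min(prior_incidents, len(RESPONSES) - 1)]
```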

The reason the evidence pack matters: when something genuinely needs to escalate, you need a defensible record. Timestamps, project-match confidences, prompt feature summaries, cumulative off-project spend, warnings issued, responses. Built up automatically, not reconstructed.
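"Built up automatically, not reconstructed" implies the pack is an append-only log that grows as events happen. A minimal sketch, with fields following the list above; the structure and names are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class EvidencePack:
    # Accumulated automatically as events occur, never rebuilt after the fact.
    principal: str
    entries: list = field(default_factory=list)  # (timestamp, kind, detail)
    cumulative_off_project_usd: float = 0.0

    def record(self, timestamp: str, kind: str, detail: str,
               off_project_usd: float = 0.0) -> None:
        """Append one event (usage, warning, response) with its cost impact."""
        self.entries.append((timestamp, kind, detail))
        self.cumulative_off_project_usd += off_project_usd
```

By the time something escalates, the pack already holds the timestamps, warnings and cumulative spend; nobody has to go digging through logs to assemble a case.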

Why all four layers matter

A tool that does only the first layer is a dashboard. A tool that does layers one and two is a policy tracker. A tool that does layers one, two and three is a control plane. Only a tool that does all four is actually governing AI spend.

The temptation, when you’re starting out, is to do the visibility layer and call it done. Don’t. Visibility without policy creates the illusion of control without the substance. Policy without enforcement is a wish list. Enforcement without a resolution process turns every breach into a fight.

Build all four. They’re cheaper together than separately.

What this looks like over time

Month one: visibility is in place across most of your AI spend. Some providers are still missing — that’s fine.

Month three: policy is written and owned for the four or five biggest cost centres. The rest still operates under “best effort,” but you’re tightening it.

Month six: enforcement is live for the high-spend services. The graduated response process has been triggered a dozen times — mostly self-correcting nudges, two manager surfacings. Nobody’s been formally escalated. Spend growth has flattened.

Year one: AI governance is just how you run AI now. The conversation has shifted from “what are we even spending?” to “what’s the marginal value of the next dollar?”

That’s the point of governance. Not to police the spend — to free up the conversation about what to do next.

Put Workforce Engineering into practice

Flowstate is the Workforce Engineering platform. Connect your systems and start planning with confidence.