
The Governance Moment We’ve Been Avoiding

A developer told an AI agent to migrate his infrastructure. Two and a half years of production data disappeared in seconds. This isn’t a user error story. It’s a structural one.

Sanjay Sethi
7 min read

A developer told an AI agent to migrate his infrastructure to AWS. The AI found duplicate resources. Decided they needed to go. Issued a destroy command. Two and a half years of production data — student submissions, course projects, leaderboards — gone in seconds. AWS restored it within a day. The internet had opinions.

The developer, Alexey Grigorev, founder of DataTalks.Club, recounted the incident publicly. His candour was commendable. He admitted to over-relying on the AI, bypassing manual review of destructive commands, and ignoring Claude’s own warning flags about the approach. Within hours, the verdict on social media had been rendered:

“He told Claude to destroy Terraform. Claude destroyed Terraform. Shocked Pikachu.”

Fair. But here’s what bothers me about that framing: it treats this as a user error story.

It isn’t.


The Problem With “He Should Have Known Better”

Yes, the developer over-relied on the AI. Yes, he proceeded despite Claude’s own warnings. And yes, in hindsight, the failure points are clear and individually avoidable.

All of that is also largely irrelevant to the bigger structural question.

Because in the next 18 months, millions of enterprises will put AI agents into production — customer operations, sales workflows, IT automation, financial processing. These agents will have real permissions, real access, real consequences.

And most of those organizations will not have a thoughtful developer who reads the flags, understands the risk model, and makes a deliberate judgment call — however flawed that call turns out to be. They’ll have something much more dangerous:

The real production scenario
  • A VP Operations who approved a pilot three quarters ago
  • A vendor who said “it’s production-ready”
  • A prod environment that nobody actually treated like prod
  • No oversight architecture between AI action and irreversible consequence

The question isn’t why one developer made a bad call under pressure.

The question is: what happens when this is the default operating model at enterprise scale?


The Quiet Version Is Already Happening

Most organizations won’t get an incident as clean as this one — with a clear cause, a recoverable outcome, and a developer willing to explain exactly what went wrong.

I’ve watched the quieter version play out in customer operations more times than I can count.

An AI handles an interaction it wasn’t designed for. Nobody catches it — there’s no oversight layer, just a dashboard showing 74% deflection. The customer churns. The ops team doesn’t connect it to the AI. The model gets credited for the deflection rate; it never gets debited for the damage.

No dramatic story. No AWS rescue. Just slow, invisible erosion of customer relationships — attributed to churn, to market conditions, to anything except the AI system that nobody is actually watching.

This is already the production reality in most enterprises that have deployed AI with any meaningful autonomy. The Grigorev incident is notable because it was sudden and recoverable. The version happening inside customer operations is slow and isn’t.


Sandbox AI and Production AI Are Fundamentally Different Problems

In a sandbox: a mistake is a lesson. The failure is contained, reversible, instructive.

In production: a mistake is a consequence. A customer is lost. A database is gone. A policy exception has been made that cannot be unmade.

The difference isn’t the AI’s capability level. It’s what is built around it.

Moving an AI system from sandbox to production without changing the operational architecture around it (the oversight model, the authorization boundaries, the escalation paths) is like deploying any other powerful tool without the infrastructure that makes it safe to operate at scale. You wouldn’t do it with a surgical team. You wouldn’t do it with a financial trading desk. You don’t do it because the cost of failure in production is categorically different from the cost of failure in testing.

AI agents in production aren’t different. They’re just new. And their newness has become cover for avoiding a very old and well-understood design discipline: operational control architecture.


What Governance Actually Means in a Hybrid World

When I say governance in this context, I don’t mean compliance checklists. I mean something more fundamental: designing who decides what, and making that design explicit.

The questions every production deployment must answer (a sketch of how they might be encoded follows the list)
  • What is this AI agent authorized to resolve, and what requires a human decision?
  • Who reviews actions before they become irreversible?
  • What is the “stop and ask” threshold — and who designed it, and when was it last tested?
  • How does context survive a handoff between AI and human?
  • Who gets notified when something looks wrong, and what does “wrong” mean in this system?
  • What is the audit trail for consequential AI decisions?
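None of these questions requires new technology to answer. They require the answers to be written down somewhere enforceable rather than held as tribal knowledge. Here is a minimal, purely illustrative sketch in Python, assuming a hypothetical agent whose proposed actions arrive as labeled strings with a confidence score. Every name in it (AgentPolicy, decide, the action labels, the thresholds) is invented for the example, not drawn from any real framework.

```python
# Hypothetical policy object: each field answers one governance question
# from the list above. All names and values are illustrative.
from dataclasses import dataclass, field

@dataclass
class AgentPolicy:
    # What is the agent authorized to resolve on its own?
    auto_resolve: set = field(default_factory=lambda: {
        "password_reset", "status_inquiry", "refund_under_50_usd",
    })
    # What requires a human decision before it becomes irreversible?
    require_human_review: set = field(default_factory=lambda: {
        "delete_resource", "account_closure", "policy_exception",
    })
    # The "stop and ask" threshold: below this confidence, the agent halts.
    stop_and_ask_below: float = 0.85
    # Who gets notified when something looks wrong.
    escalation_contact: str = "ops-oncall@example.com"
    # Where consequential decisions are recorded for later review.
    audit_log_path: str = "/var/log/agent/decisions.jsonl"

def decide(policy: AgentPolicy, action: str, confidence: float) -> str:
    """Route a proposed agent action: execute, escalate, or block."""
    if action in policy.require_human_review:
        return "escalate"   # a human decides; the agent does not
    if confidence < policy.stop_and_ask_below:
        return "escalate"   # below the stop-and-ask threshold
    if action in policy.auto_resolve:
        return "execute"    # inside the agent's explicitly defined scope
    return "block"          # not authorized anywhere: deny by default
```

The design choice worth noticing is the final branch: anything the policy does not explicitly authorize is denied by default, rather than executed on the agent’s judgment.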

The developer in this story didn’t have that layer. He had the tool, the access, and good intentions. What he lacked was the operational architecture that would have caught the failure mode before it became irreversible.

Most enterprises deploying AI agents right now are in exactly the same position — just at larger scale, with more distributed accountability, and less visibility into where the failure mode lives.


Human-AI Collaboration Requires More Governance, Not Less

There’s a particular kind of optimism about AI that I encounter frequently in enterprise contexts: the belief that as models get better, governance requirements go down. That as the AI becomes more capable and more aligned, the need for oversight infrastructure reduces.

I think this gets it precisely backwards.

As AI agents get more capable — as they handle more consequential decisions, with more autonomy, in more production contexts — the governance infrastructure required to keep them operating safely gets more important, not less. The more powerful the agent, the more important the control layer.

In a hybrid human-AI world, these failures aren’t exceptional events. They’re structural. Predictable. Baked into any system where AI has autonomous access to production without designed oversight.

The solution isn’t more careful prompting. It isn’t better models. It isn’t more cautious developers.

It’s treating AI agents like every other powerful operational resource: defined scope, audit trails, human review gates for irreversible actions, escalation paths that function before the crisis, and continuous feedback loops that make the system smarter over time.
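Concretely, the review gate for irreversible actions can be as small as a wrapper between the agent’s intent and the production system. The sketch below assumes hypothetical helpers (execute_action, is_irreversible, request_human_approval) that stand in for whatever your stack actually provides; only the pattern is the point.

```python
# Illustrative review gate; the callables passed in (execute_action,
# is_irreversible, request_human_approval) are assumed, not real APIs.
import json
import time

def gated_execute(action, execute_action, is_irreversible,
                  request_human_approval, audit_path="decisions.jsonl"):
    """Run an agent-proposed action through an audit trail and a human gate."""
    record = {"ts": time.time(), "action": action, "outcome": None}
    try:
        if is_irreversible(action):
            # Irreversible actions never run on the agent's say-so alone.
            if not request_human_approval(action):
                record["outcome"] = "rejected_by_reviewer"
                return None
        result = execute_action(action)
        record["outcome"] = "executed"
        return result
    except Exception:
        record["outcome"] = "error"
        raise
    finally:
        # Every consequential decision leaves a trace, whatever the outcome.
        with open(audit_path, "a") as log:
            log.write(json.dumps(record, default=str) + "\n")
```

The wrapper is deliberately boring. Its governance value is where it sits: between the agent and production, not inside the prompt.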

We have always known this about surgical teams, financial traders, pilots, and nuclear plant operators. The discipline exists. The methodology exists.

What’s lagging is the willingness to apply it to AI — because applying it requires admitting that the “just deploy it” mentality that works in sandbox environments has a real cost ceiling in production.


The Moment We’re In

Grigorev’s incident will be remembered, if it is remembered at all, as an early and recoverable example of AI production failure. He was lucky: his data was recoverable, his vendor was responsive, and he had the presence of mind to document and share what happened.

The next version of this story won’t be as clean. The organization will be larger. The AI will have access to more. The damage will be more distributed and harder to trace. The post-mortem — if one gets done — will point at human error, change management failures, vendor limitations. Everything except the absence of operational control architecture.

Every enterprise moving AI from pilot to production right now is navigating the same gap. The governance infrastructure for hybrid human-AI operations is not a future problem. It is a present one.

The question for every organization is simply whether they build it deliberately — before the incident — or scramble to reconstruct it after.
