AI-Augmented Delivery

Pilot purgatory

Your AI pilot worked. Congratulations — you've cleared the easy 10%. A demo runs on curated data, one willing user, and no consequences. Production runs on the real board, a skeptic, an integration nobody scoped, and someone's name on the output. That gap is where most AI programs quietly die.

Loy O'Kelley · Program Director · 7 min read · Published July 1, 2026

Gartner predicted that at least 30% of generative AI projects would be abandoned after proof of concept by the end of 2025 — citing poor data quality, inadequate risk controls, escalating costs, and unclear business value. Sit with that list for a second. Not one of those is a modeling problem. Every one of them is a delivery problem — and more precisely, every one of them is a part of the job the pilot was built to skip.

I've watched this sequence on more programs than I can count. The pilot is a triumph. The agent drafts the status report, flags the risk, summarizes the thread, and the room lights up. Six months later the capability is a folder of screenshots and a login nobody uses. Nothing about the model got worse. The program just ran into everything the demo was carefully arranged to avoid, and all of it turned out to be the actual work.

The demo is the easy part

A pilot is engineered to succeed. That isn't cynicism; it's the whole point of a pilot. You pick a clean use case, feed it good inputs, hand it to someone who wants it to work, and you keep it well clear of the security review, the integration backlog, and the finance team. Under those conditions almost any competent tool looks like magic. The trouble is that none of those conditions survive contact with production — so a pilot tells you the tool can work while telling you nothing about whether it will work here, and those are not the same question.

1. The sandbox that flatters the tool

The pilot ran on a curated slice of data somebody cleaned up first. Production runs on the real system of record — the board nobody groomed and the register nobody updated. What you clocked as capability in the demo was partly just the quality of the inputs you hand-picked. Point the same agent at the live data and the output regresses, sometimes hard. If your pilot never touched the messy record, you didn't test the tool. You tested the sandbox.

2. The integration nobody scoped

The demo was a standalone — a chat window, a sample file, a screen to point at. Production means wiring the thing into Jira, the data warehouse, single sign-on, and whatever the security team requires before it goes anywhere near a real record. That plumbing is most of the work and none of the applause, which is exactly why it gets left out of the pilot and then ambushes the program in month three. The capability was never the hard part. Getting it to live inside systems that already exist, under rules that already apply, is the hard part.

3. The owner who was never named

Every pilot has a champion — the person excited enough to run it on their own time. Production needs something different: an owner accountable for the output, the same way someone still has to own the signature. A champion volunteers; an owner is on the hook. When the pilot ends and no one's name is attached to keeping the capability fed, integrated, and trusted, it reverts to the manual process it was meant to replace — not with a decision, just with a slow, quiet drift back to the way things were.

The principle

A pilot answers one question: can it work? A program answers a much harder one: will it work here, every day, on real data, owned by someone, at a cost we'll keep paying? The demo was never the hard part, and treating a good demo as a finished project is how you land among the 30% or more that get abandoned after proof of concept.

Build the pilot as a rehearsal, not a magic trick

In the Army we never confused a rehearsal with the mission. The rehearsal exists precisely to expose what the mission will break — you run it hard, under realistic conditions, so the failures happen while they're still cheap. A pilot engineered to impress teaches you nothing you can use. A pilot engineered to surface what production will demand is worth every hour. The whole difference is in what you choose to test.

So test the parts that actually decide the outcome. Run the pilot on real, messy data from the start, not a set someone sanitized for the demo. Hand it to your skeptic rather than your enthusiast — the person who'll use it grudgingly is a far better instrument than the one who already wants it to win. Scope the integration, the security review, and the cost at full scale before the demo, not after, because those are the three things most likely to kill the project and the three things a flashy pilot is designed to hide. And name the production owner on day one, while it's still cheap to argue about, not in the scramble after the applause dies down.

Most of all, change what "done" means. A pilot isn't done when it demos well. It's done when it's running in production, on the real record, with an owner who answers for the output. Anything short of that is a rehearsal you've mistaken for the mission, and a rehearsal mistaken for the mission delivers nothing except the comforting illusion that you're further along than you are.

How to actually do this

Pilot on real, messy data — not a cleaned-up sample. If it never touched the live record, you tested the sandbox, not the tool.
Recruit the skeptic as your pilot user. Enthusiasts prove it can delight; skeptics prove it can survive an ordinary Tuesday.
Scope integration, security, and cost-at-scale before the demo. Those are the failure modes that get projects abandoned after POC, so surface them while they're cheap.
Name the production owner on day one. A champion volunteers; an owner is accountable. No owner, no program.
Redefine "done" as "in production and owned," not "demoed." What counts as finished decides what actually ships.
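
If it helps to make the checklist hard to fudge, the five items above can be written down as an explicit gate. What follows is only a minimal sketch in Python, not a tool from any real program: the `PilotReadiness` class, its field names, and the gap messages are all hypothetical, and you would swap in whatever your own program tracks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PilotReadiness:
    """Hypothetical record of the checklist; every field name is illustrative."""
    uses_live_data: bool = False           # pilot ran on the real, messy record
    pilot_user_is_skeptic: bool = False    # pilot user is a skeptic, not an enthusiast
    integration_scoped: bool = False       # integration work estimated before the demo
    security_scoped: bool = False          # security review estimated before the demo
    cost_at_scale_scoped: bool = False     # full-scale cost estimated before the demo
    production_owner: Optional[str] = None  # named, accountable owner
    running_in_production: bool = False    # actually live, not just demoed

    def gaps(self) -> List[str]:
        """Return the checklist items that are still open."""
        checks = [
            (self.uses_live_data, "pilot has not touched the live record"),
            (self.pilot_user_is_skeptic, "pilot user is not a skeptic"),
            (self.integration_scoped, "integration not scoped"),
            (self.security_scoped, "security review not scoped"),
            (self.cost_at_scale_scoped, "cost at scale not scoped"),
            (bool(self.production_owner), "no production owner named"),
        ]
        return [message for passed, message in checks if not passed]

    def is_done(self) -> bool:
        """Done means running in production with no open gaps."""
        return self.running_in_production and not self.gaps()


# A pilot that demoed well but skipped every item on the list.
demo_only = PilotReadiness()
for gap in demo_only.gaps():
    print("OPEN:", gap)
print("done:", demo_only.is_done())
```

The design choice worth noticing is that `is_done()` has no notion of "demoed well" at all. A pilot with every box ticked but nothing running in production still reports as not done, which is the whole argument in one line.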

The bottom line

The graveyard of AI initiatives is full of projects that demoed beautifully. The ones that die after proof of concept don't die of weak models — they die of poor data, missing controls, runaway cost, and unclear value, which is to say they die of everything the demo was built to avoid. A pilot is a promising start and nothing more. The work is the last mile nobody claps for: the real data, the integration, the cost, the name on the output. Run the pilot to expose that work, not to hide it — otherwise you don't have a program. You have a very good screenshot.

Abandonment figure from Gartner, "Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept By End of 2025" (July 2024). Views are my own.