AI-Augmented Delivery

Garbage in, gospel out

Point an agent at a dishonest Jira board and it won't tell you the board is lying. It will summarize the lie — fluently, confidently, in executive-ready prose. Before AI can make your program smarter, your system of record has to deserve the trust you're about to automate.

Loy O'Kelley · Program Director · 7 min read · Published June 10, 2026

Every AI capability I've written about on this site — honest automated reporting, early risk detection, the whole delivery stack — rests on one assumption that almost nobody checks before they start: that the system of record reflects reality. The agent reads Jira, the schedule, the risk register, the messages. If those are true, you get a faster, more honest program. If they're not, you get something genuinely worse than what you had before.

Here's why it's worse, not just neutral. The old failure was visible. A stale deck looked stale. A risk register nobody had touched since kickoff smelled like exactly what it was, and an experienced reviewer discounted it on sight. AI removes that signal. It takes tickets nobody updated and a register nobody groomed and renders them as a crisp, well-structured, confident report — fiction with production values. The data didn't get better. It got laundered. Garbage used to come out looking like garbage. Now it comes out looking like gospel.

And the underlying data is usually worse than anyone wants to believe. When researchers had managers actually score their own records, 47% of newly created data records contained at least one critical error, and only 3% of the resulting quality scores met even a loose standard of "acceptable." That study wasn't about project data specifically. In my experience, project data is worse, because nobody's paycheck bounces when a ticket status is wrong.

Where the record goes quietly wrong

Program data doesn't rot dramatically. It rots in three ordinary ways, and each one poisons a different AI capability.

1. The board that describes last sprint

Tickets marked "in progress" that nobody has touched in nine days. Stories closed in a batch on Friday because the sprint was ending, not because the work was done. An agent reading that board reports velocity that doesn't exist, and the automated weekly, the one you built specifically to stop humans negotiating yellow into green, now does the negotiating for you. You wired reporting to the system of record so the report would be harder to lie to. If the record itself lies, you've just moved the lie upstream, where it's harder to see.
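Both symptoms are cheap to check before an agent reads the board. The sketch below is a minimal illustration in Python, not a Jira integration: the `Ticket` fields, the five-day idle threshold, and the sample board are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Ticket:
    key: str                          # hypothetical ticket id
    status: str                       # e.g. "in progress", "done"
    last_updated: date
    closed_on: Optional[date] = None  # None while the ticket is still open


def idle_in_progress(tickets, today, max_idle_days=5):
    """Keys of 'in progress' tickets untouched for more than max_idle_days."""
    return [
        t.key
        for t in tickets
        if t.status == "in progress"
        and (today - t.last_updated).days > max_idle_days
    ]


def last_day_closure_share(tickets, sprint_end):
    """Share of closed tickets that were closed on the sprint's final day."""
    closed = [t for t in tickets if t.closed_on is not None]
    if not closed:
        return 0.0
    return sum(t.closed_on == sprint_end for t in closed) / len(closed)


# Made-up board for illustration only.
sprint_end = date(2026, 6, 5)
board = [
    Ticket("PRJ-1", "in progress", date(2026, 5, 27)),
    Ticket("PRJ-2", "in progress", date(2026, 6, 4)),
    Ticket("PRJ-3", "done", date(2026, 6, 5), closed_on=date(2026, 6, 5)),
    Ticket("PRJ-4", "done", date(2026, 6, 5), closed_on=date(2026, 6, 5)),
    Ticket("PRJ-5", "done", date(2026, 6, 5), closed_on=date(2026, 6, 5)),
    Ticket("PRJ-6", "done", date(2026, 6, 2), closed_on=date(2026, 6, 2)),
]

print(idle_in_progress(board, today=sprint_end))
print(last_day_closure_share(board, sprint_end))
```

A high last-day share doesn't prove the work is unfinished. It is a prompt to ask before the velocity number goes into a report.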

2. The risk register as graveyard

Most registers are written once, at kickoff, in the language of things that already worried someone. Then the program changes and the register doesn't. An agent doing pattern detection across a dead register isn't doing early warning — it's doing archaeology. The risks that kill programs are the ones that emerged in month four and never got written down, and no model can surface what the team never recorded.

3. The fields nobody fills

Dependencies, due dates, effort estimates, owners: the structured fields that make a board machine-readable are exactly the ones humans skip when they're busy. A human PM compensates with hallway knowledge. The agent can't. It either treats the empty field as truth (no dependency recorded, so none exists) or guesses. Either way, the automation confidently re-sequences work straight into a wall that everyone on the team knew was there.
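One way to stop an empty field from passing as a fact is to make "not recorded" a different value from "none". A minimal sketch, assuming a hypothetical `Story` record in which `depends_on` is `None` when nobody filled it in and an empty list only when someone confirmed there are no dependencies:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Story:
    key: str  # hypothetical story id
    # None  -> the field was never filled in (unknown)
    # []    -> someone reviewed it and confirmed no dependencies
    depends_on: Optional[List[str]] = None


def safe_to_pull_forward(story: Story) -> Tuple[bool, str]:
    """Return (allowed, reason). An unknown dependency blocks automation."""
    if story.depends_on is None:
        return False, f"{story.key}: dependencies not recorded, ask the owner"
    if story.depends_on:
        return False, f"{story.key}: blocked by {', '.join(story.depends_on)}"
    return True, f"{story.key}: no dependencies, confirmed"


for story in (Story("PRJ-7"), Story("PRJ-8", []), Story("PRJ-9", ["PRJ-3"])):
    print(safe_to_pull_forward(story))
```

The design choice is that the automation refuses to act on an unknown and routes the question to a person, instead of treating silence as permission.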

The principle

AI doesn't add truth to a program. It amplifies whatever the record already says — accuracy or fiction, at equal speed and equal polish. Data hygiene isn't an admin chore anymore. It's the load-bearing wall.

Hygiene as a governance function

The Army drilled one habit into me that translates directly: you maintain the weapon before the mission, not during the firefight. On the programs I run, data hygiene gets the same standing as any other control: written down, owned, and checked. It looks like this.

Every machine-read field has a definition of "current" (a ticket status is current if it's been touched within the sprint, a risk is current if it's been reviewed within two weeks), and anything outside that window is stale by rule, not by judgment.

The agents are configured to flag staleness rather than read around it: a report built on a board that's 30% stale says so, in the first line, before any human mistakes polish for truth.

Hygiene itself shows up as a number in the weekly (percentage of current tickets, register review age) so the leadership team watches the foundation, not just the output.

And ownership is explicit: the team owns the truth of its tickets the same way the artifact owner owns the signature. Nobody gets to treat the board as someone else's paperwork.
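These rules are simple enough to write down as code. What follows is a sketch under stated assumptions, not the configuration of any particular tool: the field names, the 14-day windows (which assume a two-week sprint), and the 90% warning threshold are illustrative.

```python
from datetime import date, timedelta

# Hypothetical definitions of "current" per machine-read field.
CURRENT_WINDOW = {
    "ticket_status": timedelta(days=14),  # assumes a two-week sprint
    "risk_review": timedelta(days=14),    # reviewed within two weeks
}


def is_stale(field, last_touched, today):
    """Stale by rule: the field was last touched outside its window."""
    return today - last_touched > CURRENT_WINDOW[field]


def percent_current(field, last_touched_dates, today):
    """Hygiene metric for the weekly. An empty record counts as 0% current."""
    if not last_touched_dates:
        return 0.0
    current = sum(not is_stale(field, d, today) for d in last_touched_dates)
    return 100.0 * current / len(last_touched_dates)


def report_with_caveat(body, pct_current, threshold=90.0):
    """Force the data-quality caveat into the first line of the report."""
    if pct_current < threshold:
        caveat = f"DATA QUALITY WARNING: only {pct_current:.0f}% of tickets are current."
        return caveat + "\n" + body
    return body


# Made-up board: seven tickets touched recently, three untouched since May.
today = date(2026, 6, 10)
touched = [date(2026, 6, 8)] * 7 + [date(2026, 5, 1)] * 3

pct = percent_current("ticket_status", touched, today)
report = report_with_caveat("Weekly summary goes here.", pct)
print(report)
```

The caveat is prepended by code rather than requested in a prompt, so a fluent summary cannot quietly drop it.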

None of this is glamorous, which is exactly why it gets skipped. Clients want to talk about agents and models; almost nobody wants to talk about whether their dependency fields are filled in. But I've watched the sequence enough times to call it a rule: programs that skip the hygiene work get an impressive demo and then a quiet loss of trust, because the third time the AI-drafted report says something the room knows is wrong, the room stops reading it. The tooling didn't fail. The data did — and the tooling took the blame.

How to actually do this

Audit the record before you wire anything to it. Score a sample of tickets and risks against reality — the number will be worse than you expect.
Define "current" per field, in writing. Staleness becomes a rule the machine can check, not a feeling a human has to defend.
Make agents disclose data quality, not mask it. A confident report on a stale board is the failure mode — force the caveat into the first line.
Put hygiene metrics in the weekly. What leadership watches, teams maintain.
Assign ownership of the record itself. A board everyone uses and nobody owns will always drift toward fiction.
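The audit in the first step is plain arithmetic. A rough sketch, assuming you can export recorded statuses as a dictionary and that a reviewer writes down what they actually observed for a random sample; every name and value here is hypothetical:

```python
import random


def pick_sample(keys, k, seed=0):
    """Choose up to k ticket keys at random; the fixed seed keeps the audit repeatable."""
    rng = random.Random(seed)
    return rng.sample(keys, min(k, len(keys)))


def mismatch_rate(recorded, observed):
    """Share of audited tickets whose recorded status differs from what the reviewer saw."""
    if not observed:
        raise ValueError("no tickets were audited")
    wrong = sum(recorded.get(key) != status for key, status in observed.items())
    return wrong / len(observed)


# What the board says (made-up data).
recorded = {
    "PRJ-1": "in progress", "PRJ-2": "done", "PRJ-3": "done",
    "PRJ-4": "to do", "PRJ-5": "done", "PRJ-6": "in progress",
}
# Stand-in for what a reviewer finds by asking the team (also made up).
reality = {
    "PRJ-1": "blocked", "PRJ-2": "done", "PRJ-3": "in progress",
    "PRJ-4": "to do", "PRJ-5": "done", "PRJ-6": "in progress",
}

to_check = pick_sample(sorted(recorded), k=4)
observed = {key: reality[key] for key in to_check}
print(f"{mismatch_rate(recorded, observed):.0%} of audited tickets disagree with reality")
```

The same number, recomputed each week, is one candidate for the hygiene metric that goes in front of leadership.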

The bottom line

The least exciting work on an AI-augmented program is the work that decides whether any of it is real. Clean data won't show up in a demo and nobody gets promoted for a well-groomed risk register. But every capability you actually want — reporting that tells the truth, risk detection that fires early, automation that doesn't walk into walls — inherits the discipline of the record underneath it. Maintain the weapon before the mission. The programs that treat hygiene as governance get AI that compounds their honesty. The ones that don't get fiction, faster.

Data quality figures from Nagle, Redman & Sammon, "Only 3% of Companies' Data Meets Basic Quality Standards," Harvard Business Review (2017). Views are my own.
