AI hallucinations in delivery: a field log

5 June 2026 · 8 min read

Most writing about AI hallucinations stays abstract — "the model can be wrong, verify outputs." Fine, but useless. What does it actually look like when AI quietly wrecks a delivery artifact?

Five things we've seen go wrong in the last twelve months, all anonymised, all from real PMOs. None of them are the cartoon "AI invented a fake citation" case. They're worse — small, plausible, easily missed.

1. The dependency that didn't exist

A PM asked an AI to summarise dependencies across three workstreams from a 40-page programme brief. The output listed "Workstream B blocked by vendor X's API delivery, est. Q2." Vendor X had no API in scope. The AI had pattern-matched a similar sentence from elsewhere in the document and confidently inserted it.

Caught: Three weeks later by the workstream lead, who was being asked about a dependency he'd never agreed to.

2. The risk that mutated

A PM pasted the RAID log (risks, assumptions, issues, dependencies) into an AI and asked for "risks with no recent update." The AI returned a list, and along the way changed the severity rating on two of the risks. Not by much. R-04 went from "Medium / High" to "High / High." The PM didn't notice until the next steerco.

Caught: Sponsor asked when severity had moved. PM couldn't say.

Why this one matters

Mutated numbers and ratings are arguably the most dangerous hallucination class. A wrong sentence can usually be sense-checked at a glance. A changed number or rating still looks plausible, so it slides through.
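A rating check like this is easy to script. The sketch below is illustrative only: it assumes each risk appears on its own line as an ID followed by a two-part rating, such as "R-04: Medium / High", which may not match how your RAID log is exported. R-03 and R-07 are made-up entries for the example.

```python
import re

# Assumed line format (hypothetical): "R-04: Medium / High"
RATING_LINE = re.compile(r"(R-\d+)\s*:\s*(\w+)\s*/\s*(\w+)")

def parse_ratings(text):
    """Map each risk ID to its two-part rating, lower-cased."""
    return {rid: (a.lower(), b.lower()) for rid, a, b in RATING_LINE.findall(text)}

def changed_ratings(source_text, ai_text):
    """Return {risk_id: (source_rating, ai_rating)} for ratings the AI altered."""
    source = parse_ratings(source_text)
    output = parse_ratings(ai_text)
    return {
        rid: (source[rid], rating)
        for rid, rating in output.items()
        if rid in source and source[rid] != rating
    }

source_raid = """R-03: Low / Medium
R-04: Medium / High
R-07: High / High"""

ai_output = """R-04: High / High
R-07: High / High"""

print(changed_ratings(source_raid, ai_output))
```

Anything this returns is a rating the AI touched without being asked to. It does not flag risk IDs that appear only in the AI output, so an invented risk would need a separate check.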

3. The date that drifted

An AI was asked to roll forward a project plan after a two-week slip. It moved everything by two weeks, except one milestone, which it moved by ten days, because the prompt had referenced "ten working days" two paragraphs earlier. The model blended the two signals instead of applying one slip uniformly.

Caught: By the milestone owner, who got an email confirming a date she hadn't agreed to.
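A roll-forward is plain date arithmetic, so one option is to do the shift in code and use the AI only for the narrative around it. A minimal sketch, assuming the plan is available as milestone names mapped to dates. The milestone names and dates here are hypothetical.

```python
from datetime import date, timedelta

def roll_forward(plan, slip):
    """Shift every milestone by the same slip. No model involved."""
    return {name: due + slip for name, due in plan.items()}

def uneven_shifts(before, after, slip):
    """Return milestones whose date moved by anything other than the agreed slip."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] - before[name] != slip
    }

slip = timedelta(weeks=2)

before = {
    "Design sign-off": date(2026, 3, 2),
    "UAT start": date(2026, 4, 13),
    "Go-live": date(2026, 5, 18),
}

# Hypothetical AI-refreshed plan: one milestone moved by ten days, not fourteen.
ai_plan = {
    "Design sign-off": date(2026, 3, 16),
    "UAT start": date(2026, 4, 23),
    "Go-live": date(2026, 6, 1),
}

for name, moved in uneven_shifts(before, ai_plan, slip).items():
    print(f"{name}: moved {moved.days} days, expected {slip.days}")
```

The check compares calendar days only. If the plan works in working days, the slip would need converting before the comparison.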

4. The stakeholder who didn't say that

Meeting transcript in, "decisions made" out. The AI summarised a decision as "Sponsor agreed to defer Phase 2 to FY27." The sponsor had said no such thing. He had said "we may need to look at deferring," which the AI confidently upgraded to a decision.

Caught: The sponsor read the minutes, called the PM, was unimpressed.
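One way to catch this class of error is to make the AI attach a supporting quote to every "decision", then check the quote mechanically. The sketch below assumes that workflow. The hedge-word list is illustrative, not exhaustive, and the function name is made up for the example.

```python
import re

# Illustrative list of tentative phrasing; extend to suit your meetings.
HEDGES = ("may", "might", "could", "look at", "consider", "possibly")

def normalise(text):
    """Lower-case and collapse whitespace so line wrapping doesn't break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()

def check_quote(quote, transcript):
    """Return a list of problems with the quote offered as evidence of a decision."""
    problems = []
    clean = normalise(quote)
    if clean not in normalise(transcript):
        problems.append("quote not found verbatim in transcript")
    hedged = [h for h in HEDGES if re.search(rf"\b{re.escape(h)}\b", clean)]
    if hedged:
        problems.append(f"tentative wording: {', '.join(hedged)}")
    return problems

transcript = "Sponsor: We may need to look at deferring."

print(check_quote("we may need to look at deferring", transcript))
print(check_quote("agreed to defer Phase 2 to FY27", transcript))
```

The first call finds the quote but flags it as tentative. The second flags a quote that never appears in the transcript. Neither passes as a decision without a human reading the original.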

5. The benefits case that inflated

The PM asked an AI to "tighten" a benefits realisation paragraph for a steerco pack. The original said "estimated £1.2m annualised saving based on early indicators." The AI's tightened version: "delivering £1.4m annualised saving." The number got bigger, the hedge got deleted.

Caught: By finance, who had the original model and wanted to know where the extra £200k came from.
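Both failures here, a new number and a lost hedge, can be flagged before the pack goes out. A rough sketch, assuming figures are written in the "£1.2m" style. The regular expression and hedge list are assumptions for illustration and will miss other formats.

```python
import re

# Matches figures such as "£1.2m", "200k", "15%" or "1,200" (assumed formats).
NUMBER = re.compile(r"[£$€]?\d+(?:,\d{3})*(?:\.\d+)?(?:bn|m|k|%)?", re.IGNORECASE)

# Illustrative hedging phrases only.
HEDGES = ("estimated", "approximately", "early indicators", "expected", "up to")

def numbers_in(text):
    """Return the set of figures found in the text, lower-cased."""
    return {match.lower() for match in NUMBER.findall(text)}

def review_rewrite(original, rewritten):
    """Flag figures that appear only in the rewrite and hedges that disappeared."""
    new_numbers = sorted(numbers_in(rewritten) - numbers_in(original))
    dropped_hedges = [
        h for h in HEDGES
        if h in original.lower() and h not in rewritten.lower()
    ]
    return {"new_numbers": new_numbers, "dropped_hedges": dropped_hedges}

original = "estimated £1.2m annualised saving based on early indicators"
rewritten = "delivering £1.4m annualised saving"

print(review_rewrite(original, rewritten))
```

On the example above it reports £1.4m as a figure with no source and "estimated" and "early indicators" as hedges that vanished, which is the question finance ended up asking.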

The pattern

None of these are dramatic. None look like obvious hallucinations. They all share three features:

Plausible. The fabricated detail fits the surrounding context. That's why it survives the skim-read.
Compressed. They emerge from summarisation, rewriting, or "tightening" — operations where the PM expects the content to change a little. The lossy boundary is where things slip.
Specific. Numbers, dates, severity ratings, named stakeholders. The exact details that downstream readers act on.

Three habits that catch them

Diff before ship. If an AI rewrote a paragraph, diff the original against the new version. In a diff, changed numbers and deleted hedges stand out instead of hiding in a skim-read.
Never let AI invent specifics. Add "If a number, date, name, or rating isn't in the source, say UNKNOWN — do not infer" to every summarisation prompt. In our experience, it works.
Forbid tone changes on quoted material. Transcripts, decisions, sponsor quotes — keep them verbatim. Tighten the framing around them instead.
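The first two habits can be sketched in a few lines of Python using the standard library's difflib. The helper names are hypothetical, and the guard text is the sentence quoted above. Treat this as a starting point rather than a finished tool.

```python
import difflib

# Guard sentence from the second habit, appended to every summarisation prompt.
GUARD = (
    "If a number, date, name, or rating isn't in the source, "
    "say UNKNOWN — do not infer."
)

def build_prompt(task, source):
    """Combine the task, the guard sentence, and the source text into one prompt."""
    return f"{task}\n\n{GUARD}\n\nSOURCE:\n{source}"

def changed_words(original, rewritten):
    """Return (removed, added) word lists from a word-level diff."""
    a, b = original.split(), rewritten.split()
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(a[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(b[j1:j2])
    return removed, added

original = "estimated £1.2m annualised saving based on early indicators"
rewritten = "delivering £1.4m annualised saving"

removed, added = changed_words(original, rewritten)
print("Removed:", removed)
print("Added:  ", added)
```

A word-level diff is noisier than a line-level one on long documents, but on a single paragraph it puts the changed figure and the deleted hedge side by side.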

The honest takeaway

AI hallucinations in delivery work don't look like the demos. They look like a slightly wrong date in a plan refresh. The cost is paid by the PM's credibility, weeks after the fact. Build the diffing and constraint habits early — the next pilot's reputation depends on them.