AI hallucinations in delivery: a field log

Most writing about AI hallucinations stays abstract — "the model can be wrong, verify outputs." Fine, but useless. What does it actually look like when AI quietly wrecks a delivery artifact?

Five things we've seen go wrong in the last twelve months, all anonymised, all from real PMOs. None of them are the cartoon "AI invented a fake citation" case. They're worse — small, plausible, easily missed.

1. The dependency that didn't exist

A PM asked an AI to summarise dependencies across three workstreams from a 40-page programme brief. The output listed "Workstream B blocked by vendor X's API delivery, est. Q2." Vendor X had no API in scope. The AI had pattern-matched a similar sentence from elsewhere in the document and confidently inserted it.

Caught: Three weeks later by the workstream lead, who was being asked about a dependency he'd never agreed to.

2. The risk that mutated

The PM pasted the RAID into an AI and asked for "risks with no recent update." The AI returned a list — and changed the severity rating on two of them. Not by much. R-04 went from "Medium / High" to "High / High." The PM didn't notice until the next steerco.

Caught: Sponsor asked when severity had moved. PM couldn't say.

Why this one matters
Silent changes to ratings and numbers are the most dangerous hallucination class. A changed sentence can be sense-checked at a glance. A changed rating or figure slides through.
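One way to catch this mechanically is to compare ratings by risk ID before the list goes anywhere. The sketch below is a minimal example. It assumes the IDs and ratings have already been pulled out of both the source RAID and the AI's output into dictionaries. The function name and the R-07 row are made up for illustration.

```python
def rating_changes(source, ai_output):
    """Return {risk_id: (source_rating, ai_rating)} for every mismatch.

    A source_rating of None means the AI returned an ID that is not in
    the source at all.
    """
    changes = {}
    for risk_id, new_rating in ai_output.items():
        old_rating = source.get(risk_id)
        if old_rating != new_rating:
            changes[risk_id] = (old_rating, new_rating)
    return changes


# Hypothetical data: R-04 mirrors the case above, R-07 is invented.
source_raid = {"R-04": "Medium / High", "R-07": "Low / Medium"}
ai_raid = {"R-04": "High / High", "R-07": "Low / Medium"}

print(rating_changes(source_raid, ai_raid))
# {'R-04': ('Medium / High', 'High / High')}
```

An empty result does not prove the output is right. It only shows that no rating moved.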

3. The date that drifted

An AI was asked to roll forward a project plan after a two-week slip. It moved everything by two weeks, except one milestone, which it moved by ten days, because the prompt had referenced "ten working days" two paragraphs earlier. The model picked up the stray figure and applied it to that one milestone.

Caught: By the milestone owner, who got an email confirming a date she hadn't agreed to.
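A uniform roll-forward is easy to verify, because every milestone should move by the same amount. The sketch below is a minimal check in Python. It assumes both versions of the plan are available as name-to-date mappings. The milestone names and dates are hypothetical.

```python
from datetime import date, timedelta


def drifted_milestones(original, rolled, expected_shift):
    """Return {name: actual_shift} for milestones that did not move by
    expected_shift. An actual_shift of None means the milestone is
    missing from the rolled plan."""
    drifted = {}
    for name, old_date in original.items():
        new_date = rolled.get(name)
        actual = None if new_date is None else new_date - old_date
        if actual != expected_shift:
            drifted[name] = actual
    return drifted


# Hypothetical plan: one milestone moved by ten days instead of fourteen.
original_plan = {
    "Design sign-off": date(2025, 3, 3),
    "UAT start": date(2025, 4, 7),
}
rolled_plan = {
    "Design sign-off": date(2025, 3, 17),
    "UAT start": date(2025, 4, 17),
}

print(drifted_milestones(original_plan, rolled_plan, timedelta(weeks=2)))
# {'UAT start': datetime.timedelta(days=10)}
```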

4. The stakeholder who didn't say that

Meeting transcript in, "decisions made" out. The AI summarised a decision as "Sponsor agreed to defer Phase 2 to FY27." The sponsor had said no such thing. He had said "we may need to look at deferring," which the AI confidently upgraded to a decision.

Caught: The sponsor read the minutes, called the PM, was unimpressed.
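If the minutes attribute wording to someone, that wording can be checked against the transcript. The sketch below is a rough version of that check in Python. It assumes the attributed phrases have already been pulled out of the minutes by hand. The function names are illustrative, and the transcript is cut down to the one remark quoted above.

```python
import re


def normalise(text):
    """Lower-case and collapse whitespace so line breaks don't cause misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def not_in_transcript(attributed_phrases, transcript):
    """Return the phrases that do not appear verbatim in the transcript."""
    haystack = normalise(transcript)
    return [p for p in attributed_phrases if normalise(p) not in haystack]


transcript = "Sponsor: we may need to look at deferring."
phrases = [
    "we may need to look at deferring",
    "agreed to defer Phase 2 to FY27",
]

print(not_in_transcript(phrases, transcript))
# ['agreed to defer Phase 2 to FY27']
```

A phrase that fails the check is not necessarily wrong, but it is a paraphrase, and someone should read it against the source before it goes out under a stakeholder's name.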

5. The benefits case that inflated

The PM asked an AI to "tighten" a benefits realisation paragraph for a steerco pack. The original said "estimated £1.2m annualised saving based on early indicators." The AI's tightened version: "delivering £1.4m annualised saving." The number got bigger, the hedge got deleted.

Caught: By finance, who had the original model and wanted to know where the extra £200k came from.

The pattern

None of these are dramatic. None look like obvious hallucinations. They all share three features:

  • Plausible. The fabricated detail fits the surrounding context. That's why it survives the skim-read.
  • Compressed. They emerge from summarisation, rewriting, or "tightening" — operations where the PM expects the content to change a little. The lossy boundary is where things slip.
  • Specific. Numbers, dates, severity ratings, named stakeholders. The exact details that downstream readers act on.

Three habits that catch them

  • Diff before ship. If AI rewrote a paragraph, diff the original against the new version. Eyes go to changed numbers and deleted hedges automatically.
  • Never let AI invent specifics. Add "If a number, date, name, or rating isn't in the source, say UNKNOWN — do not infer" to every summarisation prompt. It works.
  • Forbid tone changes on quoted material. Transcripts, decisions, sponsor quotes — keep them verbatim. Tighten the framing around them instead.
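The first habit can be partly automated. The sketch below is a minimal Python example, not a finished tool. It runs a word-level diff with the standard library's difflib, then separately lists the figures that appeared or disappeared and the hedge words that were dropped. The hedge list and the number pattern are assumptions and would need tuning for a real pack.

```python
import difflib
import re

# Assumed pattern: optional currency sign, digits, optional unit suffix.
NUMBER = re.compile(r"[£$€]?\d+(?:[.,]\d+)*[a-zA-Z%]*")
# Assumed hedge list, for illustration only.
HEDGES = ["estimated", "early indicators", "may", "might", "approximately"]


def word_diff(original, rewrite):
    """Words removed ('- ') or added ('+ ') between the two versions."""
    lines = difflib.ndiff(original.split(), rewrite.split())
    return [d for d in lines if d.startswith(("- ", "+ "))]


def specifics(text):
    return set(NUMBER.findall(text))


def hedges(text):
    lowered = text.lower()
    return {h for h in HEDGES if re.search(rf"\b{re.escape(h)}\b", lowered)}


def review(original, rewrite):
    """Summarise the changes a skim-read is most likely to miss."""
    return {
        "added_specifics": sorted(specifics(rewrite) - specifics(original)),
        "dropped_specifics": sorted(specifics(original) - specifics(rewrite)),
        "dropped_hedges": sorted(hedges(original) - hedges(rewrite)),
    }


original = "estimated £1.2m annualised saving based on early indicators"
rewrite = "delivering £1.4m annualised saving"

print(word_diff(original, rewrite))
print(review(original, rewrite))
# {'added_specifics': ['£1.4m'], 'dropped_specifics': ['£1.2m'],
#  'dropped_hedges': ['early indicators', 'estimated']}
```

Any entry under added_specifics is a figure that was not in the source, which is exactly what the second habit's UNKNOWN instruction is meant to prevent.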

The honest takeaway

AI hallucinations in delivery work don't look like the demos. They look like a slightly wrong date in a plan refresh. The cost is paid by the PM's credibility, weeks after the fact. Build the diffing and constraint habits early — the next pilot's reputation depends on them.