Written for: CISO, Head of Audit

AI assurance evidence for auditors

AI assurance evidence is the gap most programmes ignore. Here is how to build a control-to-evidence map your internal auditor can test independently.

By Giovanni Salvador · 12 June 2026 · 6 min read

A control that operates but cannot be shown to operate is, to an auditor, indistinguishable from one that does not exist.

Every CISO I speak to at regulated firms has built real AI controls: model allow-lists, gateway logging, kill switches, human-in-the-loop gates on consequential decisions. The gap is almost never the control. The gap is the evidence that proves the control ran, on the day the auditor asks about, against the system under review. When that evidence is missing, you spend three weeks reconstructing what the AI estate did from logs nobody curated and decision records nobody kept.

AI assurance evidence is not a documentation exercise you schedule before an audit. It is a design discipline you apply when you build the control. If the evidence is not accumulating automatically, it is not there when you need it.

The stake

The pressure is real and getting sharper. DORA’s operational resilience testing requirements ask firms to demonstrate that ICT controls operate as designed, not just that they exist in a policy document. The EU AI Act requires high-risk AI systems to be designed for effective human oversight, and supervisors will expect that oversight to be evidenced. Internal auditors are being asked by their own audit committees to cover the AI estate with the same rigour applied to any other technology risk.

The question they bring to the table is: show me. Show me the control exists. Show me it ran over the period. Show me someone with authority reviewed it.

That three-part question maps to three evidence types, and the firms that answer it cleanly are the ones that designed for those types from day one.

Three evidence types, one control

Every AI control in your estate produces, or should produce, evidence of three types.

Design evidence is proof the control exists and is specified: written policies, architecture diagrams, configuration-as-code, the criteria that gate a model release. Design evidence answers “does the control exist?” It is the easiest type to create, and the one most often mistaken for proof that anything actually runs.

Operating evidence is proof the control ran and produced an effect: logs, decision records, alert histories, test results, records of human-in-the-loop (HITL) approvals. Operating evidence answers “did the control operate over the period?” It is the type auditors most often cannot find, because it exists in a system rather than a document and nobody connected the two.

Oversight evidence is proof that someone with authority reviewed the control: board or committee minutes, sign-offs on configuration changes, exception registers, accountable-owner attestations. Oversight evidence answers “did someone govern it?” It is the type that trips firms up most often with AI, because AI governance accountability has often not been formally assigned.

An assurance map that captures all three for each control answers the auditor’s standing question. A map that captures only design evidence gives you policy documentation, which is not the same thing.

The control-to-evidence map

The assurance map is the spine of AI audit readiness. For each control in your estate, you need four things recorded: where the design evidence lives, where the operating evidence lives, where the oversight evidence lives, and who owns each.

Think of it as an index, not a warehouse. The map points at evidence; it does not copy the evidence into a document. A log store reference with a retention window and an owner is the right entry. A copied-in screenshot is not. Copied-in screenshots are expensive to maintain, easy to fabricate, and hard to sample independently.
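A map row can be sketched as a small data structure. This is a minimal sketch assuming Python 3.7+ and a hypothetical schema: `EvidencePointer`, `ControlEntry`, the control ID, owners, and location strings are all illustrative. The point it shows is that each evidence type is a pointer with an owner and a retention window, never a copy of the evidence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EvidencePointer:
    """Where one type of evidence lives: a reference, never a copy."""
    location: str         # hypothetical reference format, e.g. "logstore:<name>"
    owner: str            # named person accountable for keeping the source alive
    retention_days: int   # how long the source keeps records


@dataclass(frozen=True)
class ControlEntry:
    """One row of the control-to-evidence map."""
    control_id: str
    design: Optional[EvidencePointer]
    operating: Optional[EvidencePointer]
    oversight: Optional[EvidencePointer]

    def missing_types(self) -> List[str]:
        """Evidence types with no pointer recorded; each one is a gap."""
        return [kind for kind in ("design", "operating", "oversight")
                if getattr(self, kind) is None]


# Hypothetical entry for a gateway model allow-list control.
allow_list = ControlEntry(
    control_id="CTL-GW-01",
    design=EvidencePointer("git:gateway-config/allow-list.yaml", "Platform lead", 2555),
    operating=EvidencePointer("logstore:ai-gateway/allow-list-decisions", "SRE lead", 400),
    oversight=None,
)
print(allow_list.missing_types())  # ['oversight']
```

Because the row holds references, an auditor samples the log store directly rather than trusting anything pasted into the map.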

Some controls are easier to map than others. An AI gateway is a good example of a control that consolidates evidence: it is the chokepoint for model egress, so its logs cover prompt and response records, redaction events, model allow-list decisions, rate-limit enforcements, and residency-routing records, all in one place. If your gateway is configured and logging, an auditor can go to one system and sample across multiple controls at once.

But the gateway is not a substitute for controls that sit elsewhere. An auditor reviewing whether human-in-the-loop gates fired on high-consequence decisioning workflows will not find that evidence at the gateway. They will find it in your workflow system’s approval records, or in the agent-runtime logs that capture HITL outcomes per decision. The map has to send them to the right place for each control.
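Inverting the map by evidence system makes both points visible: the gateway consolidates several controls, and HITL evidence sits somewhere else. This is a sketch only. The control names and system names (`ai-gateway`, `workflow-approvals`) are hypothetical placeholders for whatever your map records.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical map excerpt: control -> system that holds its operating evidence.
OPERATING_EVIDENCE_SOURCE = {
    "prompt-response-logging": "ai-gateway",
    "redaction": "ai-gateway",
    "model-allow-list": "ai-gateway",
    "rate-limiting": "ai-gateway",
    "residency-routing": "ai-gateway",
    "hitl-high-consequence-decisions": "workflow-approvals",
}


def controls_by_source(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the map: for each evidence system, the controls an auditor can sample there."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for control, source in mapping.items():
        grouped[source].append(control)
    return dict(grouped)


for source, controls in controls_by_source(OPERATING_EVIDENCE_SOURCE).items():
    print(f"{source}: {len(controls)} control(s) -> {', '.join(controls)}")
```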

What auditors actually test

Internal audit is your third line, and a well-built assurance map is what lets it do its job on AI without needing specialist AI knowledge for every engagement. Readiness means three things.

Testability. Every control in the map exposes evidence the auditor can pull independently. They should not need to ask you to run a query on their behalf. The logs should be accessible, the retention window should cover the period, and the sample should be directly interpretable.
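The retention part of testability reduces to a date comparison. This minimal sketch assumes each evidence source keeps a rolling window of `retention_days` ending today. The function name and the example dates are illustrative.

```python
from datetime import date, timedelta


def retention_covers_period(period_start: date, today: date, retention_days: int) -> bool:
    """True if records from the start of the audit period are still retained today.

    Assumes a rolling retention window of `retention_days` that ends today.
    """
    oldest_retained = today - timedelta(days=retention_days)
    return oldest_retained <= period_start


# Hypothetical example: a 12-month audit period, checked on 12 June 2026.
period_start = date(2025, 6, 12)
today = date(2026, 6, 12)
print(retention_covers_period(period_start, today, retention_days=400))  # True
print(retention_covers_period(period_start, today, retention_days=180))  # False
```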

Traceability. An auditor should be able to start from a single high-risk AI use case in your inventory, follow the thread through its controls, find the evidence for each, and trace that back to the oversight that governs it. The map is the thread. If it breaks at any point, the auditor stops and raises a finding.
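The traceability walk can be sketched as a loop that starts at one use case and reports every point where the thread breaks. The inventory and map fragments below are hypothetical. A real map would be read from wherever you maintain it.

```python
from typing import Dict, List

# Hypothetical inventory and map fragments; every name here is illustrative.
USE_CASE_CONTROLS: Dict[str, List[str]] = {
    "uc-claims-triage": ["model-allow-list", "hitl-approval", "redaction"],
}
EVIDENCE: Dict[str, Dict[str, str]] = {  # control -> {evidence type: location}
    "model-allow-list": {"design": "git:gateway-config",
                         "operating": "logstore:ai-gateway",
                         "oversight": "minutes:ai-risk-committee"},
    "hitl-approval": {"design": "policy:hitl-standard",
                      "operating": "workflow:approvals"},
    "redaction": {"design": "git:gateway-config",
                  "operating": "logstore:ai-gateway",
                  "oversight": "register:config-sign-offs"},
}
EVIDENCE_TYPES = ("design", "operating", "oversight")


def trace(use_case: str) -> List[str]:
    """Follow use case -> controls -> evidence -> oversight; list every break."""
    controls = USE_CASE_CONTROLS.get(use_case)
    if not controls:
        return [f"{use_case}: no controls mapped"]
    breaks = []
    for control in controls:
        found = EVIDENCE.get(control, {})
        for kind in EVIDENCE_TYPES:
            if kind not in found:
                breaks.append(f"{use_case} -> {control}: no {kind} evidence")
    return breaks


print(trace("uc-claims-triage"))
# ['uc-claims-triage -> hitl-approval: no oversight evidence']
```

Each string returned is a point where an auditor would stop and raise a finding.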

A standing gap register. Controls with no evidence, evidence past its retention window, and use cases that bypassed onboarding governance are all findings. The firms that handle audit well are the ones that find these themselves, before the auditor does, and log them in a register they maintain continuously.
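A gap register that captures the three finding kinds above can be regenerated from the map on a schedule. This is a sketch under assumed input shapes: plain dictionaries stand in for your map, source catalogue, and inventory. All IDs are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Finding:
    kind: str
    subject: str
    raised_on: date


def build_gap_register(controls: Dict[str, List[str]], sources: Dict[str, int],
                       use_cases: Dict[str, bool], period_start: date,
                       today: date) -> List[Finding]:
    """Collect the three finding kinds from hypothetical map extracts.

    controls:  control id -> evidence source ids it points at
    sources:   source id -> retention window in days
    use_cases: use case id -> True if it went through onboarding governance
    """
    register = []
    for control, source_ids in controls.items():
        if not source_ids:
            register.append(Finding("control with no evidence", control, today))
    for source, retention_days in sources.items():
        # Oldest record still held is later than the period start: part of the period is gone.
        if today - timedelta(days=retention_days) > period_start:
            register.append(Finding("evidence past retention window", source, today))
    for use_case, onboarded in use_cases.items():
        if not onboarded:
            register.append(Finding("bypassed onboarding governance", use_case, today))
    return register


register = build_gap_register(
    controls={"kill-switch": [], "redaction": ["logstore:ai-gateway"]},
    sources={"logstore:ai-gateway": 90},
    use_cases={"uc-marketing-copy": False, "uc-claims-triage": True},
    period_start=date(2025, 6, 12),
    today=date(2026, 6, 12),
)
for finding in register:
    print(f"{finding.kind}: {finding.subject}")
```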

One AI-specific wrinkle your auditor will need briefing on: non-determinism. An AI control’s evidence often shows a distribution of behaviour rather than a binary pass or fail. A guardrail that blocks 97 out of 100 adversarial test cases is not broken because of the 3 it missed, but your auditor needs the evaluation methodology, the threshold, and the residual-risk acceptance to judge that correctly. Put that interpretive context alongside the raw results in the map.
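One way to keep the interpretive context next to the raw numbers is to store them in the same record. This is a minimal sketch. The 0.95 threshold, the methodology location, and the accepting owner are all hypothetical values, and the verdict rules are one illustrative policy rather than a standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GuardrailEvaluation:
    """A raw evaluation result bundled with the context needed to interpret it."""
    blocked: int                               # adversarial cases the guardrail blocked
    total: int                                 # adversarial cases run
    threshold: float                           # minimum block rate, agreed before testing
    methodology: str                           # pointer to how the test set was built and run
    residual_risk_accepted_by: Optional[str]   # who formally accepted the misses

    @property
    def block_rate(self) -> float:
        return self.blocked / self.total

    def verdict(self) -> str:
        if self.block_rate < self.threshold:
            return "below threshold: finding"
        if self.residual_risk_accepted_by is None:
            return "meets threshold, residual risk not accepted: finding"
        return "meets threshold, residual risk accepted"


evaluation = GuardrailEvaluation(
    blocked=97, total=100, threshold=0.95,          # 0.95 is a hypothetical threshold
    methodology="docs:guardrail-eval-plan",        # hypothetical location
    residual_risk_accepted_by="Head of AI Risk",   # hypothetical accountable owner
)
print(f"{evaluation.block_rate:.0%}", evaluation.verdict())
# 97% meets threshold, residual risk accepted
```

Under this policy, a result that meets the threshold is still a finding if nobody has formally accepted the residual risk.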

External assurance and regulator queries

External assurance frameworks consume the same evidence, filtered for their scope. A SOC 2 Type II examination samples whether controls operated over a period, which maps directly to your operating evidence column. An ISO/IEC 42001 certification audit assesses your AI management system, which maps to the governance, inventory, and oversight rows of your map. An enterprise customer’s due-diligence questionnaire is typically answered by pointing at the same evidence plus any external report you already hold.

Assemble the pack from the map, never the other way around. Building a polished binder for one assessor and then trying to back-fill a map from it is how firms end up with assurance theatre: documentation that bears no relationship to what the systems actually do.
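Deriving a pack from the map can be sketched as a filter over rows (control categories) and columns (evidence types). The scope definitions below are simplified to mirror the mapping described above. They are not the frameworks’ own criteria, which your assessor defines. Row contents and category names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical map rows: "category" groups controls; the other keys are evidence columns.
MAP_ROWS: List[Dict[str, str]] = [
    {"control": "model-allow-list", "category": "technical",
     "design": "git:gateway-config", "operating": "logstore:ai-gateway",
     "oversight": "register:config-sign-offs"},
    {"control": "ai-use-case-inventory", "category": "inventory",
     "design": "policy:inventory-standard", "operating": "db:ai-inventory",
     "oversight": "minutes:ai-risk-committee"},
    {"control": "accountable-owner-attestation", "category": "oversight",
     "design": "policy:ai-governance", "operating": "register:attestations",
     "oversight": "minutes:board-risk-committee"},
]

# Simplified scope filters; None means "all rows".
SCOPES = {
    "soc2-type2": {"categories": None, "columns": ("operating",)},
    "iso-42001": {"categories": {"governance", "inventory", "oversight"},
                  "columns": ("design", "operating", "oversight")},
}


def assemble_pack(scope_name: str) -> List[Tuple[str, str, str]]:
    """Derive a pack from the map as (control, evidence type, location) pointers."""
    scope = SCOPES[scope_name]
    categories = scope["categories"]
    pack = []
    for row in MAP_ROWS:
        if categories is not None and row["category"] not in categories:
            continue
        for column in scope["columns"]:
            pack.append((row["control"], column, row[column]))
    return pack


for item in assemble_pack("iso-42001"):
    print(item)
```

Because every pack is computed from the map, a change to the map flows into every pack. A hand-built binder has no such link.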

Regulator queries are sharper still. A supervisor asks “show me how this AI system is controlled, and prove it,” and expects an answer under time pressure. Lead with the gateway’s consolidated evidence for systems brokered through it, then follow the map deeper when the query goes to agent-runtime or connector controls the gateway does not see. Bring board-level risk context too: supervisors increasingly ask not only whether the control is there, but whether the board understands the residual risk and has accepted it.

What to do this week

Run the evidence audit on your five highest-risk AI use cases. For each, list the controls you believe are in place. For each control, ask what design, operating, and oversight evidence exists right now. Anything with a blank is a gap.
Classify the gaps by type. Missing design evidence means the control is not documented. Missing operating evidence means you cannot prove it ran. Missing oversight evidence means accountability is not formally attached. Each type needs a different fix.
Connect your AI inventory to the map. If you do not have a formal AI use-case inventory, start one. Every use case is a row-set in the assurance map, and the risk tier it carries determines how much evidence depth and oversight frequency it warrants.
Nominate an owner for each evidence location. A log with no owner is a log that will lapse. Every evidence source in the map needs a named owner and a retention window that covers your audit cycle.
Set a refresh cadence before the next audit season. Machine-generated evidence accumulates continuously. Design and oversight evidence needs a scheduled review: at least quarterly for high-risk use cases, and on any material change to the control.
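The evidence audit and the refresh cadence in the checklist above can be sketched in a few lines. The input shape, the use-case and control names, and the fix wording are hypothetical. Quarterly is approximated as 90 days. The 365-day default for other use cases is an assumption, not part of the checklist.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

EVIDENCE_TYPES = ("design", "operating", "oversight")

# What a blank in each column means, and the kind of fix it needs.
GAP_FIX = {
    "design": "not documented: specify the control",
    "operating": "cannot prove it ran: capture logs or decision records",
    "oversight": "no formal accountability: assign an owner and sign-off",
}


def evidence_audit(
    use_cases: Dict[str, Dict[str, Dict[str, Optional[str]]]]
) -> List[Tuple[str, str, str, str]]:
    """Return (use case, control, evidence type, fix) for every blank cell."""
    gaps = []
    for use_case, controls in use_cases.items():
        for control, evidence in controls.items():
            for kind in EVIDENCE_TYPES:
                if not evidence.get(kind):
                    gaps.append((use_case, control, kind, GAP_FIX[kind]))
    return gaps


def review_due(last_reviewed: date, today: date, high_risk: bool,
               material_change: bool, standard_interval_days: int = 365) -> bool:
    """Design/oversight review is due on any material change, otherwise on cadence."""
    if material_change:
        return True
    interval = 90 if high_risk else standard_interval_days  # 90 days ~ one quarter
    return today - last_reviewed >= timedelta(days=interval)


audit = evidence_audit({
    "uc-claims-triage": {  # hypothetical use case and control
        "hitl-approval": {"design": "policy:hitl-standard",
                          "operating": "workflow:approvals", "oversight": None},
    },
})
for gap in audit:
    print(gap)
print(review_due(date(2026, 2, 1), date(2026, 6, 12), high_risk=True, material_change=False))  # True
```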

The firms that handle AI audit cleanly do not do so because their controls are exceptional. They do so because they designed the evidence accumulation before someone asked for it. That is the whole discipline.
