NIST AI RMF in Practice: Building Evidence Packs That Pass an AI Audit

The CertiComply Team
May 22, 2026 · 10 min read

The email arrived on a Tuesday. Subject line: "AI Governance Review: Documentation Request." Your auditor wants evidence that your organization has implemented AI risk controls. They have given you six weeks. You have a Jira board full of model deployments and a shared drive full of one-pagers that nobody updates.

This is the moment that separates the GRC teams who treated AI risk as a future problem from the ones who started building durable control evidence twelve months ago. If you are in the first group, this guide will help you recover ground quickly. If you are in the second, it will help you organize what you already have into a package that actually passes.

The NIST AI Risk Management Framework is not a checklist. It is a set of outcomes. Auditors who know it well are not looking for a spreadsheet that says "yes" next to every control. They are looking for traceable, dated evidence that your organization has thought carefully about each function and done something about it. That distinction matters enormously when you are building your evidence pack.

What NIST AI RMF Actually Requires

The NIST AI RMF, published in January 2023 and now referenced by the SEC, FTC, and several state AI statutes, organizes AI risk management into four core functions: GOVERN, MAP, MEASURE, and MANAGE. Each function contains categories and subcategories with identifiers you will see in audit questionnaires.

GOVERN is the foundation. GOVERN-1.1 requires that policies, processes, and structures are in place to address AI risk. This is not theoretical. Your auditor will ask for the policy document, the person who owns it, and the date it was last reviewed. GOVERN-2.2 asks whether AI risks are incorporated into enterprise risk management. GOVERN-4.1 covers organizational teams and their defined responsibilities for AI oversight. GOVERN-6.1 requires that policies are in place for responsible AI development and deployment, including expectations for external AI vendors and partners.

MAP is about context and stakeholder awareness. MAP-1.1 asks you to identify the context in which AI systems are deployed, the affected populations, and the potential impacts. MAP-2.1 requires that scientific, technical, and societal factors relevant to your AI use are understood. MAP-3.5 is increasingly significant: it asks whether teams have identified and documented AI system limitations, including for bias and fairness. If you are using a third-party model, MAP-5.1 requires that impacts on individuals, communities, and society are understood before deployment.

MEASURE is where most evidence gaps live. MEASURE-1.1 requires that evaluation methods are identified and used. MEASURE-2.1 asks for documented approaches to testing AI systems before and after deployment, including adversarial testing. MEASURE-2.11 covers the evaluation of fairness and bias, including testing across demographic groups. MEASURE-4.1 requires that measurement results are documented and communicated. This is the function where "we ran some tests" is not sufficient. You need eval results that are dated and that include the methodology and an outcome summary.

MANAGE covers what you do with what you find. MANAGE-1.1 requires that identified AI risks are prioritized and addressed. MANAGE-2.2 covers incident response for AI systems. MANAGE-4.1 asks whether residual risk has been documented and accepted by appropriate authority. MANAGE-4.2 requires that lessons learned from AI incidents are incorporated into future decisions.

The framework is non-prescriptive on tooling, which is a feature. But it is quite specific about outcomes, which means your evidence has to show outcomes, not intentions.

What Counts as Evidence for Each Function

Evidence requirements differ by function. Here is what auditors typically expect:

| NIST AI RMF Function | Primary Evidence Types |
| --- | --- |
| GOVERN | Policy documents with version history; org chart with AI risk owners; board or executive acknowledgment; vendor contract addenda covering AI provisions |
| MAP | Context assessments per AI system; stakeholder analysis documentation; impact assessments; limitation registers |
| MEASURE | Eval results with dates and methodology; bias test outputs; third-party audit reports; model cards or system cards; benchmark comparisons |
| MANAGE | Risk registers with prioritization rationale; incident logs; remediation records; residual risk sign-offs; lessons-learned documentation |

A few nuances that trip up GRC teams who are new to AI audits:

Attestations must be signed. A Google Doc titled "AI Policy" that has never been formally approved carries almost no weight. You need a signature, a date, and evidence it was circulated to relevant stakeholders. If your policy process runs through your policy management system, export a PDF with the approval metadata intact.

Vendor responses need follow-up. If your evidence for a third-party model includes a vendor's SOC 2 report, that is a starting point, not a finish line. The auditor will want to see that you reviewed it, identified any gaps relevant to your use case, and documented your residual risk decision.

Eval results need context. A spreadsheet with accuracy numbers is not an eval result in the NIST sense. You need the methodology (what dataset, what population, what time period), the outcome, and what action you took based on it. MEASURE-2.1 and MEASURE-4.1 together require that measurement informs decisions.
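One way to enforce this is to refuse to file an eval result unless it carries its own context. The sketch below is illustrative only: the `EvalRecord` class, its field names, and the sample values are assumptions made up for this post, not a NIST schema or a CertiComply format.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class EvalRecord:
    """One eval result plus the context an auditor will ask about.

    Hypothetical structure; the field names are illustrative.
    """
    system: str
    run_date: date
    dataset: str        # what data the eval ran on
    population: str     # who or what that data represents
    period: str         # time window the data covers
    outcome: str        # summary of the result
    action_taken: str   # decision or follow-up the result led to


def missing_context(record: EvalRecord) -> list[str]:
    """Return the names of text fields that were left blank."""
    return [
        f.name
        for f in fields(record)
        if isinstance(getattr(record, f.name), str)
        and not getattr(record, f.name).strip()
    ]


# Made-up example: numbers exist, but nobody recorded a decision.
record = EvalRecord(
    system="support-chat-assistant",
    run_date=date(2026, 3, 2),
    dataset="Sample of Q1 support transcripts",
    population="English-language customer tickets",
    period="2026-01-01 to 2026-02-15",
    outcome="Accuracy figures recorded in the attached report",
    action_taken="",
)

print(missing_context(record))  # ['action_taken']
```

A record that comes back with `action_taken` missing is the "spreadsheet with accuracy numbers" case: the measurement exists, but nothing shows that it informed a decision.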

Model cards are underused. If your team is deploying models without writing model cards, you are creating an evidence gap that will be visible in any audit against MAP-3.5 and MEASURE-4.1. Model cards are not academic artifacts. They are the document that proves you understood the system before you shipped it.

How to Structure an Evidence Pack

A well-structured AI compliance evidence pack has nine sections. This is not an industry standard; it is a practical structure that works across NIST AI RMF, SOC 2 engagements that bring AI systems into scope, and most bespoke enterprise AI audit questionnaires.

Section 1: Cover page. Organization name, scope of review, date prepared, primary contact, document classification. Auditors review dozens of packages. Make it easy to orient.

Section 2: Executive summary. Two pages maximum. State the AI systems in scope, the frameworks mapped against, the overall risk posture, and any material residual risks that have been accepted by leadership. This summary should be written by a human who understands the risk picture, not assembled from control responses.

Section 3: Scope definition. List every AI system in scope. For each system, document the deployment context (internal tool, customer-facing, embedded in a regulated workflow), the data it processes, and the risk tier you assigned under your own classification scheme (the NIST AI RMF does not prescribe risk tiers). If you excluded systems, document why.

Section 4: Vendor and model inventory. For each third-party model or AI service, list the vendor, the specific model or API version, your use case, and what diligence you completed. Attach vendor responses, model cards where available, and any contractual AI risk provisions. This section is where MAP-5.1 evidence lives.

Section 5: Control matrix. A table mapping each NIST AI RMF category to your control status (implemented, partially implemented, not applicable, gap) with a reference to the supporting evidence in the appendix. Keep this navigable. Auditors use it to prioritize what they review.
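If your control matrix lives in a spreadsheet export or a script, a small check can catch rows that claim implementation without pointing at anything. This is a minimal sketch under assumptions: the `ControlRow` class, the file path, and the sample rows are hypothetical, and the status values are the four listed above.

```python
from dataclasses import dataclass, field

VALID_STATUSES = {"implemented", "partially implemented", "not applicable", "gap"}


@dataclass
class ControlRow:
    """One row of the control matrix (hypothetical structure)."""
    control_id: str
    status: str
    evidence_refs: list[str] = field(default_factory=list)


def unlinked_controls(rows: list[ControlRow]) -> list[str]:
    """Return IDs of controls that claim implementation but link no evidence."""
    problems = []
    for row in rows:
        if row.status not in VALID_STATUSES:
            raise ValueError(f"{row.control_id}: unknown status {row.status!r}")
        claims_work = row.status in {"implemented", "partially implemented"}
        if claims_work and not row.evidence_refs:
            problems.append(row.control_id)
    return problems


# Made-up rows for illustration.
matrix = [
    ControlRow("GOVERN-1.1", "implemented", ["appendix/govern/ai-policy-signed.pdf"]),
    ControlRow("MEASURE-2.1", "implemented"),  # status claimed, nothing linked
    ControlRow("MANAGE-2.2", "gap"),
]

print(unlinked_controls(matrix))  # ['MEASURE-2.1']
```

Rows marked "gap" or "not applicable" pass on purpose. An honest gap is easier to defend than an "implemented" with nothing behind it.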

Section 6: Evidence appendix. Organized by control reference. Include the actual documents, not just citations. Signed policies, eval outputs, bias test results, vendor attestations, incident records. Every piece of evidence should have a date and an owner.
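Organizing the appendix by control reference is easy to script. The sketch below groups artifacts under each control and lists anything missing a date or an owner. The `EvidenceItem` class, the paths, and the owner names are hypothetical examples, not a prescribed format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EvidenceItem:
    """One artifact in the appendix (hypothetical structure)."""
    control_id: str
    path: str
    owner: Optional[str] = None
    dated: Optional[date] = None


def build_appendix_index(
    items: list[EvidenceItem],
) -> tuple[dict[str, list[str]], list[str]]:
    """Group artifact paths by control and collect artifacts lacking a date or owner."""
    grouped: dict[str, list[str]] = defaultdict(list)
    incomplete: list[str] = []
    for item in items:
        grouped[item.control_id].append(item.path)
        if item.owner is None or item.dated is None:
            incomplete.append(item.path)
    # Sort controls and paths so the index is stable between runs.
    index = {control: sorted(paths) for control, paths in sorted(grouped.items())}
    return index, incomplete


# Made-up artifacts for illustration.
items = [
    EvidenceItem("MEASURE-2.1", "evals/pre-deploy-report.pdf", "ml-platform-lead", date(2026, 2, 10)),
    EvidenceItem("GOVERN-1.1", "policies/ai-policy-v3-signed.pdf", "grc-manager", date(2026, 1, 15)),
    EvidenceItem("MEASURE-2.1", "evals/adversarial-test-notes.md"),  # no owner, no date
]

index, incomplete = build_appendix_index(items)
print(index)
print(incomplete)  # ['evals/adversarial-test-notes.md']
```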

Section 7: Residual risk register. List every accepted risk, the rationale for acceptance, who accepted it, and when it will be reviewed. This is MANAGE-4.1 evidence. It also signals maturity: an organization that has no accepted residual risks has either achieved something remarkable or has not finished its assessment.
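A register with review dates is only useful if someone notices when those dates pass. As a rough sketch, assuming a hypothetical `AcceptedRisk` record with the four elements named above (the entries and dates are invented):

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AcceptedRisk:
    """One accepted residual risk (hypothetical structure)."""
    risk_id: str
    description: str
    rationale: str
    accepted_by: str
    accepted_on: date
    next_review: date


def overdue_reviews(register: list[AcceptedRisk], today: date) -> list[str]:
    """Return IDs of accepted risks whose scheduled review date has passed."""
    return [risk.risk_id for risk in register if risk.next_review < today]


# Made-up entries for illustration.
register = [
    AcceptedRisk(
        "RR-001",
        "Vendor model not independently bias-tested",
        "Internal-only use with human review of every output",
        "CISO",
        accepted_on=date(2025, 9, 1),
        next_review=date(2026, 3, 1),
    ),
    AcceptedRisk(
        "RR-002",
        "No adversarial testing on document summarizer",
        "No customer data in scope; testing scheduled",
        "VP Engineering",
        accepted_on=date(2026, 4, 10),
        next_review=date(2026, 10, 10),
    ),
]

print(overdue_reviews(register, today=date(2026, 5, 22)))  # ['RR-001']
```

Passing `today` in explicitly keeps the check reproducible, so the output can be included in the pack with a known as-of date.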

Section 8: Audit log excerpt. If your GRC platform or AI governance tool maintains an audit trail of who reviewed what and when, include a dated excerpt. This is particularly relevant for demonstrating that your controls are operating, not just documented. CertiComply Enterprise generates a timestamped evidence log for each control review cycle, which drops directly into this section.

Section 9: Glossary. Define terms as you use them. NIST AI RMF uses specific definitions for "AI system," "risk," "impact," and "trustworthy AI" that may differ from how your engineering team uses those words. Alignment on definitions prevents unnecessary audit findings.

Common Mistakes GRC Teams Make

Incomplete vendor inventory. The most frequent gap. Teams document their internally built models carefully and then list third-party AI vendors as "various SaaS tools." Auditors treat this as a MAP-5.1 failure. Every AI system that touches regulated data or affects consequential decisions needs to be in the inventory with a named owner.

No eval results for production systems. It is surprisingly common for organizations to have robust pre-deployment testing documentation and nothing for systems that have been running in production for eighteen months. MEASURE-2.1 and MEASURE-4.1 apply to ongoing operation, not just launch. If your only eval results are from the pilot, you have a gap.
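A quick way to find this gap before your auditor does is to compare each production system's most recent eval date against a maximum age. The sketch below assumes a 180-day threshold, which is an arbitrary illustration rather than anything NIST specifies. The system names and dates are also made up.

```python
from datetime import date, timedelta
from typing import Optional


def stale_systems(
    last_eval: dict[str, Optional[date]],
    today: date,
    max_age_days: int = 180,  # assumed threshold; set this from your own policy
) -> list[str]:
    """Return systems with no eval on record or none within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name
        for name, run_date in last_eval.items()
        if run_date is None or run_date < cutoff
    )


# Made-up production systems and their most recent eval dates.
last_eval = {
    "fraud-scoring": date(2024, 11, 4),   # pilot-era eval only
    "doc-summarizer": date(2026, 4, 20),
    "resume-screener": None,              # never evaluated in production
}

print(stale_systems(last_eval, today=date(2026, 5, 22)))
# ['fraud-scoring', 'resume-screener']
```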

Unsigned attestations. Policies in draft, dated from two years ago, with no approval metadata. This is common in fast-moving engineering orgs where policy management runs behind product. The fix is not complicated, but it requires someone to own it: get the policy approved, record the approval, and set a calendar reminder for annual review.

Control matrix without evidence links. A spreadsheet that says "implemented" next to GOVERN-1.1 means nothing without a link to the document that demonstrates implementation. The control matrix is a navigation tool, not evidence itself. Every "implemented" status needs a traceable artifact.

Treating the evidence pack as a one-time project. This is the mistake that makes the next audit harder. NIST AI RMF is explicitly a lifecycle framework. The evidence you collect for one audit should feed directly into the next review cycle. Organizations that build for repeatability, with templated sections, living control matrices, and automated evidence collection, spend a fraction of the time on their second audit that they spent on their first.

The Crosswalk Advantage

Here is something worth knowing if you are building your AI governance program from scratch against NIST AI RMF: you are also building toward ISO 42001 and the EU AI Act simultaneously.

ISO 42001, published in December 2023, is the international standard for AI management systems. Its structure maps closely to NIST AI RMF. If you have documented controls against GOVERN-1.1, GOVERN-4.1, and MANAGE-1.1, you have covered a significant portion of ISO 42001 Clause 5 (leadership) and Clause 8 (operation). The governance documentation, the impact assessments, and the risk register transfer almost directly.

The EU AI Act, with high-risk system obligations applying from August 2026, requires Annex IV technical documentation that maps substantially to NIST AI RMF MEASURE and MAP evidence. If you have eval results, bias testing documentation, and a limitation register, you have much of what Annex IV Section 2 requires for validation and testing records, and a start on the performance limitations covered in Section 3.

The practical implication: a well-structured NIST AI RMF evidence pack puts you roughly 60 percent of the way to ISO 42001 certification readiness and a comparable portion of the way to EU AI Act Annex IV compliance. The investment compounds. The organizations that treat each framework as a separate project spend three times the effort and end up with three inconsistent sets of documentation, and auditors will notice where they disagree.

CertiComply Enterprise includes a built-in crosswalk library that maps your NIST AI RMF control evidence to ISO 42001 clauses and EU AI Act articles automatically. When you log evidence for MEASURE-2.11 (fairness and bias evaluation), it surfaces as supporting documentation for ISO 42001 Clause 8.4 and EU AI Act Article 10. You collect evidence once; it populates across frameworks.

Getting Started This Week

You do not need to build the entire evidence pack before your auditor arrives. You need to demonstrate a program that is operational, not perfect. Here is what to prioritize in the first two weeks:

Start with the vendor inventory. Pull every AI tool and model your organization uses. Assign an owner to each. Document the use case and the data it touches. This alone will surface the gaps your auditor would otherwise find first, and it gives you MAP-5.1 coverage that is genuinely difficult to fake.

Next, get your governance policy signed. If you have a draft AI policy, get it through your approval process this week. A signed policy with a recent date is worth more to your GOVERN-1.1 response than a sophisticated policy that nobody has formally approved.

Then find your eval results. Search your shared drives, your ML platform, your Slack archives. Collect every eval, test result, or performance review you have for each in-scope AI system. Even informal documentation helps if it is dated and shows that someone reviewed system performance. Organize it by system and by control reference.

The evidence pack you build under deadline pressure will not be perfect. That is acceptable. What auditors are looking for is evidence of a functioning program: ownership, process, documentation, and a clear-eyed view of where the gaps are and what you are doing about them. That story, told with real artifacts, is what passes an AI audit.

Start with the vendor inventory. Everything else follows from knowing what you are governing.
