Approval Gate
Governance × Route
Why this pattern exists
An agent that can read is not the same product as an agent that can write. An agent that can write internal memos is not the same product as one that can delete production data, send customer-facing email, or transfer funds. Authority is the dimension along which agent risk compounds fastest, and Approval Gate is the pattern that lets authority scale incrementally without scaling risk linearly.
For leadership: this is the pattern that separates agents you can defend in a board meeting from agents that turn into incidents at 3am. Anthropic’s November 2025 disclosure of what it described as the first reported AI-orchestrated cyber espionage campaign, in which an attacker used Claude to carry out an estimated 80–90% of the operation with only sporadic human intervention, made the case for Approval Gate concrete and urgent for every executive who had previously filed it under “maybe later”. Approval Gate is not optional governance theatre. It is the load-bearing element of any agent that touches the world.
The agent-design problem it solves
Approval Gate sits between an agent’s proposed action and its execution. It does four things:
- Route by policy — a deterministic rule (not the model) decides which actions need approval. Reads pass; writes confirm; deletes escalate. The rule is auditable, versioned, owned by engineering plus compliance, never by the model.
- Pause cleanly — the agent halts at the gate, persists state, hands off the decision to a human. The decision can arrive milliseconds later (auto-approve in dev mode) or days later (genuine human review) without breaking the agent.
- Capture context for the reviewer — the gate produces a decision packet: the proposed action, the reasoning trace, the data the action will touch, the predicted side effects. The reviewer is not asked “trust me?” — they are asked a specific yes/no with evidence.
- Record the verdict — approved, denied, modified, escalated — with the reviewer’s identity and the time. This record is the audit trail regulators come for first.
The pattern is fundamentally a routing decision — hence Governance × Route — not a workflow extension. The route rule is the design surface. Get the routing right and the rest follows; get the routing wrong and the gate becomes either a useless rubber-stamp or a productivity-killing bottleneck.
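The four responsibilities and the routing rule can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `Route`, `ROUTE_RULE`, `DecisionPacket`, `Verdict`, and `ApprovalGate` are hypothetical, the rule table only encodes the "reads pass; writes confirm; deletes escalate" example, and the in-memory `pending` dictionary stands in for whatever durable store a real harness would use to persist paused state.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Route(Enum):
    PASS = "pass"          # execute without review
    CONFIRM = "confirm"    # a reviewer must approve
    ESCALATE = "escalate"  # a senior reviewer must approve


# Hypothetical route rule: a plain, versioned table, never a model output.
ROUTE_RULE_VERSION = "example-v1"
ROUTE_RULE = {"read": Route.PASS, "write": Route.CONFIRM, "delete": Route.ESCALATE}


@dataclass
class DecisionPacket:
    """Evidence handed to the reviewer alongside the yes/no question."""
    action_kind: str
    target: str
    reasoning_trace: str
    predicted_side_effects: list[str]
    route: Route
    rule_version: str = ROUTE_RULE_VERSION


@dataclass
class Verdict:
    packet: DecisionPacket
    outcome: str        # "approved", "denied", "modified", or "escalated"
    reviewer: str
    decided_at: datetime


@dataclass
class ApprovalGate:
    pending: dict[int, DecisionPacket] = field(default_factory=dict)
    audit_log: list[Verdict] = field(default_factory=list)
    _next_id: int = 1

    def propose(self, kind, target, trace, side_effects):
        """Route a proposed action; return (ticket or None, packet)."""
        # Assumption: unknown action kinds get the strictest route.
        route = ROUTE_RULE.get(kind, Route.ESCALATE)
        packet = DecisionPacket(kind, target, trace, side_effects, route)
        if route is Route.PASS:
            return None, packet          # no pause needed
        ticket = self._next_id
        self._next_id += 1
        self.pending[ticket] = packet    # the agent pauses here; state is parked
        return ticket, packet

    def decide(self, ticket, outcome, reviewer):
        """Record the verdict; it may arrive milliseconds or days later."""
        packet = self.pending.pop(ticket)
        verdict = Verdict(packet, outcome, reviewer, datetime.now(timezone.utc))
        self.audit_log.append(verdict)
        return verdict


gate = ApprovalGate()
no_ticket, _ = gate.propose("read", "crm/accounts", "look up the customer", [])
ticket, packet = gate.propose(
    "delete", "prod/orders", "remove stale rows", ["rows are removed permanently"]
)
verdict = gate.decide(ticket, "denied", "reviewer@example.com")
```

Sending unknown action kinds to the strictest route is a choice made for this sketch, not something the pattern mandates. It keeps the rule fail-closed if the agent gains a new tool before the rule table is updated.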
Deep thinking direction
The hardest design question in Approval Gate is where to draw the threshold. Too low and every action triggers approval: latency explodes, reviewers stop reading carefully, and a rubber-stamp culture takes over. Too high and high-stakes actions slip through unreviewed until the first incident. The discipline is to set the threshold by category, not by confidence. Categories are stable: if “modifies billing” is assigned to category 3, it is a category-3 action every time, regardless of how confident the agent is. Confidence is the model’s self-report and should never be the gating signal, because a confidently wrong action is exactly the one a confidence check would wave through.
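The difference between the two gating signals shows up in a small contrast. In this sketch the category numbers, the action names, and the threshold are all assumptions made for illustration. Only the idea that “modifies billing” sits in category 3 comes from the text above.

```python
# Hypothetical category table: 1 = read, 2 = internal write, 3 = billing change.
ACTION_CATEGORY = {
    "read_invoice": 1,
    "draft_internal_memo": 2,
    "modify_billing": 3,
}

# Assumed threshold: anything in category 2 or above needs a human decision.
APPROVAL_THRESHOLD = 2


def needs_approval(action: str, confidence: float) -> bool:
    """Category-based gate. Confidence is accepted but deliberately ignored."""
    # Unlisted actions are treated as the highest known category.
    category = ACTION_CATEGORY.get(action, max(ACTION_CATEGORY.values()))
    return category >= APPROVAL_THRESHOLD


def confidence_gate(action: str, confidence: float, cutoff: float = 0.9) -> bool:
    """Anti-pattern, shown only for contrast: review when the model is unsure."""
    return confidence < cutoff


# A billing change the agent is 99% sure about:
by_category = needs_approval("modify_billing", 0.99)     # reviewed
by_confidence = confidence_gate("modify_billing", 0.99)  # slips through
```

The category gate gives the same answer for the same action on every run, which is what makes the threshold auditable. The confidence gate gives an answer that depends on the model's self-report at that moment.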
Three failure modes recur:
- Rubber-stamping: reviewers approve everything within seconds because volume is too high. The discipline is rate-limiting at the source. If the agent is generating more decisions per hour than a human can review meaningfully, the gate has been miscalibrated upstream.
- Approval Drift: the rule expands quietly to “everything important gets approved” without anyone noticing. The discipline is explicit version control on the route rule, treated like production code.
- Silent Bypass: an emergency override path becomes the normal path. The discipline is explicit logging of every override with mandatory post-hoc review.
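Two of these disciplines, rate-limiting at the source and logging every override, are easy to sketch. The classes `ReviewBudget` and `OverrideLog`, their fields, and the hourly budget figure are hypothetical. How many decisions a reviewer can meaningfully handle per hour is something each team would have to measure for itself.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReviewBudget:
    """Sliding-window cap on how many approval requests reach reviewers."""
    max_per_hour: int
    window_s: float = 3600.0
    _stamps: deque = field(default_factory=deque)

    def admit(self, now: float) -> bool:
        """Return True if one more request fits in the current window."""
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_per_hour:
            return False   # signal upstream: slow the agent, do not auto-approve
        self._stamps.append(now)
        return True


@dataclass
class OverrideLog:
    """Every emergency override is recorded and starts out unreviewed."""
    entries: list = field(default_factory=list)

    def record(self, action: str, actor: str, reason: str) -> dict:
        entry = {"action": action, "actor": actor, "reason": reason, "reviewed": False}
        self.entries.append(entry)
        return entry

    def unreviewed(self) -> list:
        return [e for e in self.entries if not e["reviewed"]]


budget = ReviewBudget(max_per_hour=2)
admitted = [budget.admit(t) for t in (0.0, 10.0, 20.0, 3601.0)]

overrides = OverrideLog()
entry = overrides.record("restart_billing_job", "oncall@example.com", "outage")
backlog_before = len(overrides.unreviewed())
entry["reviewed"] = True   # set once the post-hoc review has happened
backlog_after = len(overrides.unreviewed())
```

A refused `admit` call is meant as a calibration signal, not as permission to skip review. The timestamp is passed in explicitly so the window logic can be exercised without waiting on a real clock.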
The architectural insight is that Approval Gate is the RBAC + change-control pattern from enterprise security reborn for the agent world. The route rule is the role definition; the verdict log is the access-review report; the threshold is the role’s permission scope. Engineers who have built SOX-compliant change-control systems recognize this pattern in minutes. The medium changed; the controls stayed.
Engineering blog posts — curated
- Disrupting the first reported AI-orchestrated cyber espionage campaign — Anthropic The case that turned Approval Gate from optional to load-bearing. Attacker used Claude with 80–90% autonomy; the disclosed mitigations centre on permission scoping and explicit approval steps.
- Claude Code Permission Modes (5+1) The clearest worked example of a production Approval Gate: read-only / suggest / confirm / approve / veto / forbidden tiers, each with explicit semantics.
- OWASP Agentic AI Top 10 — Least Agency Reframes the “least privilege” principle as “least agency” for agents. Approval Gate is the operational instrument that enforces it.
- CSA Agentic Trust Framework — 5-Gate Promotion Formal multi-stage promotion model: dev / sandbox / staged / approved / production, with explicit gate criteria. The enterprise governance complement to per-action Approval Gate.
- Cascading Failures in Multi-Agent Systems — Galileo Quantifies the 87% cascade-failure rate when approval gates are missing or misplaced in multi-agent setups. Evidence for placing gates at every authority transition.
Latest paper progress (arXiv)
- Agent Authority: A Survey of Permission, Approval, and Sandboxing in LLM Agents Catalogues the design space for agent permission systems. Section 4 on approval routing maps directly onto this pattern.
- Human-in-the-Loop Agentic AI: A Decision-Theoretic Framework Models the approval decision as a cost-benefit calculation: cost of the human review vs. expected value of catching a bad action. Useful for calibrating thresholds.
- Constitutional AI: From Principles to Production Policy Anthropic-side model on how constitutional rules become enforcement points. The model-side complement to harness-side Approval Gate.
- Audit Trails for Agentic Systems: A Compliance Engineering Perspective Audit-trail requirements derived from financial-services regulation (SOX, MiFID II, MAS Notice 626). Concrete schema for what an Approval Gate verdict log should contain.
- Least Agency: Permission Scoping for Long-Horizon Agents Formalises the “least agency” principle. Shows that agents with narrower default authority + Approval Gate escalation outperform agents with broad default authority on safety benchmarks without sacrificing capability.
Where this pattern is developed
- Manning book — Designing AI Agents, Chapter 9 §9.2 (Governance / Approval Gate).
- Paper — Huang & Zhou (2026), §4.7 Pattern 7.