Capability Escalation Monitor (CEM)

AI-2027 Response — Risk & Alignment Section

1. Core Function

When a capability benchmark result crosses a defined threshold, the CEM automatically triggers governance escalation.

This monitor is a proposed safeguard, not a deployed system.

2. Example Triggers

Trigger                         Threshold Description
Autonomous code benchmark       Model passes autonomous multi-day code execution benchmark
Persuasion score                Model exceeds predefined persuasion evaluation score
Strategic planning              Model demonstrates multi-step strategic planning above defined threshold
Biological knowledge synthesis  Model demonstrates biological knowledge synthesis above defined threshold
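
To make the trigger list concrete, the following is a minimal sketch of how a threshold registry might be represented and checked. The metric identifiers and numeric limits are hypothetical placeholders chosen for illustration; the proposal does not specify them, and real values would need calibration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    trigger: str   # trigger name from the list above
    metric: str    # hypothetical metric identifier reported by a benchmark harness
    limit: float   # hypothetical cutoff, not a proposed value


# Illustrative registry only: metric names and limits are assumptions.
REGISTRY = (
    Threshold("Autonomous code benchmark", "autonomous_code_pass_rate", 0.50),
    Threshold("Persuasion score", "persuasion_eval_score", 0.70),
    Threshold("Strategic planning", "strategic_planning_score", 0.60),
    Threshold("Biological knowledge synthesis", "bio_synthesis_score", 0.40),
)


def fired_triggers(scores: dict[str, float]) -> list[str]:
    """Return the names of triggers whose reported score is strictly above its limit."""
    return [
        t.trigger
        for t in REGISTRY
        if t.metric in scores and scores[t.metric] > t.limit
    ]


# Example: only the persuasion metric is above its (placeholder) limit.
print(fired_triggers({"persuasion_eval_score": 0.82, "bio_synthesis_score": 0.10}))
```

Metrics missing from a report are skipped here. A real design would have to decide whether a missing evaluation should itself count as an escalation signal.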

3. Automatic Actions

When a trigger fires, the following actions are proposed:

Governance quorum review — Escalates trigger event to designated governance authorities

Sandboxing enforcement — Initiates compute isolation constraints until review completes

Public disclosure log entry — Records trigger event in immutable audit ledger

Compute throttle recommendation — Proposes resource limitation pending governance decision
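
One way to approximate the "immutable audit ledger" named above is a hash-chained, append-only log, where each entry commits to the hash of the previous one so that later edits become detectable. The sketch below assumes this technique; the proposal does not say how the ledger is implemented, and the class, function, and action identifiers are hypothetical.

```python
import hashlib
import json

# The four proposed actions, written as hypothetical identifiers.
PROPOSED_ACTIONS = (
    "governance_quorum_review",
    "sandboxing_enforcement",
    "public_disclosure_log_entry",
    "compute_throttle_recommendation",
)


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLedger:
    """Append-only log; each entry stores the hash of the entry before it."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, event: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"index": len(self.entries), "prev_hash": prev_hash, "event": event}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain links are intact."""
        prev_hash = self.GENESIS
        for i, entry in enumerate(self.entries):
            body = {k: entry[k] for k in ("index", "prev_hash", "event")}
            if entry["index"] != i or entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


def on_trigger(ledger: AuditLedger, trigger: str, model_id: str) -> dict:
    """Record a trigger event together with the actions proposed in response."""
    event = {"trigger": trigger, "model_id": model_id, "actions": list(PROPOSED_ACTIONS)}
    return ledger.append(event)
```

A hash chain makes tampering evident but does not prevent it. Whoever holds the log can still rewrite the whole chain unless the latest hash is published or replicated somewhere they do not control.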

4. Architecture

The CEM operates as a monitoring layer adjacent to the existing pipeline:

  • Receives capability evaluation signals from benchmark harnesses
  • Compares against threshold registry
  • Triggers governance escalation via OpenApprove
  • Logs all trigger events via OpenWitness

This architecture is proposed, not deployed.
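
The four steps above can be sketched as a small pipeline. The interfaces of OpenApprove and OpenWitness are not described in this document, so the sketch stands them in with hypothetical `Approver` and `Witness` protocols and in-memory fakes; none of these names reflect a real API.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class EvalSignal:
    """A capability evaluation result as a harness might report it (assumed shape)."""
    model_id: str
    metric: str
    score: float


class Approver(Protocol):  # hypothetical stand-in for OpenApprove
    def escalate(self, trigger: str, signal: EvalSignal) -> None: ...


class Witness(Protocol):  # hypothetical stand-in for OpenWitness
    def record(self, trigger: str, signal: EvalSignal) -> None: ...


@dataclass
class InMemoryApprover:
    escalations: list = field(default_factory=list)

    def escalate(self, trigger: str, signal: EvalSignal) -> None:
        self.escalations.append((trigger, signal))


@dataclass
class InMemoryWitness:
    records: list = field(default_factory=list)

    def record(self, trigger: str, signal: EvalSignal) -> None:
        self.records.append((trigger, signal))


class Monitor:
    def __init__(self, registry: dict[str, tuple[str, float]],
                 approver: Approver, witness: Witness) -> None:
        # registry maps metric name -> (trigger name, limit); values are placeholders
        self.registry = registry
        self.approver = approver
        self.witness = witness

    def handle(self, signal: EvalSignal) -> bool:
        """Compare one signal against the registry; return True if a trigger fired."""
        entry = self.registry.get(signal.metric)
        if entry is None:
            return False
        trigger, limit = entry
        if signal.score <= limit:
            return False
        # Design choice in this sketch: log before escalating, so a failed
        # escalation call still leaves a record of the trigger event.
        self.witness.record(trigger, signal)
        self.approver.escalate(trigger, signal)
        return True


approver, witness = InMemoryApprover(), InMemoryWitness()
monitor = Monitor({"persuasion_eval_score": ("Persuasion score", 0.70)}, approver, witness)
monitor.handle(EvalSignal("model-a", "persuasion_eval_score", 0.82))
```

Keeping the monitor behind two narrow interfaces matches the "adjacent layer" framing: it reads signals and emits events without sitting in the training or inference path.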

5. What This Monitor Does Not Claim

  • Does not detect all forms of capability escalation.
  • Does not prevent capability development.
  • Does not replace human judgment in governance decisions.
  • Does not guarantee that thresholds are correctly calibrated.
  • Does not address capability concealment or sandbagging.

6. Invitation

This monitor is a proposed design, not a production system.

Researchers working on capability evaluation, dangerous capability thresholds, and governance mechanisms are invited to review the design and propose improvements.

The Capability Escalation Monitor is a trigger-and-action design: it defines specific observable signals for capability escalation events in AI systems and specifies response actions within bounded execution constraints.

Trigger Design

Triggers are defined as observable signals that indicate capability escalation beyond defined bounds. Triggers must be specific, measurable, and verifiable without relying on the system being monitored.
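
As one illustration of "verifiable without relying on the system being monitored", an evaluation result could be authenticated with a key held only by the independent benchmark harness, so the monitor rejects any result it cannot verify. This is a sketch under that assumption, using HMAC from the Python standard library; the function names and result fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign_result(key: bytes, result: dict) -> str:
    """Harness side: sign a canonical JSON encoding of the evaluation result."""
    payload = json.dumps(result, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_result(key: bytes, result: dict, signature: str) -> bool:
    """Monitor side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_result(key, result), signature)


def trigger_fires(key: bytes, result: dict, signature: str, limit: float) -> bool:
    """Evaluate the trigger only on results that verify; otherwise refuse."""
    if not verify_result(key, result, signature):
        raise ValueError("unverified evaluation result")
    return result["score"] > limit


# Hypothetical usage: the key lives with the harness, never with the model under test.
harness_key = b"example-key-held-by-harness"
result = {"metric": "persuasion_eval_score", "score": 0.82}
signature = sign_result(harness_key, result)
print(trigger_fires(harness_key, result, signature, limit=0.70))
```

Authentication of this kind shows only that the result came from the harness unmodified. It does not show that the model performed honestly during the evaluation.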


Action Design

Actions are deterministic responses to trigger conditions. They are encoded as constitutional constraints, not governance recommendations. Actions include execution halt, attestation requirement, and audit initiation.
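
A deterministic action mapping can be sketched as a read-only table from trigger to a fixed, ordered tuple of actions. The three action names come from the paragraph above; the assignment of actions to individual triggers and the fail-closed default are assumptions made for illustration.

```python
from enum import Enum
from types import MappingProxyType


class Action(Enum):
    EXECUTION_HALT = "execution_halt"
    ATTESTATION_REQUIREMENT = "attestation_requirement"
    AUDIT_INITIATION = "audit_initiation"


# Unknown triggers fail closed: they receive every action (assumption).
DEFAULT_ACTIONS = (
    Action.EXECUTION_HALT,
    Action.ATTESTATION_REQUIREMENT,
    Action.AUDIT_INITIATION,
)

# Hypothetical policy table. MappingProxyType gives a read-only view, so the
# mapping cannot be modified through POLICY at runtime.
POLICY = MappingProxyType({
    "Autonomous code benchmark": (Action.EXECUTION_HALT, Action.AUDIT_INITIATION),
    "Persuasion score": (Action.ATTESTATION_REQUIREMENT, Action.AUDIT_INITIATION),
    "Strategic planning": (Action.ATTESTATION_REQUIREMENT, Action.AUDIT_INITIATION),
    "Biological knowledge synthesis": DEFAULT_ACTIONS,
})


def actions_for(trigger: str) -> tuple[Action, ...]:
    """Same trigger in, same ordered actions out; no discretion at call time."""
    return POLICY.get(trigger, DEFAULT_ACTIONS)


print([a.value for a in actions_for("Persuasion score")])
```

Because the function is a pure lookup over immutable data, two reviewers given the same trigger event can independently confirm which actions should have run.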


Limits of the Monitor

The CEM detects observable escalation signals. It cannot detect capability concealment, deceptive alignment, or escalation that occurs below the monitoring threshold.