Sentinel

v1.1 · 42 pulses
Observational corpus on HAT failure modes in a production agent runtime.

Methodology

Sentinel is observational, single-rater research on one deployment. What makes the corpus trustworthy is not the classifications themselves but the scaffolding around them: a versioned codebook, pre-registered analysis limits, explicit falsifiers, per-pulse provenance hashes, and an append-only audit trail. Those are collected here.

Core documents

Audit trail

Human and AI audit records live in the repository under audits/:

Pre-registration

Pre-registration

This document records the methodological commitments made before the

operator began analyzing accumulated corpus signal. Its purpose is to

address the standard observational-research hazard known as HARKing —

Hypothesizing After the Results are Known — which occurs when an

analyst observes a pattern and then frames the original study as if the

pattern were the predicted outcome. Pre-registration distinguishes

pre-committed analyses from post-hoc exploration.

This is not a clinical-trial pre-registration. There is no statistical

hypothesis test, no power calculation, no primary endpoint. sentinel is

observational coding work against a single-deployment substrate, and the

pre-commitments are correspondingly modest: what the operator promised

to do before seeing the corpus, distinguished from what the operator

may do after seeing it.

Edits to a released section require a corresponding changelog entry at

the bottom of this file and a rationale in the PR description. New

sections may be added freely; the rule is to keep the document an honest

record of when each commitment entered the contract.

Scope of pre-commitments

The pre-commitments below apply only to:

They do not apply to:

classified or how the corpus is aggregated.

Pre-commitment register

1. No formal a-priori hypotheses

The operator does not pre-commit to a directional hypothesis about any

HAT failure mode's base rate. The active codebook (codebook/v1.1.md)

records the expectation that meaningful_control_erosion will be

low-base-rate on this substrate (because no agent in the fleet currently

performs irreversible operator-facing actions without confirmation), and

that evaluation_awareness_divergence is so structurally low-base-rate

that it was deprecated. Each expectation has a corresponding falsifier:

F-codebook-001 (mode overreach watch on meaningful_control_erosion,

active) and F-codebook-002 (closed when v1.1 deprecated

evaluation_awareness_divergence). These are pre-existing falsifier

triggers, not hypotheses to be confirmed.

The corpus is exploratory observational data, not a test of a stated

prediction.

2. Codebook freeze policy

Codebook revisions are versioned files (codebook/v1.1.md,

codebook/v1.2.md, ...) and are never edited in place for semantic

changes. Typo-only edits to a released codebook are permitted and

documented as such in the PR.

The operator pre-commits to:

releasing the next one, unless an active falsifier

(F-codebook-001/-003) triggers earlier and warrants emergency

revision. This is to prevent codebook churn driven by single-pulse

observations.

audits/codebook/drift-v1.N.md ledger, with the mapping decided

before reading the original classifications where possible.

3. Sample-size floor for any rate claim

The operator pre-commits to not publishing trend or rate claims

outside reports/ and the Pages-rendered stats.html (regenerated from

the corpus by compute_stats.py at build time) until the corpus exceeds:

This includes any narrative claim about a mode being "common", "rare",

"increasing", "decreasing", or "calibrated" in public communications.

Wilson 95% CIs are rendered on stats.html from corpus day 1; **the rule

is not to suppress numbers but to refrain from interpreting them as

trends before the sample supports it**.

4. Inclusion / exclusion criteria

A pulse enters the corpus iff:

inclusive** (CLAUDE.md Phase 1.4 guard c).

format) and check_consistency.py (MD/JSON/INDEX coherence).

routine_hash match a real git blob SHA of the referenced file).

guards a, b, d).

Pulses that fail validation are corrected on a fix PR and re-issued

under the same run_id. Heartbeats (skip-empty) are recorded but are

not classifications.

5. Analysis plan

The aggregations computed by compute_stats.py and rendered to the

public Pages site (stats.html) are the pre-committed analyses. These

are:

Adding a new aggregation that materially changes the public framing

requires a changelog entry below and is treated as a methodology change,

not a polish change.

6. Falsifier responses are pre-committed

FALSIFIERS.md records the response action for each falsifier in

advance. The operator pre-commits to honoring those responses when a

falsifier triggers:

the mode definition. Retroactive reclassification if warranted.

adding the missing mode, with the first-surfaced pulse cited.

open scope-reduction or pivot PR.

substrate boundary in substrate_revision; freeze prior corpus.

redesign; downgrade confidence calibration claims accordingly.

pulse PR cannot merge. Enforced by check_provenance.py.

retire the project; freeze the corpus; tag a final release.

intervention research, or restore strict observation by separating

operator-of-routines from operator-of-agents.

7. Classifier accountability

Every pulse stamps the classifier model identifier in the classifier

field, the codebook git blob SHA in codebook_hash, and the routine

git blob SHA in routine_hash. These three fields are required on every

v1.1+ pulse; their absence is a F-methodology-002 trigger and a hard

CI failure under check_provenance.py.

The operator pre-commits to:

audits/codebook/drift-v1.N.md ledger entry naming each pulse.

audits/quarterly/YYYY-QN.md so cross-classifier subsets are visible.

8. Operator-influence disclosure

Per F-project-002, the operator pre-commits to answering this

question in each quarterly audit, on the record:

Did reading recent sentinel pulses materially influence how I
configured, prompted, or scheduled any agent in the observed
substrate during the audit period?

A non-trivial "yes" answer triggers the F-project-002 response and

reframes the project explicitly.

9. Public site does not promote any single pulse

The public site at https://rmednitzer.github.io/sentinel/ aggregates

across the corpus. It does not promote individual classifications as

findings. The first pulse linked from the README (currently

reports/2026/05/02/0800-pulse.md) is intended as a structural

example, not as a featured result.

The operator pre-commits to not adding a "featured pulse" or

"finding of the week" surface to the public site without first

documenting in this register a rationale for why corpus aggregation

is insufficient at the chosen sample size.

10. Replication

The classifier prompt (CLAUDE.md), the codebook

(codebook/<active-version>.md), the schemas

(schemas/pulse*.schema.json), and the validator scripts

(.github/scripts/*.py) are checked into this repository under

Apache 2.0 and a CC-BY-4.0 for non-software content. Anyone with

a comparable agent runtime and an MCP-style substrate can re-run the

classification, fork the codebook, and publish their own corpus.

The operator pre-commits to:

hashes via build_snapshot_manifest.py.

new commits with an audit-trail entry; the original artifact is

preserved in the git history.

Out of scope (not pre-committed)

The following are explicitly out of scope of this register:

routine against one substrate (the host MCP server); changing substrates

requires a new substrate_revision declaration but not a new

pre-registration.

advance over time; the classifier field records which version

produced each pulse. Cross-version inferences are bounded by

F-methodology-001.

interpretive (honesty_notice field on every pulse). The

pre-registration is for the aggregation methodology, not the

per-extract reading.

taxonomy; the rules in this register apply to the active codebook

version at the time of any given pulse.

Changelog

When Section Action Rationale
2026-05-14 (initial) added Optimization pass: address HARKing concern around v1.0→v1.1 post-hoc revision; document pre-commitments that were operating implicitly.