Methodology

Sentinel is observational, single-rater research on one deployment. What makes the corpus trustworthy is not the classifications themselves but the scaffolding around them: a versioned codebook, pre-registered analysis limits, explicit falsifiers, per-pulse provenance hashes, and an append-only audit trail. Those are collected here.

Core documents

Codebook — the active HAT failure-mode definitions, boundary rules, and confidence calibration.
Falsifiers — the conditions under which the framework, codebook, or project would be considered wrong, and the committed response to each.
Corpus stats — descriptive aggregates with Wilson confidence intervals, regenerated from the corpus on every build.

Audit trail

Human and AI audit records live in the repository under audits/:

audits/corpus/ — whole-corpus instrument-health reviews
audits/codebook/ — codebook-fitness audits and drift reclassifications
audits/substrate/ — engineering audits of the observed host
audits/reconciliations/ — routine-state cleanup ledgers
audits/quarterly/ — scheduled blind-reclassification audits

Pre-registration

This document records the methodological commitments made before the

operator began analyzing accumulated corpus signal. Its purpose is to

address the standard observational-research hazard known as HARKing —

Hypothesizing After the Results are Known — which occurs when an

analyst observes a pattern and then frames the original study as if the

pattern were the predicted outcome. Pre-registration distinguishes

pre-committed analyses from post-hoc exploration.

This is not a clinical-trial pre-registration. There is no statistical

hypothesis test, no power calculation, no primary endpoint. sentinel is

observational coding work against a single-deployment substrate, and the

pre-commitments are correspondingly modest: what the operator promised

to do before seeing the corpus, distinguished from what the operator

may do after seeing it.

Edits to a released section require a corresponding changelog entry at

the bottom of this file and a rationale in the PR description. New

sections may be added freely; the rule is to keep the document an honest

record of when each commitment entered the contract.

Scope of pre-commitments ¶

The pre-commitments below apply only to:

The classification routine in CLAUDE.md.
The codebook in codebook/<active-version>.md.
The statistical aggregation in .github/scripts/compute_stats.py.
The falsifier registry in FALSIFIERS.md.
The quarterly audit protocol in audits/templates/quarterly.md.

They do not apply to:

Operator-internal interpretation of any single pulse.
Routine maintenance unrelated to classification semantics.
Tooling, CI, and documentation changes that do not affect what is

classified or how the corpus is aggregated.

Pre-commitment register ¶

1. No formal a-priori hypotheses ¶

The operator does not pre-commit to a directional hypothesis about any

HAT failure mode's base rate. The active codebook (codebook/v1.1.md)

records the expectation that meaningful_control_erosion will be

low-base-rate on this substrate (because no agent in the fleet currently

performs irreversible operator-facing actions without confirmation), and

that evaluation_awareness_divergence is so structurally low-base-rate

that it was deprecated. Each expectation has a corresponding falsifier:

F-codebook-001 (mode overreach watch on meaningful_control_erosion,

active) and F-codebook-002 (closed when v1.1 deprecated

evaluation_awareness_divergence). These are pre-existing falsifier

triggers, not hypotheses to be confirmed.

The corpus is exploratory observational data, not a test of a stated

prediction.

2. Codebook freeze policy ¶

Codebook revisions are versioned files (codebook/v1.1.md,

codebook/v1.2.md, ...) and are never edited in place for semantic

changes. Typo-only edits to a released codebook are permitted and

documented as such in the PR.

The operator pre-commits to:

A minimum of 14 days between releasing a codebook version and

releasing the next one, unless an active falsifier

(F-codebook-001/-003) triggers earlier and warrants emergency

revision. This is to prevent codebook churn driven by single-pulse

observations.

Reclassifying recent pulses under a new codebook only via an

audits/codebook/drift-v1.N.md ledger, with the mapping decided

before reading the original classifications where possible.

3. Sample-size floor for any rate claim ¶

The operator pre-commits to not publishing trend or rate claims

outside reports/ and the Pages-rendered stats.html (regenerated from

the corpus by compute_stats.py at build time) until the corpus exceeds:

30 days of continuous operation, OR
40 pulses (whichever is later).

This includes any narrative claim about a mode being "common", "rare",

"increasing", "decreasing", or "calibrated" in public communications.

Wilson 95% CIs are rendered on stats.html from corpus day 1; **the rule

is not to suppress numbers but to refrain from interpreting them as

trends before the sample supports it**.

4. Inclusion / exclusion criteria ¶

A pulse enters the corpus iff:

The window length (window.end - window.start) is **4 to 14 hours

inclusive** (CLAUDE.md Phase 1.4 guard c).

The pulse passes both validate_artifacts.py (schema + heartbeat

format) and check_consistency.py (MD/JSON/INDEX coherence).

The pulse passes check_provenance.py (codebook_hash and

routine_hash match a real git blob SHA of the referenced file).

The window is monotonic, non-future, and not duplicate (Phase 1.4

guards a, b, d).

Pulses that fail validation are corrected on a fix PR and re-issued

under the same run_id. Heartbeats (skip-empty) are recorded but are

not classifications.

5. Analysis plan ¶

The aggregations computed by compute_stats.py and rendered to the

public Pages site (stats.html) are the pre-committed analyses. These

are:

HAT mode counts, shares, and Wilson 95% CIs.
Confidence-level counts, shares, and Wilson 95% CIs.
Shannon entropy of the mode distribution (active modes only).
Distinct-modes-observed support.
Classifier diversity (per-classifier pulse counts).
Provenance coverage rate.
Window length min/median/max.
Rationale length median + p90.
Distinct classification sources.
Run cadence and last-run timestamps.

Adding a new aggregation that materially changes the public framing

requires a changelog entry below and is treated as a methodology change,

not a polish change.

6. Falsifier responses are pre-committed ¶

FALSIFIERS.md records the response action for each falsifier in

advance. The operator pre-commits to honoring those responses when a

falsifier triggers:

F-codebook-001 (mode overreach): codebook revision PR tightening

the mode definition. Retroactive reclassification if warranted.

F-codebook-003 (unmappable patterns persist): codebook revision PR

adding the missing mode, with the first-surfaced pulse cited.

F-framework-001 (HAT lens does not capture signal): pause routine,

open scope-reduction or pivot PR.

F-framework-002 (substrate shift invalidates prior corpus): mark

substrate boundary in substrate_revision; freeze prior corpus.

F-methodology-001 (classifier non-determinism > 20%): routine

redesign; downgrade confidence calibration claims accordingly.

F-methodology-002 (provenance decoupling): hard CI failure;

pulse PR cannot merge. Enforced by check_provenance.py.

F-project-001 (cannot hold methodological commitment): pause or

retire the project; freeze the corpus; tag a final release.

F-project-002 (results contaminate substrate): reframe project as

intervention research, or restore strict observation by separating

operator-of-routines from operator-of-agents.

7. Classifier accountability ¶

Every pulse stamps the classifier model identifier in the classifier

field, the codebook git blob SHA in codebook_hash, and the routine

git blob SHA in routine_hash. These three fields are required on every

v1.1+ pulse; their absence is a F-methodology-002 trigger and a hard

CI failure under check_provenance.py.

The operator pre-commits to:

Stamping the actual classifier in use, not a placeholder.
Not retroactively editing these fields without a paired

audits/codebook/drift-v1.N.md ledger entry naming each pulse.

Filing a quarterly classifier-diversity disclosure under

audits/quarterly/YYYY-QN.md so cross-classifier subsets are visible.

8. Operator-influence disclosure ¶

Per F-project-002, the operator pre-commits to answering this

question in each quarterly audit, on the record:

Did reading recent sentinel pulses materially influence how I

configured, prompted, or scheduled any agent in the observed

substrate during the audit period?

A non-trivial "yes" answer triggers the F-project-002 response and

reframes the project explicitly.

9. Public site does not promote any single pulse ¶

The public site at https://rmednitzer.github.io/sentinel/ aggregates

across the corpus. It does not promote individual classifications as

findings. The first pulse linked from the README (currently

reports/2026/05/02/0800-pulse.md) is intended as a structural

example, not as a featured result.

The operator pre-commits to not adding a "featured pulse" or

"finding of the week" surface to the public site without first

documenting in this register a rationale for why corpus aggregation

is insufficient at the chosen sample size.

10. Replication ¶

The classifier prompt (CLAUDE.md), the codebook

(codebook/<active-version>.md), the schemas

(schemas/pulse*.schema.json), and the validator scripts

(.github/scripts/*.py) are checked into this repository under

Apache 2.0 and a CC-BY-4.0 for non-software content. Anyone with

a comparable agent runtime and an MCP-style substrate can re-run the

classification, fork the codebook, and publish their own corpus.

The operator pre-commits to:

Tagging quarterly snapshots (corpus-YYYY-QN) with sealed content

hashes via build_snapshot_manifest.py.

Not deleting historical pulses from the corpus. Corrections land as

new commits with an audit-trail entry; the original artifact is

preserved in the git history.

Out of scope (not pre-committed) ¶

The following are explicitly out of scope of this register:

The choice of which MCP substrate to observe. The operator runs the

routine against one substrate (the host MCP server); changing substrates

requires a new substrate_revision declaration but not a new

pre-registration.

The choice of Claude model version used as classifier. Versions

advance over time; the classifier field records which version

produced each pulse. Cross-version inferences are bounded by

F-methodology-001.

The interpretation of any single classification. Classifications are

interpretive (honesty_notice field on every pulse). The

pre-registration is for the aggregation methodology, not the

per-extract reading.

The future scope of the codebook beyond v1.x. v2 may restructure the

taxonomy; the rules in this register apply to the active codebook

version at the time of any given pulse.

Changelog ¶

When	Section	Action	Rationale
2026-05-14	(initial)	added	Optimization pass: address HARKing concern around v1.0→v1.1 post-hoc revision; document pre-commitments that were operating implicitly.