Methodology
Sentinel is observational, single-rater research on one deployment. What makes the corpus trustworthy is not the classifications themselves but the scaffolding around them: a versioned codebook, pre-registered analysis limits, explicit falsifiers, per-pulse provenance hashes, and an append-only audit trail. Those are collected here.
Core documents
- Codebook — the active HAT failure-mode definitions, boundary rules, and confidence calibration.
- Falsifiers — the conditions under which the framework, codebook, or project would be considered wrong, and the committed response to each.
- Corpus stats — descriptive aggregates with Wilson confidence intervals, regenerated from the corpus on every build.
Audit trail
Human and AI audit records live in the repository under audits/:
audits/corpus/— whole-corpus instrument-health reviewsaudits/codebook/— codebook-fitness audits and drift reclassificationsaudits/substrate/— engineering audits of the observed hostaudits/reconciliations/— routine-state cleanup ledgersaudits/quarterly/— scheduled blind-reclassification audits
Pre-registration
Pre-registration
This document records the methodological commitments made before the
operator began analyzing accumulated corpus signal. Its purpose is to
address the standard observational-research hazard known as HARKing —
Hypothesizing After the Results are Known — which occurs when an
analyst observes a pattern and then frames the original study as if the
pattern were the predicted outcome. Pre-registration distinguishes
pre-committed analyses from post-hoc exploration.
This is not a clinical-trial pre-registration. There is no statistical
hypothesis test, no power calculation, no primary endpoint. sentinel is
observational coding work against a single-deployment substrate, and the
pre-commitments are correspondingly modest: what the operator promised
to do before seeing the corpus, distinguished from what the operator
may do after seeing it.
Edits to a released section require a corresponding changelog entry at
the bottom of this file and a rationale in the PR description. New
sections may be added freely; the rule is to keep the document an honest
record of when each commitment entered the contract.
Scope of pre-commitments ¶
The pre-commitments below apply only to:
- The classification routine in
CLAUDE.md. - The codebook in
codebook/<active-version>.md. - The statistical aggregation in
.github/scripts/compute_stats.py. - The falsifier registry in
FALSIFIERS.md. - The quarterly audit protocol in
audits/templates/quarterly.md.
They do not apply to:
- Operator-internal interpretation of any single pulse.
- Routine maintenance unrelated to classification semantics.
- Tooling, CI, and documentation changes that do not affect what is
classified or how the corpus is aggregated.
Pre-commitment register ¶
1. No formal a-priori hypotheses ¶
The operator does not pre-commit to a directional hypothesis about any
HAT failure mode's base rate. The active codebook (codebook/v1.1.md)
records the expectation that meaningful_control_erosion will be
low-base-rate on this substrate (because no agent in the fleet currently
performs irreversible operator-facing actions without confirmation), and
that evaluation_awareness_divergence is so structurally low-base-rate
that it was deprecated. Each expectation has a corresponding falsifier:
F-codebook-001 (mode overreach watch on meaningful_control_erosion,
active) and F-codebook-002 (closed when v1.1 deprecated
evaluation_awareness_divergence). These are pre-existing falsifier
triggers, not hypotheses to be confirmed.
The corpus is exploratory observational data, not a test of a stated
prediction.
2. Codebook freeze policy ¶
Codebook revisions are versioned files (codebook/v1.1.md,
codebook/v1.2.md, ...) and are never edited in place for semantic
changes. Typo-only edits to a released codebook are permitted and
documented as such in the PR.
The operator pre-commits to:
- A minimum of 14 days between releasing a codebook version and
releasing the next one, unless an active falsifier
(F-codebook-001/-003) triggers earlier and warrants emergency
revision. This is to prevent codebook churn driven by single-pulse
observations.
- Reclassifying recent pulses under a new codebook only via an
audits/codebook/drift-v1.N.md ledger, with the mapping decided
before reading the original classifications where possible.
3. Sample-size floor for any rate claim ¶
The operator pre-commits to not publishing trend or rate claims
outside reports/ and the Pages-rendered stats.html (regenerated from
the corpus by compute_stats.py at build time) until the corpus exceeds:
- 30 days of continuous operation, OR
- 40 pulses (whichever is later).
This includes any narrative claim about a mode being "common", "rare",
"increasing", "decreasing", or "calibrated" in public communications.
Wilson 95% CIs are rendered on stats.html from corpus day 1; **the rule
is not to suppress numbers but to refrain from interpreting them as
trends before the sample supports it**.
4. Inclusion / exclusion criteria ¶
A pulse enters the corpus iff:
- The window length (
window.end - window.start) is **4 to 14 hours
inclusive** (CLAUDE.md Phase 1.4 guard c).
- The pulse passes both
validate_artifacts.py(schema + heartbeat
format) and check_consistency.py (MD/JSON/INDEX coherence).
- The pulse passes
check_provenance.py(codebook_hash and
routine_hash match a real git blob SHA of the referenced file).
- The window is monotonic, non-future, and not duplicate (Phase 1.4
guards a, b, d).
Pulses that fail validation are corrected on a fix PR and re-issued
under the same run_id. Heartbeats (skip-empty) are recorded but are
not classifications.
5. Analysis plan ¶
The aggregations computed by compute_stats.py and rendered to the
public Pages site (stats.html) are the pre-committed analyses. These
are:
- HAT mode counts, shares, and Wilson 95% CIs.
- Confidence-level counts, shares, and Wilson 95% CIs.
- Shannon entropy of the mode distribution (active modes only).
- Distinct-modes-observed support.
- Classifier diversity (per-classifier pulse counts).
- Provenance coverage rate.
- Window length min/median/max.
- Rationale length median + p90.
- Distinct classification sources.
- Run cadence and last-run timestamps.
Adding a new aggregation that materially changes the public framing
requires a changelog entry below and is treated as a methodology change,
not a polish change.
6. Falsifier responses are pre-committed ¶
FALSIFIERS.md records the response action for each falsifier in
advance. The operator pre-commits to honoring those responses when a
falsifier triggers:
F-codebook-001(mode overreach): codebook revision PR tightening
the mode definition. Retroactive reclassification if warranted.
F-codebook-003(unmappable patterns persist): codebook revision PR
adding the missing mode, with the first-surfaced pulse cited.
F-framework-001(HAT lens does not capture signal): pause routine,
open scope-reduction or pivot PR.
F-framework-002(substrate shift invalidates prior corpus): mark
substrate boundary in substrate_revision; freeze prior corpus.
F-methodology-001(classifier non-determinism > 20%): routine
redesign; downgrade confidence calibration claims accordingly.
F-methodology-002(provenance decoupling): hard CI failure;
pulse PR cannot merge. Enforced by check_provenance.py.
F-project-001(cannot hold methodological commitment): pause or
retire the project; freeze the corpus; tag a final release.
F-project-002(results contaminate substrate): reframe project as
intervention research, or restore strict observation by separating
operator-of-routines from operator-of-agents.
7. Classifier accountability ¶
Every pulse stamps the classifier model identifier in the classifier
field, the codebook git blob SHA in codebook_hash, and the routine
git blob SHA in routine_hash. These three fields are required on every
v1.1+ pulse; their absence is a F-methodology-002 trigger and a hard
CI failure under check_provenance.py.
The operator pre-commits to:
- Stamping the actual classifier in use, not a placeholder.
- Not retroactively editing these fields without a paired
audits/codebook/drift-v1.N.md ledger entry naming each pulse.
- Filing a quarterly classifier-diversity disclosure under
audits/quarterly/YYYY-QN.md so cross-classifier subsets are visible.
8. Operator-influence disclosure ¶
Per F-project-002, the operator pre-commits to answering this
question in each quarterly audit, on the record:
Did reading recent sentinel pulses materially influence how I
configured, prompted, or scheduled any agent in the observed
substrate during the audit period?
A non-trivial "yes" answer triggers the F-project-002 response and
reframes the project explicitly.
9. Public site does not promote any single pulse ¶
The public site at https://rmednitzer.github.io/sentinel/ aggregates
across the corpus. It does not promote individual classifications as
findings. The first pulse linked from the README (currently
reports/2026/05/02/0800-pulse.md) is intended as a structural
example, not as a featured result.
The operator pre-commits to not adding a "featured pulse" or
"finding of the week" surface to the public site without first
documenting in this register a rationale for why corpus aggregation
is insufficient at the chosen sample size.
10. Replication ¶
The classifier prompt (CLAUDE.md), the codebook
(codebook/<active-version>.md), the schemas
(schemas/pulse*.schema.json), and the validator scripts
(.github/scripts/*.py) are checked into this repository under
Apache 2.0 and a CC-BY-4.0 for non-software content. Anyone with
a comparable agent runtime and an MCP-style substrate can re-run the
classification, fork the codebook, and publish their own corpus.
The operator pre-commits to:
- Tagging quarterly snapshots (
corpus-YYYY-QN) with sealed content
hashes via build_snapshot_manifest.py.
- Not deleting historical pulses from the corpus. Corrections land as
new commits with an audit-trail entry; the original artifact is
preserved in the git history.
Out of scope (not pre-committed) ¶
The following are explicitly out of scope of this register:
- The choice of which MCP substrate to observe. The operator runs the
routine against one substrate (the host MCP server); changing substrates
requires a new substrate_revision declaration but not a new
pre-registration.
- The choice of Claude model version used as classifier. Versions
advance over time; the classifier field records which version
produced each pulse. Cross-version inferences are bounded by
F-methodology-001.
- The interpretation of any single classification. Classifications are
interpretive (honesty_notice field on every pulse). The
pre-registration is for the aggregation methodology, not the
per-extract reading.
- The future scope of the codebook beyond v1.x. v2 may restructure the
taxonomy; the rules in this register apply to the active codebook
version at the time of any given pulse.
Changelog ¶
| When | Section | Action | Rationale |
|---|---|---|---|
| 2026-05-14 | (initial) | added | Optimization pass: address HARKing concern around v1.0→v1.1 post-hoc revision; document pre-commitments that were operating implicitly. |