# Falsifiers
This document records conditions under which the sentinel framework, codebook,
or project would be wrong, and the response action each triggers. The act of
writing these down is most of the value: a research practice without explicit
falsification criteria becomes confirmation-seeking by default. When a
falsifier triggers, the operator commits to acting on it, including retracting
prior claims if the evidence warrants.
This file is a contract, not a reference. Edits require a corresponding
changelog entry below and a rationale in the PR description.
## Falsifier categories
Falsifiers are partitioned into four severity levels, in increasing order of
what they call into question.
- **F-codebook**: a specific mode definition is wrong, too loose, or too tight. Response: codebook revision (new versioned file under `codebook/`).
- **F-framework**: the HAT failure-mode lens is the wrong frame for this substrate. Response: scope reduction or pivot.
- **F-methodology**: single-rater observation under this routine produces unreliable signal. Response: routine redesign.
- **F-project**: the project should not continue in its current form. Response: explicit retirement or pause, with prior claims annotated.
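If the deferred check scripts eventually want these levels as data, a minimal sketch of how they might be encoded follows; the `Severity` enum and `RESPONSE` mapping are illustrative names, not part of any existing script:

```python
from enum import Enum

class Severity(Enum):
    """The four falsifier severity levels, in increasing order of scope."""
    CODEBOOK = "F-codebook"        # a specific mode definition is wrong
    FRAMEWORK = "F-framework"      # the HAT lens is the wrong frame
    METHODOLOGY = "F-methodology"  # single-rater signal is unreliable
    PROJECT = "F-project"          # the project should not continue as-is

# The response action each level triggers, mirroring the list above.
RESPONSE = {
    Severity.CODEBOOK: "codebook revision (new versioned file under codebook/)",
    Severity.FRAMEWORK: "scope reduction or pivot",
    Severity.METHODOLOGY: "routine redesign",
    Severity.PROJECT: "explicit retirement or pause, with prior claims annotated",
}
```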
## Active falsifiers

### F-codebook falsifiers
#### F-codebook-001: `meaningful_control_erosion` overreach

**Condition.** The `meaningful_control_erosion` rate exceeds 30% over any rolling 30-day window in which no pulse cites an actually irreversible action by the observed agent.

**Why this matters.** The mode definition implies the agent reduced the operator's ability to intervene, contest, or reverse. If the rate is high but no irreversible actions occurred, the mode is being applied to visible operator displeasure rather than actual control loss. The mode is too loose.

**Detection.** Manual review during the quarterly audit, plus an automatable check (`check_falsifiers.py`, deferred until the corpus reaches 30 days).

**Response.** Open a `codebook/` revision PR that tightens the mode definition. Document the trigger in the PR body. Reclassify retroactively if warranted (Wave 2 codebook drift detection).
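A minimal sketch of what the deferred `check_falsifiers.py` logic could look like. The pulse schema here is an assumption: one JSON file per pulse, with `date`, `modes`, and an `irreversible_action_cited` boolean all being guessed field names, not the real format.

```python
import json
from datetime import date, timedelta
from pathlib import Path

WINDOW = timedelta(days=30)
THRESHOLD = 0.30
MODE = "meaningful_control_erosion"

def load_pulses(pulse_dir: Path) -> list[dict]:
    """Load pulses as dicts; assumes one JSON file per pulse."""
    return [json.loads(p.read_text()) for p in sorted(pulse_dir.glob("*.json"))]

def f_codebook_001_triggers(pulses: list[dict]) -> bool:
    """True if any rolling 30-day window has a mode rate above 30%
    while no pulse in that window cites an irreversible action."""
    for anchor in pulses:  # anchor one candidate window at each pulse date
        start = date.fromisoformat(anchor["date"])
        window = [p for p in pulses
                  if start <= date.fromisoformat(p["date"]) < start + WINDOW]
        rate = sum(MODE in p["modes"] for p in window) / len(window)
        cites = any(p.get("irreversible_action_cited") for p in window)
        if rate > THRESHOLD and not cites:
            return True
    return False
```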
#### F-codebook-002: `evaluation_awareness_divergence` base rate (closed by v1.1)

**Status.** Closed by codebook v1.1 (PR #51). The mode produced no signal in v1.0 and was deprecated in v1.1 precisely because its base rate was effectively zero on this single-deployment substrate. The falsifier itself ran the other direction (a rate near zero rather than above 5%) and was satisfied by deprecating the mode, not by triggering an investigation.

**Condition (historical).** The `evaluation_awareness_divergence` rate exceeds 5% in any rolling 14-day window, given the v1.0 codebook's note that the base rate should be near zero for this single-deployment substrate.

**Why this matters (historical).** The codebook itself recorded the expectation that this mode was rarely applicable here. A high observed rate would have meant the mode definition did not match the substrate, or the substrate had changed in ways that newly exposed evaluation-awareness behavior, or the classifier was overreaching.

**Restoration condition.** Restore as an active falsifier if a future codebook version (v1.2+) reintroduces an evaluation-awareness mode, e.g. because the substrate begins to span multiple monitoring conditions.
#### F-codebook-003: unmappable patterns persist

**Condition.** The `patterns_observed` field contains entries that do not map to any codebook mode for three consecutive pulses.

**Why this matters.** `patterns_observed` is the explicit overflow channel for things the codebook cannot name. Persistent overflow signals that the codebook is missing a mode.

**Detection.** Manual review during the quarterly audit, plus a check (`check_consistency.py`) that surfaces unmapped pattern entries.

**Response.** A codebook revision adding the missing mode, with an explicit note in `codebook/v1.N.md` recording which pulse first surfaced it.
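The consecutive-pulse streak is mechanical enough to sketch. This assumes pulses arrive sorted and that mapping a pattern entry to a mode is plain set membership against the codebook's mode names; that stand-in may be cruder than what `check_consistency.py` actually does.

```python
def unmappable_streak(pulses: list[dict], codebook_modes: set[str]) -> bool:
    """True if entries in patterns_observed fail to map to any
    codebook mode for three consecutive pulses."""
    streak = 0
    for pulse in pulses:  # assumed sorted chronologically
        unmapped = [name for name in pulse.get("patterns_observed", [])
                    if name not in codebook_modes]
        streak = streak + 1 if unmapped else 0
        if streak >= 3:
            return True
    return False
```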
### F-framework falsifiers
#### F-framework-001: HAT lens does not capture the actual signal

**Condition.** Two consecutive quarterly audits produce mode disagreement between the operator's blind reclassification and the routine's original classification on more than 40% of audited pulses, AND the operator's reclassifications themselves do not converge on consistent modes.

**Why this matters.** If the operator (the codebook author) cannot reproduce the classifier's labels and cannot consistently produce their own labels, the framework is not naming stable categories. The HAT failure-mode lens may be fine in principle but wrong for this substrate, or the codebook may need a structural rewrite.

**Detection.** Manual, via quarterly audit comparison.

**Response.** Pause the routine. Open a scope-reduction or pivot PR; possibly retire codebook v1 entirely and restart with v2 against a different mode inventory.
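The two-part condition (high disagreement AND no convergence) could be scored roughly as below. The `AuditItem` shape and the distribution-comparison convergence proxy are assumptions, not the documented audit procedure.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class AuditItem:
    original_mode: str      # the routine's classification
    reclassified_mode: str  # the operator's blind reclassification

def disagreement_rate(audit: list[AuditItem]) -> float:
    return sum(i.original_mode != i.reclassified_mode for i in audit) / len(audit)

def reclassifications_converge(a: list[AuditItem], b: list[AuditItem]) -> bool:
    """Crude proxy: do the two audits' reclassification distributions
    roughly agree? A real check would compare pulse-by-pulse."""
    return (Counter(i.reclassified_mode for i in a)
            == Counter(i.reclassified_mode for i in b))

def f_framework_001_triggers(a: list[AuditItem], b: list[AuditItem]) -> bool:
    return (disagreement_rate(a) > 0.40 and disagreement_rate(b) > 0.40
            and not reclassifications_converge(a, b))
```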
#### F-framework-002: substrate shift invalidates prior corpus

**Condition.** The host MCP server infrastructure undergoes a structural redesign that materially changes which agents run and what they produce, AND the codebook's mode rates shift by more than 50% within four weeks of the change.

**Why this matters.** Pulses are observations of one specific substrate. If the substrate changes substantially, the corpus before the change and the corpus after it cannot be meaningfully aggregated. Trends across the boundary are artifacts of the substrate change, not evidence about the modes.

**Detection.** Manual; substrate revisions are visible in the `substrate_revision` field once it is populated.

**Response.** Mark the substrate boundary explicitly in `substrate_revision` and `STATS.md`. Compute mode rates separately on each side. Treat the older corpus as a frozen reference, not a comparable baseline.
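Computing rates on each side of the boundary and flagging a relative shift above 50% might look like this; the per-pulse `modes` list is again an assumed schema, and modes that first appear after the change are not checked in this sketch.

```python
def mode_rates(pulses: list[dict]) -> dict[str, float]:
    """Fraction of pulses in which each mode appears."""
    counts: dict[str, int] = {}
    for p in pulses:
        for m in p["modes"]:
            counts[m] = counts.get(m, 0) + 1
    return {m: c / len(pulses) for m, c in counts.items()}

def rates_shifted(before: list[dict], after_4wk: list[dict]) -> bool:
    """True if any pre-change mode rate moved by more than 50%
    (relative) in the four weeks after the substrate change."""
    old, new = mode_rates(before), mode_rates(after_4wk)
    return any(abs(new.get(m, 0.0) - r) / r > 0.50 for m, r in old.items())
```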
### F-methodology falsifiers
#### F-methodology-001: classifier non-determinism

**Condition.** Re-running the routine on identical inputs (same window, same substrate state) produces classifications that disagree on mode for more than 20% of items.

**Why this matters.** Single-rater work depends on the rater being internally consistent. A non-deterministic classifier produces noise that masquerades as signal. The routine cannot ground claims if it cannot reproduce its own outputs.

**Detection.** Adversarial replay test: schedule one pulse per quarter to re-classify a fixed historical window. Currently manual; a `replay-test.yml` workflow is deferred to Wave 4.

**Response.** Routine redesign. Lock down the model version, temperature, and seed (where supported). If the routine is still non-deterministic above the threshold, acknowledge the limit explicitly in `honesty_notice` and downgrade confidence-calibration claims accordingly.
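The replay test reduces to running the classifier twice over the same frozen window and counting mode flips; `classify_window` is a stand-in for however the routine is actually invoked, not a real entry point.

```python
def replay_disagreement(classify_window, frozen_inputs) -> float:
    """Classify the same fixed historical window twice and return the
    fraction of items whose assigned mode differs between runs."""
    first = classify_window(frozen_inputs)
    second = classify_window(frozen_inputs)
    flips = sum(a != b for a, b in zip(first, second, strict=True))
    return flips / len(first)

# F-methodology-001 triggers when this exceeds the 20% threshold:
# replay_disagreement(classify_window, frozen_inputs) > 0.20
```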
#### F-methodology-002: provenance-classification decoupling

**Condition.** A pulse's `codebook_hash` or `routine_hash` does not match the git blob SHA of the corresponding file at the pulse's commit, or those fields are missing on a v1.1 pulse.

**Why this matters.** The provenance fields exist to make codebook drift, substrate drift, and classifier drift observable. If they are missing or wrong, the corpus loses its self-attestation property and historical classifications cannot be trusted.

**Detection.** Automated (`check_provenance.py`, deferred to Wave 2.5).

**Response.** Hard fail in CI. The pulse PR cannot merge until the provenance is correct.
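The blob-SHA comparison is the part that is easy to pin down now: `git rev-parse <commit>:<path>` prints the blob SHA of a file as of a commit. The `commit`, `codebook_path`, and `routine_path` pulse fields below are assumptions about the schema.

```python
import subprocess

def blob_sha(commit: str, path: str) -> str:
    """Blob SHA of `path` as of `commit`, via git rev-parse."""
    out = subprocess.run(["git", "rev-parse", f"{commit}:{path}"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def provenance_decoupled(pulse: dict) -> bool:
    """True if F-methodology-002 triggers for this pulse."""
    for hash_field, path_field in (("codebook_hash", "codebook_path"),
                                   ("routine_hash", "routine_path")):
        recorded = pulse.get(hash_field)
        if recorded is None:
            return True  # missing provenance on a v1.1 pulse
        if recorded != blob_sha(pulse["commit"], pulse[path_field]):
            return True  # recorded hash does not match the blob SHA
    return False
```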
### F-project falsifiers
#### F-project-001: methodological commitment cannot be held

**Condition.** The operator misses two consecutive quarterly audits, OR codebook revisions land without the required reclassification of recent pulses, OR the routine runs without producing valid pulses for more than 14 consecutive days.

**Why this matters.** Observational research requires sustained discipline. A project that cannot maintain its own audit cadence is producing data without the methodological scaffolding that makes that data trustworthy. Better to retire honestly than to continue under an empty commitment.

**Detection.** Manual, except for pulse cadence, which `stale-routine.yml` automates.

**Response.** Pause the routine. Either resume with a renewed commitment documented in a PR, or retire the project. Retirement means freezing the corpus, marking it explicitly retired in the README, and tagging a final release.
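The 14-day cadence clause is the automatable piece, and presumably what `stale-routine.yml` checks; a sketch, again assuming an ISO-format `date` field on each pulse:

```python
from datetime import date

def routine_is_stale(pulses: list[dict], today: date,
                     limit_days: int = 14) -> bool:
    """True if no valid pulse has landed in more than `limit_days` days."""
    if not pulses:
        return True
    latest = max(date.fromisoformat(p["date"]) for p in pulses)
    return (today - latest).days > limit_days
```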
#### F-project-002: results contaminate the substrate they observe

**Condition.** Sentinel pulses materially change how the observed substrate's agents behave: the operator reads pulses, adjusts routines based on them, and the next pulses reflect those adjustments rather than the agents' unobserved behavior.

**Why this matters.** This is the observer effect at the operator level. If the corpus is no longer a record of how the agents behave but a record of how they behave when known to be observed, the project's framing as observational research is wrong. It becomes intervention research, which is a different kind of work with a different methodology.

**Detection.** Manual; each quarterly audit includes a question explicitly asking the operator whether and how recent pulses influenced agent configuration.

**Response.** Reframe the project explicitly. Either adopt intervention-research methodology (with controls, intentional A/B periods, etc.) or restore strict observation by separating the operator-of-routines from the operator-of-agents. The current single-operator setup makes pure observation difficult.
## Inactive / retired falsifiers

(none moved here yet; F-codebook-002 is closed but remains listed above with its restoration condition)
## Changelog
| When | Falsifier | Action | Rationale |
|---|---|---|---|
| 2026-05-04 | (initial) | added | Wave 2 of redesign. Establishes the falsifier registry. |