
Contributor runbook

A detailed walk-through for a maintainer or AI-assisted contributor who needs to audit, review, enhance, validate, or extend nous. The AGENTS.md file is the orientation; CONTRIBUTING.md is the PR checklist; CLAUDE.md collects the Claude-specific addenda. This runbook is the longer-form procedure that ties those three documents to the live state of the codebase.

The runbook is written so a fresh session can pick it up cold. Every step names the files it touches, the make target it relies on, and the governance artefact (ADR, BL-NNN, STPA derived requirement) that the work needs to keep in sync.

0. Pre-flight

Confirm the working environment before touching anything. The toolchain is uv + Python 3.12 or newer (3.14 on the Ubuntu 26.04 baseline per ADR 0016), with ruff, mypy --strict, pytest, hypothesis, and mkdocs installed by uv sync --all-extras. The single source of truth for build commands is the Makefile; use the targets rather than the underlying tools so the next contributor inherits the same invocation.

make install                      # uv sync --all-extras
make check                        # ruff + mypy strict + pytest
make docs-build                   # mkdocs build --strict
uv run nous serve                 # stdio MCP server
NOUS_TRANSPORT=http uv run nous serve   # HTTP with OAuth

Read STATUS.md and LIMITATIONS.md before anything else. STATUS.md lists the current phase (L0 scaffold, L1 subsystems, L2 claude.ai integration, L3 STPA and benchmarks) and the maturity of every component. LIMITATIONS.md is authoritative on what is intentionally out of scope. A finding that proposes work already listed under LIMITATIONS.md is not a finding; it is a duplicate.

Set a working branch off main named claude/<short-slug> or feature/<short-slug> (the patterns in AGENTS.md). For an audit run the branch should not modify code; for a review or enhancement run the branch should land one logical change at a time so the diff is small enough for a single PR. The reference instance auto-pulls main every five minutes via nous-auto-update.timer (see docs/deployment.md), so treat every merge as immediately live.

1. Audit run

An audit produces a point-in-time defects report. The baseline artefact is AUDIT.md, the audit conducted on 2026-05-20 against revision a2d0ed4. A fresh audit either extends AUDIT.md in place or lands a companion AUDIT-YYYY-MM-DD.md if the previous report has too many resolved findings to read clearly. Put the commit hash the audit was conducted against and the date at the top so the report is reproducible.

Walk the codebase in the order the spine flows. Start with src/nous/policy.py, src/nous/runner.py, src/nous/audit.py, src/nous/server.py, and src/nous/engine.py. These five files carry the audit invariants: tier classification, audited tool execution, SHA-256-only output hashing, FastMCP wiring, tick orchestration. A defect in any of them is automatically Critical or High; document the class, the file, the line range, the invariant violated, and the minimal patch. The existing severity legend (Critical, High, Medium, Low, Strength) in AUDIT.md section 2 is the rubric.

Continue through each subsystem under src/nous/subsystems/, each estimator under src/nous/estimators/, each interop adapter under src/nous/interop/, the OAuth issuer in src/nous/auth/, the FSM in src/nous/state/machine.py, the Anthropic client in src/nous/anthropic_client.py, the deploy bundle under deploy/, and the test tree under tests/. For each module, capture: contract claimed by the docstring, behaviour observed in code, gap between the two, and the smallest reproducer that exposes the gap. Stub modules that return plausible-looking values without filtering are the most dangerous shape and warrant a Critical or High; record them under a "stubs that look real" sub-heading the way AUDIT.md C5 records the thermal and compute estimator stubs.

Finish with cross-cutting concerns: CI workflow under .github/workflows/, the policy-grep coverage that CLAUDE.md claims (em-dash ban, private-repo ban), the documentation-vs-code drift surfaced by STATUS.md, and the BOM-to-YAML provenance chain rooted at docs/bom.md. Map every finding to a BL-NNN id in docs/backlog.md and to a remediation sequence at the end of the audit report. The published sequence in AUDIT.md section 9 is the model: order by blast radius and estimated hours so a maintainer can sprint the list without re-reading the whole document.

Out-of-scope items go in their own section so the next auditor does not re-flag them. AUDIT.md section 10 lists the patterns that look like defects but are intentional (FSM raising on unknown trigger, audit sink swallowing exceptions, scalar Kalman without Joseph form). Mirror that section in any fresh audit; it is the most effective signal that the report is calibrated against the project's design choices.

2. Review run

A review is lighter than an audit; it focuses on architecture-to-implementation drift and on the realism of the simulator. The reference artefact is docs/review-2026-05-21.md, which surveys correctness, security, data, concurrency, error handling, and tests with prioritised findings and a roadmap. Treat that document as the template: same six categories, same prioritisation rubric, same closing "unknowns and minimal checks" list.

The first check is the architecture document against the live engine. docs/architecture.md describes a tick that fans out across every subsystem and refreshes the self-model. The current src/nous/engine.py wires power and APU only. Where the gap widens, either land the wiring or land an explicit capability_matrix / fidelity_level field on device_info so a controller can gate behaviour by fidelity. In either case, the review must confirm the docs match the code or file a finding.

The second check is the maturity table in STATUS.md against the tool surface in src/nous/server.py and the per-document state of every file under docs/. A document flagged stable should not be churning week to week; a document flagged in-progress should be moving. If STATUS.md claims a subsystem is in-progress but the module is a typed stub returning constants, downgrade the row or fix the module in the same PR. The review should never leave STATUS.md lying about a component.

The third check is realism. Walk each numeric value in profiles/ against docs/bom.md. The BOM is the source of truth; the profile reads from it. A value that has no BOM row gets one; a BOM row without a citation gets a vendor datasheet, a MIL-STD reference, or a published benchmark added. The same applies to estimator covariance bounds: every claim in a model card under docs/model-cards/ needs to be reproducible against the estimator code, ideally against a unit test. The review report captures the gaps and recommends BL-NNN entries.
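The three realism gaps (no BOM row, no citation, value drift) can be checked mechanically once the BOM and the profile are flattened to the same dotted keys. This is a minimal sketch: BomRow, the dotted key names, and the flattening step are assumptions for illustration, and parsing docs/bom.md is left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BomRow:
    """Hypothetical parsed form of one docs/bom.md row."""
    value: float
    citation: str  # datasheet, MIL-STD reference, or published benchmark


def bom_gaps(
    bom: dict[str, BomRow],
    profile: dict[str, float],
    rel_tol: float = 1e-9,
) -> dict[str, list[str]]:
    """Classify realism gaps between a flattened profile and the BOM."""
    gaps: dict[str, list[str]] = {"no_bom_row": [], "no_citation": [], "mismatch": []}
    for key, value in sorted(profile.items()):
        row = bom.get(key)
        if row is None:
            gaps["no_bom_row"].append(key)  # profile value needs a BOM row
        elif not math.isclose(row.value, value, rel_tol=rel_tol):
            gaps["mismatch"].append(key)  # profile drifted from the BOM
    for key, row in sorted(bom.items()):
        if not row.citation.strip():
            gaps["no_citation"].append(key)  # BOM row needs a source
    return gaps
```

Each non-empty list maps naturally onto a finding and a recommended BL-NNN entry.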

Close the review with an "unknowns" section. The 2026-05-21 review listed CI branch protection, real-world calibration error of the power/APU model, and runtime performance at target tick rates as the three things that could not be verified from the repository alone. Pick the same shape: each unknown plus the minimal check that would resolve it. The next review picks the unknowns up as starting points.

3. Enhance run

An enhance run lands a fix or a polish for an existing surface. Pick a finding from the most recent audit report or a BL-NNN item flagged for the current phase. The recommended sequencing in AUDIT.md section 9 is calibrated for blast radius first, leverage second; follow it unless the maintainer asks for something specific.

The pattern is the same regardless of the finding. Open the file, write the failing test under tests/unit/ or tests/integration/, watch it fail, write the minimum patch, watch it pass. For a spine file (policy.py, runner.py, audit.py, state/machine.py, anthropic_client.py, estimators/base.py, interop/base.py, or the hardware-profile schema) the patch needs an ADR cross-reference in the commit message even if no new ADR is required, because those files are on the "no change without an ADR" list in CLAUDE.md. If the patch introduces a contract change, an ADR is required: copy docs/adr/0000-template.md to the next number and fill in Context / Decision / Consequences / Revisit triggers.

For each enhancement that materially changes behaviour, append a line to CHANGELOG.md under [Unreleased]. Follow the Keep a Changelog vocabulary (Added, Changed, Fixed, Removed, Deprecated, Security). Reference the BL-NNN id and any ADR. The audit-discovery items from the AUDIT.md Critical and High classes should all reference both the AUDIT finding number and the BL-NNN they landed under, so a future reader can trace a behaviour back to the report that motivated it.

Two concrete walk-throughs for the still-open critical findings in docs/audit-2026-05-23.md (the C1 worked example is preserved at the end as the canonical pattern for a spine-file fix; C1 itself is closed):

# C2 recursive redaction in audit.py
# 1. Open src/nous/audit.py, locate redact().
# 2. Replace the flat dict comprehension with a recursive walker
#    that recurses through Mapping and Sequence values.
# 3. Add tests/unit/test_audit.py with a deeply nested payload.
# 4. Commit: fix(audit): recurse argument redaction through nested
#            mappings. References AUDIT-2026-05-23 C2.

# C3 tick task in server lifespan
# 1. Open src/nous/server.py, register a FastMCP lifespan context
#    that schedules nous.tick.tick_loop().
# 2. Cancel the task on shutdown; call engine.stop() so the FSM
#    lands on shutdown rather than leaking the running state.
# 3. Cover with tests/integration/test_server_lifespan.py.
# 4. Commit: fix(server): tick engine through FastMCP lifespan.
#            References AUDIT-2026-05-23 C3, BL-002.

# C1 (closed; kept here as the reference pattern for a spine-file fix)
# 1. Open src/nous/anthropic_client.py, locate CallCap.increment().
# 2. Move fh.flush() / os.fsync(fh.fileno()) above
#    fcntl.flock(fh, fcntl.LOCK_UN).
# 3. Add tests/unit/test_anthropic_client.py exercising concurrent
#    locking with multiprocessing; assert the final count equals the
#    number of increments (no lost updates).
# 4. Commit: fix(anthropic): flush daily-cap counter before unlock.
#            References AUDIT.md C1, ADR-0005.
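A standalone sketch of the two file-local patterns in the walk-throughs above: a recursive redact() for C2 and a flush-and-fsync-before-unlock counter for C1. The sensitive-key list, the placeholder string, and the plain-integer counter file are assumptions for illustration, not what src/nous/audit.py or CallCap.increment() actually use.

```python
import fcntl
import os
from collections.abc import Mapping, Sequence
from typing import Any

# Hypothetical key list; the real one lives in src/nous/audit.py.
SENSITIVE_KEYS = frozenset({"api_key", "authorization", "password", "secret", "token"})
REDACTED = "[REDACTED]"


def redact(value: Any) -> Any:
    """C2 sketch: redact sensitive keys at any depth of nested mappings and sequences."""
    if isinstance(value, Mapping):
        return {
            k: REDACTED if isinstance(k, str) and k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, (str, bytes, bytearray)):
        return value  # str and bytes are Sequences too; never iterate them
    if isinstance(value, tuple):
        return tuple(redact(v) for v in value)
    if isinstance(value, Sequence):
        return [redact(v) for v in value]
    return value


def increment(path: str) -> int:
    """C1 sketch: read-modify-write a counter that is durable before the lock drops."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            raw = fh.read().strip()
            count = (int(raw.decode()) if raw else 0) + 1
            fh.seek(0)
            fh.truncate()
            fh.write(str(count).encode())
            fh.flush()             # push Python's buffer to the kernel...
            os.fsync(fh.fileno())  # ...and the kernel's to disk
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)  # only now may another writer read
    return count
```

A quick local check can hammer increment() from threads, because each call opens its own descriptor and flock locks attach to the open file description; the repository test described in step 3 uses multiprocessing instead.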

Whenever an enhance run touches a high blast radius file, add a "Security note" paragraph to the PR description per CONTRIBUTING.md. The Security-note list there covers policy.py, runner.py, audit.py, anthropic_client.py, estimators/base.py, interop/base.py, and the hardware-profile schema (six files plus the schema). The broader ADR-required list in CLAUDE.md also covers state/machine.py, but the FSM does not currently require a Security-note paragraph by itself; track ADR requirements and Security-note requirements as separate gates. Add the paragraph proactively whenever the change touches any of the listed surfaces.

4. Validate run

Validation is the answer to "does this still do what STATUS.md says it does". The minimum is make check, which runs ruff, mypy in strict mode, and pytest. CI runs the same target; a green CI does not absolve a contributor from running it locally first. make docs-build (which invokes mkdocs build --strict) is the equivalent for the docs tree and should be green before any PR that touches .md files lands.

Beyond the make targets, three layers of validation matter for this project. The first is the audited tool path: every new tool registered in src/nous/server.py needs at least one test that exercises the audited call (the runner must produce an audit line), per CONTRIBUTING.md. The second is invariants: energy conservation across the tick loop, monotonicity of state.tick and state.ts_s, SoC clamping in [0, 1], FSM transitions only along the allowed table. Use hypothesis (already a dev dependency) to write a property-based test for each invariant; the test lives under tests/unit/ with a test_invariants_<surface>.py name. The third is scenario replayability: each scenario YAML under scenarios/ that lands in the showcase or in CI needs an integration test under tests/integration/test_scenario_<name>.py.
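A minimal sketch of the invariant layer, assuming a hypothetical ToyState with tick, ts_s, and soc fields and a linear battery model. In the repository the input loop would be a hypothesis @given test, which also shrinks failures; this version draws inputs from a seeded random.Random so it runs with the standard library alone.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyState:
    """Hypothetical stand-in for the engine state."""
    tick: int
    ts_s: float
    soc: float  # state of charge, expected in [0, 1]


def step(state: ToyState, net_load_w: float, dt_s: float = 1.0, capacity_wh: float = 100.0) -> ToyState:
    """Advance one tick; positive load drains the pack, negative load charges it."""
    delta = net_load_w * dt_s / 3600.0 / capacity_wh
    soc = min(1.0, max(0.0, state.soc - delta))
    return ToyState(tick=state.tick + 1, ts_s=state.ts_s + dt_s, soc=soc)


def check_invariants(seed: int, n_cases: int = 200) -> None:
    """Draw random starting SoCs and load traces; assert the invariants after every tick."""
    rng = random.Random(seed)
    for _ in range(n_cases):
        state = ToyState(tick=0, ts_s=0.0, soc=rng.uniform(0.0, 1.0))
        for _ in range(rng.randint(1, 300)):
            nxt = step(state, net_load_w=rng.uniform(-5e5, 5e5))
            assert 0.0 <= nxt.soc <= 1.0, "SoC escaped [0, 1]"
            assert nxt.tick == state.tick + 1, "tick not monotonic"
            assert nxt.ts_s > state.ts_s, "ts_s not monotonic"
            state = nxt
```

The load range is deliberately wide so the clamps at 0 and 1 are exercised, not just the interior.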

A manual smoke test is required for any change that touches the server, the auth issuer, or the deployment bundle. The flow is short:

# stdio
uv run nous serve
# In another terminal: send a JSON-RPC initialize + tools/list +
# a representative tool call (device_info, state_get).

# HTTP with OAuth (requires NOUS_OAUTH_ENABLED=true)
NOUS_TRANSPORT=http NOUS_OAUTH_ENABLED=true uv run nous serve
curl -s http://127.0.0.1:8088/.well-known/oauth-authorization-server
# Walk the PKCE dance against /authorize, /token, /register per RFC
# 7591/7636. The single-client lockdown means a re-registration
# replaces the previous client; that is intentional.

# scenario
uv run nous scenario scenarios/env-monitoring-urban.yaml
tail -f "${NOUS_AUDIT_PATH:-$NOUS_HOME/audit.jsonl}"   # confirm audit lines arrive

For UI-shaped changes (the showcase site), build the site locally (make docs-build then mkdocs serve) and click through the fidelity page, the FSM viewer, and the capability matrix. The showcase is the public face of the project (ADR 0017); a broken link or stale capability column is a regression even if every test passes.

Capture the validation evidence in the PR description's "Blast radius" and "Rollback path" sections. The conventional content is: which components were exercised, which tests were added or extended, which manual checks were run, and what git revert would have to undo. The PR descriptions in the claude/repo-audit-best-practices-fHVFy and subsequent branches are worked examples.

4.1 Local cache hygiene

The Hypothesis property tests share an examples database under .hypothesis/ (examples/, constants/, unicode_data/). The database accumulates failing examples across runs, which is what lets a failure that surfaced once reproduce deterministically on the next invocation. Two consequences are worth knowing:

The first is that the directory grows quietly. A contributor with a hard-to-reproduce flake (a property test fails locally but passes in CI, or vice versa) can clear the database to rule out a stale shrink: rm -rf .hypothesis/. The directory is git-ignored; clearing it is reversible (Hypothesis re-seeds on the next run) and does not lose code.

The second is that a new failing example can appear without any code change to the property test itself. The tests/unit/test_policy_fuzz.py drift caught in the 2026-05-27 engagement is the canonical example: the test had a hardcoded skip list that missed a tool name, Hypothesis happened to generate that name during the run, and the test failed even though no code in the property test or in policy.py had changed. Fix the test (derive the skip list from the policy module's tool sets); do not delete the database to make the symptom go away.
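One way to derive the skip list, sketched under the assumption that src/nous/policy.py exposes its tiers as module-level sets named *_TOOLS. That naming convention, and the scenario_load tool in the stand-in namespace, are hypothetical.

```python
from types import SimpleNamespace
from typing import Any


def known_tools(policy: Any) -> frozenset[str]:
    """Union every *_TOOLS set on the policy object (hypothetical naming convention)."""
    sets = [getattr(policy, name) for name in dir(policy) if name.endswith("_TOOLS")]
    return frozenset().union(*sets)


# Stand-in for src/nous/policy.py; the real module's attribute names may differ.
policy = SimpleNamespace(
    T0_TOOLS=frozenset({"device_info", "state_get"}),
    T2_TOOLS=frozenset({"scenario_load"}),
)
SKIP = known_tools(policy)  # recomputed from the policy, so a newly added tool cannot be missed
```

Because the skip list is computed rather than typed out, adding a tool to any tier set updates the fuzz test automatically.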

The other caches (.ruff_cache, .mypy_cache, .pytest_cache) are cleared by make clean. The audit log lands at ${NOUS_AUDIT_PATH:-$NOUS_HOME/audit.jsonl} by default; during a test run on a development workstation, the tmp_nous_home fixture resolves that to a path under tests/.nous_home/, never under /var/log/nous/.

5. Extend run

An extend run adds new functionality. The canonical recipes live in AGENTS.md under "Canonical recipes"; this section expands each recipe with the order of operations, the files that need attention, and the cross-references that keep the docs honest.

5.1 Adding a subsystem

Pick the simplest physics model that matches reality: a one-state Kalman beats a multi-state EKF if both meet the covariance bound in the model card. Drop the module under src/nous/subsystems/<name>.py implementing the Subsystem Protocol (step / truth / sensor_obs). Add the curves to profiles/jetson-agx-orin.yaml with citations propagated from docs/bom.md; update the other profiles or document why they differ. Add an estimator under src/nous/estimators/<name>.py; pair it with a model card under docs/model-cards/estimator-<name>-<filter>.md that documents the covariance bound.
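A sketch of the Protocol shape. The method signatures (a dt_s argument to step, dicts returned from truth and sensor_obs) are assumptions for illustration; match them to the actual Subsystem Protocol before copying anything. The toy subsystem is a one-state thermal lag with Gaussian sensor noise.

```python
import random
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@runtime_checkable
class Subsystem(Protocol):
    """Hypothetical signatures; check src/nous for the real Protocol."""
    def step(self, dt_s: float) -> None: ...
    def truth(self) -> dict[str, float]: ...
    def sensor_obs(self) -> dict[str, float]: ...


@dataclass
class ToyThermal:
    """One-state toy: temperature relaxes toward ambient; the sensor adds Gaussian noise."""
    temp_c: float = 60.0
    ambient_c: float = 25.0
    tau_s: float = 120.0
    noise_sd_c: float = 0.5
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def step(self, dt_s: float) -> None:
        # Forward Euler on dT/dt = (ambient - T) / tau; stable while dt_s < tau_s.
        self.temp_c += (self.ambient_c - self.temp_c) * (dt_s / self.tau_s)

    def truth(self) -> dict[str, float]:
        return {"temp_c": self.temp_c}

    def sensor_obs(self) -> dict[str, float]:
        return {"temp_c": self.temp_c + self.rng.gauss(0.0, self.noise_sd_c)}
```

Keeping truth() and sensor_obs() separate is what lets an estimator test compare the filtered value against ground truth.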

Wire the subsystem into src/nous/engine.py (both Engine.__init__ and the tick step), and register an MCP tool that reads the estimated state. Classify the tool T0 in src/nous/policy.py; only mutating tools earn T1 or higher per ADR 0013. Add at least one unit test under tests/unit/test_subsystem_<name>.py and one integration test that ticks the engine and asserts the estimator converges to truth within the bound. Update STATUS.md to flip the row from planned to in-progress, and append an Added entry to CHANGELOG.md referencing the BL-NNN.
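The integration test's convergence check can be sketched with a one-state (scalar) Kalman filter ticking against a constant truth. The noise values and the 4-sigma acceptance rule are placeholders; the real bound comes from the estimator's model card.

```python
import math
import random


class ScalarKalman:
    """One-state Kalman filter for a random-walk model: x_k = x_{k-1} + w, z_k = x_k + v."""

    def __init__(self, x0: float, p0: float, q: float, r: float) -> None:
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def predict(self) -> None:
        self.p += self.q  # process noise widens the variance

    def update(self, z: float) -> None:
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)
        self.p *= 1.0 - k


def tick_and_measure(n_ticks: int = 500, seed: int = 1) -> tuple[float, float]:
    """Tick a constant truth with noisy sensor reads; return (|error|, filter sigma)."""
    rng = random.Random(seed)
    truth, r = 42.0, 4.0  # sensor noise variance 4.0 (sd 2.0)
    kf = ScalarKalman(x0=0.0, p0=1e3, q=1e-6, r=r)
    for _ in range(n_ticks):
        kf.predict()
        kf.update(truth + rng.gauss(0.0, math.sqrt(r)))
    return abs(kf.x - truth), math.sqrt(kf.p)
```

Asserting on both values matters: a small error with a large sigma means the filter has not converged, and a small sigma with a large error means it is overconfident.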

If the subsystem changes a contract (new sensor format, new vocabulary, new field on the profile schema), open an ADR. The thermo-optical subsystem in BL-055 is the worked example currently on the backlog.

5.2 Adding an MCP tool

Decide the tier first. The default is T0 read-only; a tool that mutates engine or scenario state is T2; a tool that triggers an external side-effect (publish, broadcast, write to disk outside $NOUS_HOME) is T3. T1 is reserved for reversible mutations (e.g. set-then-undo). The classification goes in src/nous/policy.py at registration; conservative defaults err high.
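The tier rules above reduce to a small decision function. Tier, classify(), and tier_for() are hypothetical names, not the policy.py API; the fallback in tier_for() shows "conservative defaults err high" applied to an unregistered tool.

```python
from enum import IntEnum


class Tier(IntEnum):
    T0 = 0  # read-only
    T1 = 1  # reversible mutation (e.g. set-then-undo)
    T2 = 2  # mutates engine or scenario state
    T3 = 3  # external side effect (publish, broadcast, write outside $NOUS_HOME)


def classify(*, mutates: bool, reversible: bool = False, external_side_effect: bool = False) -> Tier:
    """Highest applicable tier wins; an external side effect trumps everything."""
    if external_side_effect:
        return Tier.T3
    if mutates:
        return Tier.T1 if reversible else Tier.T2
    return Tier.T0


def tier_for(tool: str, registry: dict[str, Tier]) -> Tier:
    """Unregistered tools fall back to the most restrictive tier."""
    return registry.get(tool, Tier.T3)
```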

Register the tool in src/nous/server.py. The handler must call app.run(tool=..., ctx=..., audit_args=..., policy_text=..., work=...) so the runner records an audit line. The handler should never write to disk directly and should never call the Anthropic client without going through src/nous/anthropic_client.py.

Update docs/tool-reference.md (or run make schema to regenerate it). Add at least one test under tests/unit/test_server.py or tests/integration/ that exercises the audited path. The audit line is the contract; assert on it.
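To make "the audit line is the contract" concrete, here is a toy runner plus the kind of assertion a test would make. ToyRunner and its field names are hypothetical; the real call is app.run(...) with the keyword arguments listed above, and the real record shape is the AuditRecord schema under docs/schema/.

```python
import hashlib
import json
from collections.abc import Callable
from typing import Any


class ToyRunner:
    """Hypothetical stand-in for the audited runner; not the real app.run()."""

    def __init__(self) -> None:
        self.audit_lines: list[str] = []

    def run(self, *, tool: str, tier: str, audit_args: dict[str, Any], work: Callable[[], Any]) -> Any:
        outcome, result = "ok", None
        try:
            result = work()
            return result
        except Exception:
            outcome = "error"
            raise
        finally:
            # One line per call, success or failure; store a hash, never the raw output.
            payload = json.dumps(result, sort_keys=True, default=str).encode()
            self.audit_lines.append(json.dumps({
                "tool": tool,
                "tier": tier,
                "args": audit_args,
                "outcome": outcome,
                "output_sha256": hashlib.sha256(payload).hexdigest(),
            }, sort_keys=True))
```

A test then calls the tool through the runner and asserts on the parsed audit line rather than only on the return value.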

5.3 Adding a scenario

Drop a YAML file under scenarios/<name>.yaml with the top-level keys the Scenario loader expects: schema_version, meta (name, description, tags), profile (the hardware profile id), tick_budget (the maximum number of ticks before the scenario ends), and steps (a timeline of {at_min, action, args} entries; injection actions like inject_sensor_drift live inside steps, not at the top level). Reference the profile the scenario expects; if the scenario relies on a non-reference profile, add a sentence to docs/scenarios/README.md explaining why.
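A quick shape check for a new scenario, run against the dict that yaml.safe_load would return. It checks only the keys listed above; the example's name, tick_budget, step timing, and the args passed to inject_sensor_drift are hypothetical, and the real Scenario loader remains the authority.

```python
from typing import Any

TOP_LEVEL_KEYS = frozenset({"schema_version", "meta", "profile", "tick_budget", "steps"})
META_KEYS = frozenset({"name", "description", "tags"})
STEP_KEYS = frozenset({"at_min", "action", "args"})


def scenario_problems(doc: dict[str, Any]) -> list[str]:
    """Return human-readable shape problems; an empty list means the shape looks right."""
    problems = [f"missing top-level key: {k}" for k in sorted(TOP_LEVEL_KEYS - set(doc))]
    # Injection actions belong inside steps, so anything extra at the top is suspect.
    problems += [f"unexpected top-level key: {k}" for k in sorted(set(doc) - TOP_LEVEL_KEYS)]
    problems += [f"meta missing: {k}" for k in sorted(META_KEYS - set(doc.get("meta", {})))]
    for i, step in enumerate(doc.get("steps", [])):
        problems += [f"steps[{i}] missing: {k}" for k in sorted(STEP_KEYS - set(step))]
    return problems


# Hypothetical scenario, as yaml.safe_load would return it.
example = {
    "schema_version": 1,
    "meta": {"name": "drift-smoke", "description": "IMU drift at t+5 min", "tags": ["drift"]},
    "profile": "jetson-agx-orin",
    "tick_budget": 600,
    "steps": [{"at_min": 5, "action": "inject_sensor_drift", "args": {"sensor": "imu"}}],
}
```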

If the scenario is meant to be replayable in CI, add a test under tests/integration/test_scenario_<name>.py that loads, runs, and asserts the closing engine state. Showcase scenarios get a page under docs/showcase/scenarios/ generated by scripts/gen_showcase_telemetry.py; rerun the script if you change the scenario's tick count or end state.

5.4 Adding a hardware profile

Copy profiles/jetson-agx-orin.yaml and edit the curves. The citation header at the top of the YAML is mandatory; every numeric value needs to trace to a row in docs/bom.md. Run make schema to regenerate the JSON Schemas the project does emit (AuditRecord, Scenario) under docs/schema/; note that Engine._load_profile() today calls yaml.safe_load without schema validation, so a typo in a profile key degrades silently to the default (AUDIT M10, BL-006). Until BL-006 lands, the load-time validation step is a manual diff against profiles/jetson-agx-orin.yaml. Add a section to docs/hardware-profiles.md and a one-line entry in the profiles README, then update STATUS.md if the new profile shifts the maturity of the schema row.
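Until BL-006 lands, the manual diff can be scripted: flatten both profiles to dotted key paths and compare the sets. This sketch assumes both files have already been loaded into dicts (for example with yaml.safe_load); the key names in the test are hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def key_paths(node: Mapping[str, Any], prefix: str = "") -> set[str]:
    """Flatten nested mappings to dotted leaf paths, e.g. {"a": {"b": 1}} -> {"a.b"}."""
    paths: set[str] = set()
    for key, child in node.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(child, Mapping) and child:
            paths |= key_paths(child, path)
        else:
            paths.add(path)  # scalars, lists, and empty mappings are leaves
    return paths


def profile_key_diff(reference: Mapping[str, Any], candidate: Mapping[str, Any]) -> tuple[set[str], set[str]]:
    """Return (keys only in the candidate, keys only in the reference)."""
    ref, cand = key_paths(reference), key_paths(candidate)
    return cand - ref, ref - cand
```

A typo shows up twice: once as an unknown key in the new profile and once as the reference key it failed to set.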

5.5 Adding an interop adapter

Implement the Adapter Protocol in src/nous/interop/base.py. The encode/decode pair must be well-formed even at stub maturity; the audit explicitly flagged stubs that emit malformed output (MISB key truncation, CoT missing required attributes, incomplete NMEA GGA) as High findings precisely because a controller can be misled.
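As an example of "well-formed even at stub maturity", a GGA encoder can at least emit the full field count and a valid checksum. The NMEA 0183 checksum is the XOR of every character between $ and *, written as two hex digits. The sketch below is illustrative, not the project's adapter; real sentences also end in CR LF.

```python
from functools import reduce

GGA_DATA_FIELDS = 14  # UTC time through DGPS station id


def nmea_checksum(body: str) -> str:
    """XOR of every character between '$' and '*', as two uppercase hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), body, 0):02X}"


def build_gga(talker: str, fields: list[str]) -> str:
    """Refuse to emit a truncated GGA; empty strings are valid for optional fields."""
    if len(fields) != GGA_DATA_FIELDS:
        raise ValueError(f"GGA needs {GGA_DATA_FIELDS} data fields, got {len(fields)}")
    body = ",".join([f"{talker}GGA", *fields])
    return f"${body}*{nmea_checksum(body)}"


def parse_sentence(sentence: str) -> list[str]:
    """Verify the checksum and return the comma-separated fields."""
    if not sentence.startswith("$") or "*" not in sentence:
        raise ValueError("not an NMEA sentence")
    body, _, checksum = sentence[1:].rpartition("*")
    if nmea_checksum(body) != checksum.strip().upper():
        raise ValueError("checksum mismatch")
    return body.split(",")
```

Round-tripping build_gga() through parse_sentence() is a cheap conformance test that would have caught an incomplete GGA.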

Add a conformance document under docs/conformance/<standard>.md declaring the QoS policy, the supported envelope, the deliberate omissions, and the gap between the v0.1 posture and the standard. Cite the canonical source per the CLAUDE.md citations convention; do not paste excerpts into the document.

5.6 Adding an ADR

Copy docs/adr/0000-template.md to the next number. Fill in Status, Date, Authors, Context, Decision, Consequences, Revisit triggers. Keep it to one page; the existing ADRs (0001 through 0018) are the bar. Update docs/adr/README.md, or regenerate it with scripts/gen_adr_index.py. ADRs cited from STPA derived requirements need a back-reference: open docs/stpa/09-derived-requirements.md and add the ADR id to the relevant row.
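The copy step can be scripted. This sketch assumes ADR files follow the NNNN-slug.md pattern of 0000-template.md; new_adr() is a hypothetical helper and does not touch docs/adr/README.md, so regenerate or edit the index afterwards.

```python
import re
import shutil
from pathlib import Path

ADR_FILE = re.compile(r"^(\d{4})-.+\.md$")


def next_adr_number(adr_dir: Path) -> int:
    """One past the highest NNNN prefix; README.md and other files are ignored."""
    numbers = [int(m.group(1)) for p in adr_dir.iterdir() if (m := ADR_FILE.match(p.name))]
    return max(numbers, default=0) + 1


def new_adr(adr_dir: Path, slug: str) -> Path:
    """Copy the template to the next free number and return the new path."""
    target = adr_dir / f"{next_adr_number(adr_dir):04d}-{slug}.md"
    shutil.copyfile(adr_dir / "0000-template.md", target)
    return target
```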

5.7 Adding a backlog item

Append to docs/backlog.md with the next BL-NNN id, a one-line summary, the phase (L0..L3), a [planned] status. Move to [in-progress] and [done] as work lands. If the item resolves a finding in AUDIT.md or in a review document, cite the finding number inline. If the item is referenced from an STPA derived requirement, prefix the description with the DR-N id.
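Picking the next id by hand gets error-prone once suffixed ids like BL-005b exist. A sketch that scans the backlog text, assuming every id matches BL-NNN with an optional lowercase suffix:

```python
import re

BL_ID = re.compile(r"\bBL-(\d{3})[a-z]?\b")


def next_bl_id(backlog_text: str) -> str:
    """One past the highest numeric part; suffixes like 'b' do not consume a number."""
    numbers = [int(n) for n in BL_ID.findall(backlog_text)]
    return f"BL-{max(numbers, default=0) + 1:03d}"
```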

6. Backlog work

The backlog in docs/backlog.md sequences work by phase. L0 is scaffold (the [in-progress] BL-001 covers the v0.1 PR itself). L1 is subsystem models and the state machine wiring. L2 is claude.ai integration and the scenario pack. L3 is STPA completion, real local inference, propagation-aware comms, and the additional adapters. Items inherit the additive-surface rule (ADR-0007) once L0 ships; a change that breaks an existing tool signature needs an ADR even if the BL row exists.

Pick items by phase first, then by dependency, then by blast radius. A practical sprint for the current state of the repo, reflecting the 2026-05-23 audit (which closed C1, C4, the C5 estimator stubs, H3, H4, H5, and M4, and added N1 deployment drift and N2 audit-degraded as new highs):

# Sprint A (close the live-VM gap)
AUDIT N1          # catch-up PR bringing main up to the L1 rollout
AUDIT N2          # restore the audit JSONL sink on the live VM
AUDIT C3          # FastMCP lifespan tick task
# Sprint B (close the remaining baseline criticals)
AUDIT C2          # recursive argument redaction in audit.py
AUDIT C6          # CI policy greps (em-dash + private-repo)
AUDIT H1          # tests/unit/test_runner.py (runner is the only spine
                  # module still without a unit test)
# Sprint C (OAuth hardening + auto-update discipline)
AUDIT H6          # OAuth file-store async lock + parent fsync + 0600 chmod
AUDIT H7          # refresh-token family revocation
AUDIT H8          # auto-update rollback record + kill-switch in SECURITY.md
# Sprint D (subsystem polish + self-model wiring)
BL-005b           # PMU/PDU subsystem (lifts off PowerSubsystem)
BL-014            # scenario YAML loader and injectors
BL-018            # self-model assess + viability wiring

A finding without a BL-NNN should get one before the work starts; a BL-NNN without a finding it resolves should cite an ADR or STPA derived requirement instead. The link between the work tracker and the governance artefacts is the trace that keeps the project legible.

Status semantics are strict. [planned] means scoped but unstarted. [in-progress] means there is a branch, a stub, or a partial PR. [done] means it has landed on main and STATUS.md reflects it. Move the marker in the same commit that lands the work, not after.

7. Cleanups, consolidations, and refactoring

Refactoring inside the low-blast-radius surfaces can iterate freely. Tool wiring in src/nous/server.py (provided the tier is set correctly), subsystem physics curves in profile YAML, scenario YAML files, and docs (README, ADR additions, model cards, conformance posture) are all on the low list per CLAUDE.md. Land the smallest possible diff per PR; the maintainer can compose larger sequences from the merge log.

High blast radius surfaces require an ADR before any change. src/nous/policy.py, src/nous/runner.py, src/nous/audit.py, src/nous/state/machine.py, src/nous/anthropic_client.py, src/nous/estimators/base.py, src/nous/interop/base.py, and the hardware-profile schema in profiles/ are the seven files plus one schema on the ADR-required list. The Security-note requirement in CONTRIBUTING.md covers a narrower set (six of those files plus the schema; the FSM in state/machine.py is ADR-gated but not on the Security-note list). Treat the two requirements as separate gates: ADRs document the decision, Security notes document the threat-model implications.

The known consolidation candidates as of the 2026-05-23 working state (see docs/audit-2026-05-23.md for the audit they came out of):

The first is the estimator base. Each estimator currently implements its own predict / update / state triple. As more filters land (thermal, compute, biometrics, comms), the boilerplate around covariance bookkeeping and step accounting will repeat; consolidating into a BaseEstimator mixin under src/nous/estimators/base.py is defensible only if the resulting class still leaves the filter implementation under twenty lines (the bar PowerEstimator set). If the mixin grows beyond that, leave the implementations independent.

The second is the interop encoder. The CoT, SensorThings, MISB KLV, and NMEA encoders all build a structured record from data, validate it, and serialise. The validation and the "required-attributes-by-standard" tables could move into src/nous/interop/base.py as a declarative schema, but only after each adapter ships a complete encoder (the audit High findings H3, H4, H5 had to land first; the 2026-05-23 audit records all three as closed). A consolidation against incomplete encoders entrenches the gaps.
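What the declarative table might look like once the encoders are complete. The CoT entries follow the attributes commonly listed as required on the event and point elements; treat them as a starting point and confirm against the canonical schema per the citations convention. REQUIRED and missing_attributes() are hypothetical names.

```python
from collections.abc import Mapping

# Hypothetical declarative table keyed by standard, then element name.
REQUIRED: dict[str, dict[str, frozenset[str]]] = {
    "cot": {
        "event": frozenset({"version", "uid", "type", "time", "start", "stale", "how"}),
        "point": frozenset({"lat", "lon", "hae", "ce", "le"}),
    },
}


def missing_attributes(standard: str, record: Mapping[str, Mapping[str, object]]) -> dict[str, set[str]]:
    """Per element, the required attributes the record does not carry."""
    missing: dict[str, set[str]] = {}
    for element, required in REQUIRED[standard].items():
        gap = set(required) - set(record.get(element, {}))
        if gap:
            missing[element] = gap
    return missing
```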

The third is the subsystem stub posture. Several v0.1 subsystems return constants from truth() and sensor_obs(). The audit's C5 finding recommends a _stub: True sentinel in the covariance dict so a controller can distinguish "I have no estimate" from "I have a zero-error estimate". The same pattern applies to src/nous/self_model/ where p5 / p50 / p95 are all 0.0. Land the sentinel pattern in one PR across every stub before any per-subsystem implementation work; it is the smallest consolidation that prevents the most expensive class of bug (a controller acting on a plausible zero).
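The sentinel pattern is small. In this sketch, stub_covariance() and has_real_estimate() are hypothetical helper names; only the _stub: True key comes from the C5 recommendation. The same gate would apply to the self-model's all-zero percentiles.

```python
from collections.abc import Mapping
from typing import Any

STUB_KEY = "_stub"


def stub_covariance(**variances: float) -> dict[str, Any]:
    """What a stub subsystem would return: placeholder values plus the sentinel."""
    return {**variances, STUB_KEY: True}


def has_real_estimate(cov: Mapping[str, Any]) -> bool:
    """Controller-side gate: only a dict without the sentinel counts as an estimate."""
    return not cov.get(STUB_KEY, False)
```

With the sentinel in place, a zero variance from a stub and a zero variance from a real filter are no longer indistinguishable.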

The fourth is the deployment bundle. deploy/install.sh, deploy/auto-update.sh, the systemd units under deploy/systemd/, and the Caddyfile template all carry repeated path constants ($NOUS_HOME, /var/log/nous, /opt/nous). A deploy/paths.env file sourced by every script and templated into every unit would remove the drift risk, and would land cleanly behind the systemd EnvironmentFile= directive. This is a docs-and-shell change, no Python, but warrants an ADR if the path contract changes.

The fifth is the doc tree itself. The next section walks the full markdown update procedure.

8. Markdown update run

A markdown sweep is a discrete contribution shape: no Python changes, no behaviour changes, only documentation freshness. The reference artefact is the PR sequence that landed docs/review-2026-05-21.md and the "docs: bring markdown tree up to date with current code and ADR 0017" commit (e0b3c7b). Follow the same shape.

8.1 Establish the baseline

Inventory every markdown file in the tree. The find invocation is the canonical one (it excludes the git directory, the virtualenv, node_modules, and the built site/):

find . -name "*.md" \
    -not -path "./node_modules/*" \
    -not -path "./.git/*" \
    -not -path "./.venv/*" \
    -not -path "./site/*" \
    | sort > /tmp/nous-md-inventory.txt
wc -l /tmp/nous-md-inventory.txt

Read the inventory once end-to-end before editing anything. The tree currently spans the top level (README, AGENTS, CLAUDE, CONTRIBUTING, SECURITY, STATUS, LIMITATIONS, CHANGELOG, AUDIT), the docs/ tree (architecture, backlog, deployment, releasing, bom, hardware-profiles, state-machine, tool-reference, the review artefact, this runbook), docs/adr/ (the numbered ADRs and the index), docs/stpa/ (the numbered STPA artefacts), docs/conformance/ (per-standard posture), docs/model-cards/ (per-subsystem and per-estimator), docs/showcase/ (the public-facing site), docs/subsystems/, docs/scenarios/, the skills/ runbooks, the deploy/README.md, and the examples/inspector_quickstart.md.

8.2 Check the cross-references

Every markdown link should resolve. Three greps catch most of the rot:

# Internal markdown links that point at files (with or without a #anchor)
grep -rEn '\]\((\.\.?/|docs/|src/|tests/|deploy/|profiles/|scenarios/|skills/|examples/)[^)#]+\.md(#[^)]*)?\)' \
    --include='*.md' .

# BL-NNN references
grep -rEn '\bBL-[0-9]{3}[a-z]?\b' --include='*.md' .

# ADR references
grep -rEn '\bADR[ -]?[0-9]{4}\b' --include='*.md' .

For each hit, confirm the target exists and the heading anchor (if any) is still valid. The MkDocs strict mode catches broken links at site-build time (make docs-build), but the BL-NNN and ADR references go beyond what MkDocs can validate. A BL-NNN reference must resolve to a row in docs/backlog.md; an ADR reference must resolve to a file under docs/adr/.
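A sketch of the check MkDocs cannot do. It approximates "resolves to a row in docs/backlog.md" as "appears anywhere in docs/backlog.md", and expects the caller to pass the set of ADR numbers parsed from the filenames under docs/adr/; dangling_refs() is a hypothetical helper.

```python
import re

BL_REF = re.compile(r"\bBL-\d{3}[a-z]?\b")
ADR_REF = re.compile(r"\bADR[ -]?(\d{4})\b")


def dangling_refs(text: str, backlog_text: str, adr_numbers: set[int]) -> list[str]:
    """List BL-NNN ids absent from the backlog and ADR numbers with no file."""
    known_bl = set(BL_REF.findall(backlog_text))
    missing = sorted({ref for ref in BL_REF.findall(text) if ref not in known_bl})
    missing += sorted({f"ADR {n}" for n in ADR_REF.findall(text) if int(n) not in adr_numbers})
    return missing
```

Run it once per markdown file from the inventory and report the file name alongside each dangling reference.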

8.3 Enforce the em-dash ban

CLAUDE.md declares the em-dash ban; the CI grep is the enforcement mechanism (or should be, per audit C6). Run the grep manually until the CI step lands:

! grep -rPn '\x{2014}' --include='*.md' .   # U+2014 EM DASH
! grep -rPn '\x{2013}' --include='*.md' .   # U+2013 EN DASH (optional)

Replace any hit with --, a comma, a colon, or a parenthetical, per the convention. The ban is repository-wide for markdown (including fenced code blocks); only source-code strings under src/ may contain U+2014 if the string genuinely needs one.
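A Python equivalent of the grep, sketched as one possible shape for the missing CI step (C6). It reports file:line for each hit and skips the same directories as the inventory find; dash_hits() and its output format are assumptions, and the banned character is written as an escape so the script itself stays clean.

```python
from pathlib import Path

BANNED = {"\u2014": "EM DASH"}  # add "\u2013": "EN DASH" to ban en dashes too
SKIP_DIRS = frozenset({".git", ".venv", "node_modules", "site"})


def dash_hits(root: Path) -> list[str]:
    """Scan every *.md under root, fenced code blocks included, and report file:line hits."""
    hits: list[str] = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root)
        if SKIP_DIRS.intersection(rel.parts):
            continue
        for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
            for char, name in BANNED.items():
                if char in line:
                    hits.append(f"{rel}:{lineno}: {name}")
    return hits
```

In CI, a thin wrapper would print the hits and exit non-zero when the list is non-empty.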

8.4 Cross-check maturity claims

Every component status in STATUS.md needs a matching reality. Walk the component table and confirm the maturity flag for each module against the actual code: stable modules should have an ADR governing changes and a test suite covering the contract, in-progress should have at least one wired-up call path, planned should be a typed stub. If a row drifts, fix the row in this PR.

Same procedure for the per-document state table: stable ADRs should not be edited (a new ADR supersedes them), in-progress documents should be moving (a commit in the last sprint), planned should either become a stub file or come out of the table.

8.5 Refresh the BOM and the model cards

docs/bom.md is the realism anchor for every numeric value in profiles/. For each row, confirm the citation is still the best public reference and the numeric value still matches the profile YAML. If a profile drifted from the BOM, fix the profile or the BOM in the same PR; never let the two diverge.

Each estimator and subsystem model card under docs/model-cards/ needs the same check against the estimator code. The covariance bound, the warm-up period, the divergence conditions, and the "do not use for X" caveats should all match the implementation. The audit's C5 finding is the cautionary example: a model card that claims a calibrated covariance but is backed by a stub returning constants is worse than a model card that declares the subsystem unimplemented.

8.6 Sweep the showcase and the public face

docs/showcase/ is the externally visible artefact; ADR 0017 documents the lockdown posture and the rationale for keeping the production VM CIDR-gated. The capability matrix, the fidelity badges, and the FSM viewer should reflect the live main state. If scripts/gen_showcase_telemetry.py produced telemetry against a different scenario or profile than the showcase claims, regenerate or fix the claim.

The top-level README.md, STATUS.md, and LIMITATIONS.md are the next-most-visible files. The "Last reviewed" date in STATUS.md and LIMITATIONS.md should be updated to the date of the sweep. Capability lists and limitation rows that no longer match main get rewritten; a new limitation that emerged since the last sweep gets a fresh LN identifier.

8.7 Update the changelog

If the sweep is substantive (more than typo fixes), append a Changed or Fixed entry to the [Unreleased] block in CHANGELOG.md. The convention is one bullet per material clarification, with a parenthetical pointing at the affected files. Pure typo fixes do not warrant a CHANGELOG entry.

8.8 Regenerate and validate

Regenerate the generated docs and rebuild the site:

make schema        # tool-reference.md, ADR index, backlog summary, JSON schemas
make docs-build    # mkdocs build --strict

mkdocs build --strict will fail on a broken link, a missing navigation entry, an orphan page, or a Markdown extension parse error. Resolve each warning rather than silencing it.

8.9 Land as one PR

A markdown sweep ships as a single PR titled "docs: bring markdown tree up to date with current code and ADR NNNN" (or similar). The PR body lists the files touched in three groups (top-level, docs/, skills/), references the audit or review that motivated the sweep, and notes any cross-cutting findings (em-dash hits replaced, BL-NNN references repaired, STATUS.md rows that changed maturity). The maintainer's review focuses on whether the maturity claims still match main; the cosmetic changes are secondary.

A sweep that is too large to review as one PR is a signal to split by sub-tree: docs: refresh STPA artefacts, docs: refresh model cards, docs: refresh conformance posture are the natural splits. Each split carries its own CHANGELOG entry if substantive.

9. Closing the loop

Every contribution flows back to four places. STATUS.md reflects the new maturity of every component the change touched. CHANGELOG.md captures the user-visible behaviour. The backlog docs/backlog.md advances the relevant BL-NNN rows. The audit artefact (AUDIT.md or its successor) crosses off the finding(s) the change resolved. A PR that leaves any of those four out of sync is incomplete.

The reference rhythm is: audit on a fixed cadence (quarterly, or after a phase boundary), review on a lighter cadence (monthly), enhance and validate continuously, extend per the BL-NNN sprint, sweep markdown every time the maturity table shifts. Following the rhythm keeps the governance documents honest and the simulator legible to the next controller that picks up the surface.