AUDIT 2026-05-23

Full system audit of nous covering the local repository and the live nous-mcp surface that claude.ai talks to. This is a delta audit against the 2026-05-20 baseline in AUDIT.md and the in-depth review in docs/review-2026-05-21.md. It records which baseline findings have closed, which remain open, and which new defects this run uncovered.

Conducted: 2026-05-23. Branch audited: claude/optimistic-gauss-5jpfI (49 commits ahead of origin/main). Source revision: 02f2062 (Merge PR #37, BL-011 biometrics subsystem). Live MCP audited: nous-mcp over the configured claude.ai connector (see §4 below).

1. Executive summary

The local repository has closed the majority of the baseline audit's critical and high findings. Quality gates are green (ruff clean, mypy strict clean across 58 source files, 345 pytest tests pass, mkdocs strict build succeeds). The L1 subsystem suite that the baseline called out as "wired but stub-flavoured" is now implemented end to end: ten subsystems carry physics, nine estimators run live, the FSM has SC-2 thermal and low-power guards, the interop adapters emit standards-shaped output, and the daily-cap fsync race is closed.

The single most material finding from this audit run is deployment drift. The local branch is forty-nine commits ahead of origin/main, and the auto-update timer on the live VM tracks origin/main. The result, which I confirmed directly through the live MCP connector, is that the live nous-mcp exposes the v0.1 stub surface (eleven representative tools, all subsystem reads stubbed to null, audit sink reported as degraded, engine never ticked past boot) while the repository under src/nous/ carries the full L1 surface. Anyone reading STATUS.md and then querying the live MCP will see a different device than the one the docs describe. This is the deployment-side twin of the architecture-vs-implementation drift the 2026-05-21 review flagged, and it is the highest-leverage thing to fix this week.

Three baseline-critical findings remain open in code: audit.py redaction is still single-level (C2), the FastMCP server still has no lifespan task that drives engine.tick() (C3), and CI still does not enforce the documented em-dash and private-repo greps (C6). Two high findings remain in auth/oauth.py: there is no inter-handler lock around the read-modify-write cycle (H6), and refresh-token rotation still revokes one token at a time rather than the whole family on detected reuse (H7). The auto-update script asserts post-restart service health but still has no commit-SHA rollback record (H8).

Otherwise the repository is in noticeably better shape than the baseline. The MISB KLV adapter (C4), CoT adapter (H3), NMEA adapter (H5), SensorThings adapter (H4), the audit chmod 0600 (part of H6), and the systemd hardening (M4) are all closed. The estimator stubs that the baseline flagged for returning misleading covariance (C5) are gone: thermal, compute, storage, sensors, and biometrics now run Kalman filters with covariance that actually shrinks under observation. The anthropic_client daily-cap fsync race (C1) is closed and covered by unit tests that exercise the cap exhaustion, UTC rollover, corrupted state, and concurrent locking paths.

2. What closed since the 2026-05-20 baseline

| Baseline id | Module | Status | Evidence |
| --- | --- | --- | --- |
| C1 | anthropic_client.py | Closed | CallCap.increment now fh.flush() + os.fsync() while still inside the flock, and a fsync failure raises CapExhausted (src/nous/anthropic_client.py:100-110). Covered by tests/unit/test_anthropic_client.py (9 tests, includes multiprocessing flock test). |
| C4 | interop/misb_klv.py | Closed | Full BER short and long form encoding with explicit overflow refusal (misb_klv.py:108-128). Key range [1, 255] validated; values > max_value_len raise rather than truncate (misb_klv.py:61-71). |
| C5 (thermal) | estimators/thermal.py | Closed | Two-state Kalman with shrinking covariance (estimators/thermal.py). Covered by tests/unit/test_thermal_estimator.py. |
| C5 (compute) | estimators/compute.py | Closed | Two-state Kalman over (load_pct, draw_w). Covered by tests/unit/test_compute_estimator.py. |
| C5 (storage / sensors / biometrics) | estimators/*.py | Closed | Each ships a live Kalman with bounds validation and a rejected_updates counter (estimators/storage.py, sensors.py, biometrics.py). |
| C5 (self_model zeros) | self_model/*.py | Partially closed | The placeholder is still typed-stub (self_model/assess.py:29-44) but documented as [planned] in STATUS.md. Returns capabilities dict rather than misleading numeric zeros. |
| H3 | interop/cot.py | Closed | time, start, stale, how="m-g" written on every event (cot.py:67-78). XXE-safe parser refuses DOCTYPE / ENTITY declarations. |
| H4 | interop/sensorthings.py | Closed | phenomenonTime normalised to UTC ISO-8601 with Z suffix (sensorthings.py:41-45). max_payload_len cap added. |
| H5 | interop/nmea0183.py | Closed | Full 14-field GGA sentence, XOR checksum, ASCII enforcement, lat/lon range checks (nmea0183.py:38-64). |
| H9 | profiles/jetson-agx-orin.yaml | Open (still uncited) | inference_local.tok_per_s_p50: 200 and energy_j_per_tok: 0.12 still carry no inline placeholder comment. |
| M2 | deploy/Caddyfile.example | Not re-verified | Out of audit scope this round. |
| M4 | deploy/systemd/nous.service | Closed | ProtectClock=true, ProcSubset=pid, SystemCallFilter=@system-service with negated set, MemoryDenyWriteExecute=true, RestrictAddressFamilies (nous.service:40-54). |
| Audit chmod 0600 (part of H6) | audit.py | Closed | target.chmod(0o600) applied after open (audit.py:188-189). The file handler is a WatchedFileHandler with explicit fsync after every emit (_FsyncingFileHandler.emit). |

The L1 subsystems (BL-003 power, BL-005 thermal, BL-005a APU, BL-007 compute, BL-008 storage, BL-009 sensors, BL-010 position, BL-011 biometrics, BL-012 comms, BL-013 inference) have all moved from planned or stub to in-progress since the baseline, with matching estimators, matching MCP tools (tier T0), and matching integration tests under tests/integration/test_*_through_engine.py and tests/integration/test_*_drives_*.py.

3. Open findings carried from the baseline

C2. Argument redaction is still flat

src/nous/audit.py:52-68. redact() walks only the top-level keys of the argument mapping. A caller that passes {"context": {"headers": {"Authorization": "Bearer ..."}}} would still write the secret verbatim. The current MCP tool surface only takes top-level scalar arguments, so the blast radius is theoretical today, but the redaction allowlist promises depth that the implementation does not deliver, and the only redaction test in tests/unit/test_audit.py:12-18 exercises a flat mapping.

Recommendation: implement the recursive walk() from the baseline, add a nested-key test, and lift the test to cover lists of dicts too. The fix is required before any tool surface accepts structured payloads (BL-014 scenario loader is the next surface that will).
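
A minimal sketch of what a recursive walk could look like, assuming redaction is keyed on a set of sensitive key names. The SENSITIVE_KEYS set and the MASK value are hypothetical placeholders; the real key set and masking convention live in src/nous/audit.py and may differ.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical key set and mask; the real ones live in src/nous/audit.py.
SENSITIVE_KEYS = frozenset({"authorization", "api_key", "password", "token", "secret"})
MASK = "***"


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive mapping keys masked at any depth."""
    if isinstance(value, Mapping):
        return {
            key: MASK if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Lists of dicts are walked too. Tuples come back as lists, which is
        # harmless for a record that is about to be serialised to JSON.
        return [redact(item) for item in value]
    return value
```

With this shape, the nested example from the finding, {"context": {"headers": {"Authorization": "Bearer ..."}}}, comes back with the Authorization value masked, and the input mapping is left unmodified.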

C3. Engine starts but the FastMCP server still never ticks it

src/nous/server.py:48 calls self.engine.start() once in Nous.__init__ but no FastMCP lifespan hook schedules tick_loop. The live MCP confirms the symptom directly: device_health returns tick=0, ts_s=0.0, mode="boot", and state_history returns the single stowed -> boot transition that fires at construction. A controller calling any of the new *_status tools against the live server reads boot-time truth values, not anything that has evolved through the physics. The CLI tick subcommand can drive engine.tick() manually, but the public HTTP server cannot.

Recommendation: add a FastMCP lifespan async context manager that creates an asyncio.Task running tick_loop(engine, hz=tick_hz) and cancels it on shutdown, calling engine.stop() on the way out. Wire the cancellation through anyio so SIGTERM under systemd lands cleanly. This is the single change that flips every L1 subsystem read from "boot snapshot" to "live state" on the deployed server.
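
The shape of such a lifespan can be sketched with the standard library alone. This is an illustration, not the project's code: FakeEngine is a hypothetical stand-in that assumes only tick() and stop(), the sketch uses asyncio rather than anyio to stay dependency-free, and how the context manager is attached to FastMCP is deliberately left out.

```python
import asyncio
import contextlib
from collections.abc import AsyncIterator


class FakeEngine:
    """Hypothetical stand-in for the engine; only tick() and stop() are assumed."""

    def __init__(self) -> None:
        self.ticks = 0
        self.stopped = False

    def tick(self) -> None:
        self.ticks += 1

    def stop(self) -> None:
        self.stopped = True


async def tick_loop(engine: FakeEngine, hz: float) -> None:
    """Call engine.tick() at roughly hz until the task is cancelled."""
    period = 1.0 / hz
    while True:
        engine.tick()
        await asyncio.sleep(period)


@contextlib.asynccontextmanager
async def tick_lifespan(engine: FakeEngine, hz: float) -> AsyncIterator[None]:
    """Run tick_loop for the lifetime of the server, then stop the engine."""
    task = asyncio.create_task(tick_loop(engine, hz))
    try:
        yield
    finally:
        task.cancel()
        try:
            with contextlib.suppress(asyncio.CancelledError):
                await task
        finally:
            # Runs even if the tick task died with its own exception.
            engine.stop()


async def demo() -> FakeEngine:
    engine = FakeEngine()
    async with tick_lifespan(engine, hz=200.0):
        await asyncio.sleep(0.05)  # the server would serve requests here
    return engine
```

Running asyncio.run(demo()) returns an engine that has ticked several times and has been stopped. The nested finally is the part worth keeping in any real version: it guarantees engine.stop() runs on shutdown regardless of how the tick task ended.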

C6. CI still does not enforce the documented policy greps

.github/workflows/ci.yml:38-39 runs make check and nothing else. scripts/ has gen_*.py but no policy_checks.sh. CLAUDE.md still states "the CI grep checks both" (em-dashes and private-repo references). The repository continues to pass by authorial discipline only.

Recommendation: write scripts/policy_checks.sh with two greps (! grep -rPn '\x{2014}' --include='*.md' . and the private-repo allowlist) and add a policy workflow job. Land it alongside an explicit entry in scripts/README.md so the contract is documented in code, not in CLAUDE.md.
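
For contributors who prefer to run the em-dash check without depending on grep -P and locale settings, the same scan can be sketched in Python. This is an illustrative equivalent of the first grep only; the function name is hypothetical and the private-repo check is not covered.

```python
from pathlib import Path

EM_DASH = "\u2014"


def find_em_dashes(root: Path) -> list[tuple[Path, int]]:
    """Return (file, line number) for every Markdown line containing U+2014."""
    hits: list[tuple[Path, int]] = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if EM_DASH in line:
                hits.append((path, lineno))
    return hits
```

A CI wrapper would exit non-zero when the returned list is non-empty, mirroring the negated grep in the recommendation.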

H1. Spine module test coverage is partly remediated

tests/unit/test_anthropic_client.py lands with 9 cases. tests/unit/test_state_machine_guards.py lands with 8 cases. tests/unit/test_policy.py and tests/unit/test_policy_fuzz.py both exist. The remaining gap is runner.py: there is no tests/unit/test_runner.py. The runner is the audited execution wrapper that every tool call passes through; the four tiers, three policy modes, denial-path audit record, exception-to-body mapping, and truncation behaviour are exercised only through integration tests.

Recommendation: write tests/unit/test_runner.py parametrised over the four tiers and three policy modes; assert denied=True audit lines on the denial path; assert exception.__class__.__name__ ends up in the body envelope; assert truncation kicks in at max_output. Roughly two hours of work.

H2. mypy strict still excludes the test tree

pyproject.toml sets files = ["src/nous"] for mypy. The test tree has grown to 38 files and would benefit from strict typing; any fixture decorator drift will silently slip past make check.

H6. OAuth file store still has no inter-handler lock

src/nous/auth/oauth.py has no asyncio.Lock. _Store.load() ... _Store.save() sequences in register_client, exchange_authorization_code, _issue, load_access_token, and exchange_refresh_token race under concurrent FastMCP requests. The single-client lockdown configuration limits the practical impact, but the read-modify-write cycle is still wrong on its merits. The directory chmod is correct (the install script sets 0750 on the state directory) but _Store.save() writes via tmp.replace without fsyncing the parent directory, so a hard reboot can lose an already-rotated refresh token.

Recommendation: an asyncio.Lock field on FileOAuthProvider wrapping every load/save sequence, plus an os.fsync on the parent directory fd after Path.replace. A chmod(0o600) after tmp.replace would also bring the state files to parity with the audit log.
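
The three pieces of this recommendation (one lock per read-modify-write, a parent-directory fsync after the rename, and 0600 on the state file) fit together roughly as below. LockedJsonStore is a hypothetical stand-in, not the real _Store or FileOAuthProvider, and the directory fsync is POSIX-only.

```python
import asyncio
import json
import os
from collections.abc import Callable
from pathlib import Path
from typing import Any


class LockedJsonStore:
    """Hypothetical stand-in for _Store: one lock around each read-modify-write."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = asyncio.Lock()

    def _load(self) -> dict[str, Any]:
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text(encoding="utf-8"))

    def _save(self, data: dict[str, Any]) -> None:
        tmp = self._path.with_suffix(".tmp")
        # Create the temp file with mode 0600 (applies when the file is created),
        # then flush and fsync its contents before the rename.
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self._path)
        # fsync the parent directory so the rename itself survives a hard reboot.
        dir_fd = os.open(self._path.parent, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)

    async def update(self, mutate: Callable[[dict[str, Any]], None]) -> dict[str, Any]:
        """Load, mutate, and save as one critical section."""
        async with self._lock:
            data = self._load()
            mutate(data)
            self._save(data)
            return data
```

An asyncio.Lock only serialises handlers inside one event loop in one process; if more than one process can touch the state directory, a file lock would still be needed on top.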

H7. Refresh-token rotation still has no family revocation

src/nous/auth/oauth.py:247-257. exchange_refresh_token() deletes the consumed refresh token and mints a fresh pair. OAuth 2.1 BCP §4.13 requires that on detected reuse of a rotated refresh token, the entire chain be revoked. The current implementation has no chain id, so a captured-then-replayed refresh token continues to mint access tokens silently in parallel with the rightful client.

Recommendation: add an issue_id field set at first issuance and copied across each rotation. On reuse or collision wipe every token record carrying that id. Add an integration test under tests/integration/test_oauth_rotation.py.
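
A minimal in-memory sketch of the family-revocation idea follows. The issue_id name comes from the recommendation; the RefreshTokenStore class, its methods, and the TokenReuseDetected exception are hypothetical and do not mirror the real provider in src/nous/auth/oauth.py.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


class TokenReuseDetected(Exception):
    """Raised when an already-rotated refresh token is presented again."""


@dataclass
class RefreshTokenStore:
    """In-memory sketch: every token remembers the issue_id of its family."""

    active: dict[str, str] = field(default_factory=dict)    # token -> issue_id
    consumed: dict[str, str] = field(default_factory=dict)  # token -> issue_id

    def issue(self, issue_id: Optional[str] = None) -> str:
        token = secrets.token_urlsafe(32)
        # A first issuance starts a new family; a rotation copies the id across.
        self.active[token] = issue_id or secrets.token_hex(8)
        return token

    def rotate(self, presented: str) -> str:
        if presented in self.consumed:
            # Replay of a rotated token: revoke every live token in the family.
            self.revoke_family(self.consumed[presented])
            raise TokenReuseDetected("rotated refresh token was presented again")
        issue_id = self.active.pop(presented, None)
        if issue_id is None:
            raise KeyError("unknown refresh token")
        self.consumed[presented] = issue_id
        return self.issue(issue_id)

    def revoke_family(self, issue_id: str) -> None:
        self.active = {t: i for t, i in self.active.items() if i != issue_id}
```

The design cost is that consumed tokens (or at least their hashes) must be retained for as long as a replay should be detectable, so a real store would want an expiry on that table.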

H8. Auto-update has post-restart health check but no rollback record

deploy/auto-update.sh:41-56 does git reset --hard "${REMOTE}", runs install.sh, restarts nous.service, and asserts systemctl is-active. That closes the worst-case "broken service never noticed" gap from the baseline, but the script still does not record the prior known-good commit SHA, so a controller wanting to roll the live host back to the prior commit needs to read the auto-update log and manually run git reset --hard <prev>. The auto-update systemd unit also has no hardening even though it runs as root.

Recommendation: capture LOCAL to /var/lib/nous/auto-update.last_ok only on success; add an auto-update-rollback.sh companion that reads that file and reverses the update; document the kill switch (systemctl disable --now nous-auto-update.timer) in SECURITY.md.

M1. Runner denial path still omits exit_code

src/nous/runner.py:54-69. The denial branch writes denied=True, decision_reason=... but leaves exit_code=None. The audit record has an exit_code: int | None = None field but it is never populated on the denial path. Setting it to 1 would let forensic queries count denials per tier per day without parsing the body string.

M7, M8, M9, M10, L1, L2, L4, L5, L6, L7, L8, L9, L10

Not re-verified this round; the 2026-05-20 baseline remains authoritative. The largest remaining item is M10 (no JSON-schema validation at profile load time): now that ten subsystems consume profile keys, a misspelled key still silently degrades to a default. Wire scripts/gen_schemas.py output into engine._load_profile and fail fast on validation error before the next profile schema change.

4. Live MCP audit

I exercised every advertised tool on the configured nous-mcp claude.ai connector. Outputs below are verbatim modulo formatting.

4.1 Tool surface and posture

device_info reports:

version:    0.1.0
profile:    jetson-agx-orin
transport:  http
policy:     open
tick_hz:    2.0
audit.path: /var/log/nous/audit.jsonl
audit.degraded: true

Findings:

  1. audit.degraded is true. The audit sink cannot write to /var/log/nous/audit.jsonl on the live host. The path is set by Settings.resolved_audit_path() and the install script creates /var/log/nous/ owned by the nous user; the most likely cause is that the deployed binary is running under a user that does not own the directory, or that systemd's ProtectSystem=strict plus a stale ReadWritePaths is denying the write. Either way the live server is logging every tool call to stderr only, with no append-only fsync guarantee. This is a regression from the audit invariants documented in ADR-0002 and LIMITATIONS.md.

  2. policy: open is the default. The server admits every tool regardless of tier. Acceptable in a single-tenant claude.ai integration but worth pinning explicitly in SECURITY.md so the posture is documented rather than implicit.

  3. tick_hz: 2.0 is set but the engine is not actually ticking (see 4.2).

4.2 Engine state and tick activity

device_health returns tick=0, ts_s=0.0, mode="boot", operator_state="nominal", comms_state="connected". state_history(limit=256) returns one entry: stowed -> boot. state_get returns mode="boot", tick=0.

This is the live confirmation of baseline finding C3. The engine has been booted but no tick has ever run. Every subsystem read served by the live server returns the boot snapshot, not evolving physics.

4.3 Subsystem reads return v0.1 stubs

| Tool | Live response | Repository response |
| --- | --- | --- |
| power_status | {"soc_pct": null, "draw_w": null, "endurance_min_p50": null, "note": "power subsystem ships as a typed stub in v0.1"} | Full SoC, voltage, current, load, charge offered/accepted, endurance, plus Kalman estimate (server.py:192-223). |
| apu_status | {"solar_w": null, "fuelcell_w": null, "fuelcell_fuel_pct": null, "note": "APU subsystem ships as a typed stub in v0.1"} | Per-source watts (solar, fuelcell, vehicle, USB-C), fuel mass, connected flags, estimator (server.py:225-252). |
| comms_state | {"state": "connected", "links": [], "note": "links emit through the comms estimator in L1"} | Per-link beliefs, derive_state label + reason (server.py:348-363). |
| self_estimator_status | {"estimators": [], "note": "estimator framework lands in L1"} | Nine estimators with point + covariance + ts (server.py:511-541). |
| interop_formats | Lists six adapters with the v0.1 stub note. | Same shape (matches). |
| self_model_assess | {"capabilities": {}, "note": "self-model layer lands in L1"} | Same shape (still a placeholder by design until BL-018). |
| inference_local | Returns the synthetic response with model: nous-local-mock. | Same shape, plus latency / energy / capacity metrics from the inference subsystem (server.py:543-565). |

The live server is missing eight of the seventeen MCP tools the repository registers: thermal_status, compute_status, storage_status, comms_status, position_status, sensors_status, biometrics_status, and inference_status. The advertised instructions block also lists only the v0.1 tool roster.

4.4 Diagnosis: deployment drift

The cause is the deployment loop. deploy/auto-update.sh:28-30 fetches origin/main and fast-forwards. The most recent commit on origin/main is 09728d3 ("Merge pull request #18 from rmednitzer/claude/auto-update-main"). The local branch has 02f2062 (Merge PR #37) at the head; git log origin/main..HEAD contains forty-nine commits including PRs #29 through #37 (the L1 subsystem rollout). Those merges never landed on main. The live VM is therefore correctly tracking main; the bug is upstream of the deployer.

This is the deployment-side restatement of the architecture-vs-code drift the 2026-05-21 review called out: documentation describes the L1 surface, the L1 surface exists in code, but the live deployment is still v0.1.

Recommendation: open and merge a single squash PR (claude/main-catchup or similar) that brings main up to 02f2062. Watch the nous-auto-update.timer log on the live host (journalctl -u nous-auto-update.service --since "10 minutes ago") to confirm the pull and the post-restart systemctl is-active check. Then re-run the live MCP audit checklist in skills/nous-getting-started.md to confirm the seventeen-tool surface lands. After the live MCP catches up, investigate the audit.degraded: true reading and remediate before the next ADR or release commit lands.

4.5 Other live-MCP observations

The OAuth issuer is reachable through the claude.ai connector (the session authenticated successfully). No introspection endpoint is exposed by FastMCP, so I could not verify the access-token TTL or the single-client lockdown state remotely; this needs a server-side check from the operator (ls -la $NOUS_HOME/auth/clients.json and cat $NOUS_HOME/auth/tokens.json | jq 'keys | length').

The interop_formats adapter list matches the repository, so the adapter discovery surface is at parity even though the encoding surface (interop_encode / interop_decode) is not yet wired into the server.

5. New findings (introduced or first noticed in this audit)

N1 (High). Deployment drift between main and the development line

Covered in detail in §4.4. The local branch is forty-nine commits ahead of origin/main; the live VM serves origin/main. The audit discipline says STATUS.md describes the deployed device; today it does not.

N2 (High). Live audit sink is degraded

device_info on the live MCP reports audit.degraded: true. The audit invariants (ADR-0002, LIMITATIONS.md audit section) assume the JSONL sink is authoritative. A degraded sink means the live host is currently in an unaudited state for every tool call served by the live MCP. Triage path: SSH to the host, check /var/log/nous/audit.jsonl ownership and mode, check the systemd unit's ReadWritePaths, check whether the nous user can open(..., O_APPEND | O_CREAT) the path. The audit.py constructor falls back to stderr on OSError, so the cause is almost certainly a filesystem permission denial rather than a logic bug.

N3 (Medium). state_get returns a less informative payload than device_health

state_get returns {"mode": "boot", "tick": 0}. The richer payload (device_health) includes operator_state, comms_state, profile, scenario. For consistency, either deprecate state_get in favour of device_health (they overlap) or extend it with the FSM refusal counter from StateMachine.refusals() so a controller has a one-shot path to ask "what refused, why".

N4 (Medium). No engine.snapshot() parity test against device_health

The snapshot keys (engine.py:239-306) and the device_health MCP tool payload (server.py:158-164) drift each time a subsystem lands; the recent biometrics merge added biometrics: {...} to the snapshot but the snapshot dict was updated by hand. A snapshot/schema parity test would catch the next contributor who forgets to add a field. One small test under tests/unit/test_engine_restart.py could do it.

N5 (Low). The _INSTRUCTIONS advertisement is stale

src/nous/server.py:611-626 advertises device_info / device_health / state_get / state_history / power_status / apu_status / thermal_status / compute_status / comms_state / self_model_assess / self_estimator_status / inference_local / interop_formats. The server actually registers storage_status, comms_status, position_status, sensors_status, biometrics_status, and inference_status too. The instructions= field FastMCP advertises to a controller therefore under-promises six tools. Add them, or generate the list from the registration loop.
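
Generating the roster from the registrations could look roughly like this. The TOOLS dict and the tool decorator are hypothetical; the real server registers tools through FastMCP, and the two sample tools return placeholder payloads.

```python
from collections.abc import Callable

# Hypothetical registry; the real server registers tools through FastMCP.
TOOLS: dict[str, Callable[[], dict]] = {}


def tool(fn: Callable[[], dict]) -> Callable[[], dict]:
    """Register fn under its own name so the roster has one source of truth."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def device_info() -> dict:
    return {"version": "0.1.0"}


@tool
def storage_status() -> dict:
    return {"note": "placeholder payload"}


def build_instructions() -> str:
    """Render the advertised roster from whatever is actually registered."""
    return "Available tools: " + " / ".join(sorted(TOOLS))
```

Because the string is derived from the registry, a newly registered tool appears in the advertisement without anyone editing a hand-maintained list.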

N6 (Low). engine._load_profile still has a silent fallback

src/nous/engine.py:309-317. A missing or non-dict profile silently returns {"name": name, "source": "default-fallback"}. The 2026-05-21 review flagged this and the recommendation still stands: fail fast in non-test mode, expose a strict_profile_load setting toggle, default to true in production.
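
One way the strict toggle might behave is sketched below. The strict_profile_load name comes from the recommendation; resolve_profile and ProfileLoadError are hypothetical, and YAML parsing is left to the caller so the sketch stays standard-library only.

```python
from typing import Any


class ProfileLoadError(ValueError):
    """Raised in strict mode when a profile is missing or malformed."""


def resolve_profile(
    name: str, parsed: Any, *, strict_profile_load: bool = True
) -> dict[str, Any]:
    """Return the parsed profile, or the fallback only when strict mode is off.

    parsed is whatever the YAML loader returned (None if the file was missing).
    """
    if isinstance(parsed, dict):
        return parsed
    if strict_profile_load:
        raise ProfileLoadError(f"profile {name!r} is missing or not a mapping")
    return {"name": name, "source": "default-fallback"}
```

Tests that rely on the fallback would pass strict_profile_load=False explicitly, which keeps the permissive path visible at the call site instead of silent.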

N7 (Low). MISB KLV decode emits hex strings, not typed values

src/nous/interop/misb_klv.py:101. The decoder returns {"items": {k: v.hex() for k, v in items.items()}}. For round-trip parity with the encoder's stringified UTF-8 values (line 65), the decoder could attempt UTF-8 decode and fall back to hex on UnicodeDecodeError. Today an encode -> decode round trip is lossy: the encoder writes b"foo" and the decoder yields "666f6f". Either document this behaviour in docs/conformance/misb-klv.md (not re-verified) or fix it.
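
The UTF-8-then-hex fallback is small enough to sketch directly. The decode_value helper is hypothetical and is not the adapter's actual function.

```python
def decode_value(raw: bytes) -> str:
    """Return raw as UTF-8 text when it decodes cleanly, else as a hex string."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.hex()
```

One caveat with this shape: a caller cannot tell a hex fallback apart from text that happens to look like hex, so a tagged return (for example a pair of encoding name and value) may be the safer contract if binary values are expected.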

6. Quality gates this run

| Gate | Result |
| --- | --- |
| make lint (ruff) | passes |
| make typecheck (mypy strict, src only) | passes, 58 source files, 0 issues |
| make test (pytest) | 345 passed in 8.26s |
| make docs-build (mkdocs strict) | builds; eight markdown files exist under docs/ but are not in the nav (docs/review-2026-05-21.md, ADR 0018, seven showcase scenarios). Mermaid plugin warns about the MkDocs 2.0 contribution-model change, informational only. |
| Em-dash grep | grep -rPn '\x{2014}' --include='*.md' . clean |
| Live MCP smoke (claude.ai connector) | seventeen-tool surface advertised; eleven tools reachable on the live host (see §4.3) |

7. Remediation order

  1. N1 catch-up PR. Merge the development line to main; let the auto-update timer pick it up; verify the seventeen-tool surface lands on the live host.
  2. N2 audit sink triage. Restore audit JSONL writes on the live VM. Document the failure mode and root cause in SECURITY.md.
  3. C3 FastMCP lifespan tick task. One PR, no ADR (this fits the "tool wiring" low-blast-radius zone of CLAUDE.md). Land before the catch-up PR if possible, or in the same merge train.
  4. C2 recursive redaction + nested-key test. Required before BL-014 (scenario YAML loader) lands.
  5. C6 policy greps in CI. Two-line script + workflow job.
  6. H1 runner unit tests. tests/unit/test_runner.py covering the denial path's denied=True audit record, the four tiers, the three policy modes, and the truncation behaviour.
  7. H6 OAuth lock + parent fsync + 0600 chmod on state files.
  8. H7 refresh-token family revocation.
  9. H8 auto-update rollback record + kill-switch documentation.
  10. M1 runner denial exit_code=1.
  11. N3-N7 cleanup.
  12. M10 profile JSON-schema validation at load time.
  13. H2 mypy strict for the test tree.
  14. H9 profile inline placeholder comments.

Items 1 through 3 are the live-VM remediation pack. Items 3 through 5 close the remaining baseline-critical findings (C3, C2, C6). Items 6 through 9 address the baseline-high findings H1, H6, H7, and H8. The rest is opportunistic.

8. Out of scope

Items the 2026-05-20 baseline explicitly excluded remain out of scope here: stub adapters with note: lands with BL-NNN, mesh / DTN absence (L7, L12), parametric biometrics by design (L6), absence of a real local model (L9), FSM raising on unknown trigger (ADR-0004 design choice), audit.write swallowing internal exceptions (correctness requirement, not a bug). The 2026-05-21 in-depth review recommendations on assurance depth, calibration harnesses, and EU-sovereignty-oriented architecture options are tracked in docs/review-2026-05-21.md and not re-litigated here.

9. Cross-references

10. Re-audit at HEAD 43d0db2 (post PR #40 / #41 / #42)

Conducted later on 2026-05-23, after the catch-up PR (#38) brought origin/main up to the L1 development line and three remediation PRs landed: PR #40 (FastMCP lifespan), PR #41 (CI policy greps), PR #42 (tick overrun + try/finally). Quality gates at re-audit time: ruff clean, mypy strict clean across 58 source files, 351 pytest tests pass (up from 345), make policy passes, mkdocs build --strict builds clean.

10.1 Findings that closed since §2

| Id | Closed by | Evidence |
| --- | --- | --- |
| N1 Deployment drift | PR #38 (84d2e06 docs sweep + the catch-up merge train) | git log origin/main..HEAD is empty; origin/main matches HEAD at 43d0db2. The L1 surface is now on main and the auto-update timer will land it on the live VM on the next poll. |
| C3 Engine starts but the FastMCP server never ticks it | PR #40 (9a9cb85) + PR #42 (e8c2c3d, 4762496) | src/nous/server.py:59-82 introduces tick_lifespan, an async context manager that spawns tick_loop(engine, tick_hz, stop) in an anyio.create_task_group() and calls engine.stop() in a finally so a tick-task crash still surrenders the engine cleanly. build_server wires the lifespan into FastMCP at server.py:90-93. Covered by tests/integration/test_server_lifespan.py (6 cases including a sustained-overrun cancellation guard and a tick-crash shutdown guard). |
| C3 follow-up Overrun starvation | PR #42 (e8c2c3d) | src/nous/tick.py:32-33 calls await anyio.lowlevel.checkpoint() after the overrun counter bump, so the loop yields control even when every tick exceeds its budget; the cancellation regression test is test_tick_loop_yields_on_sustained_overrun. |
| C6 CI does not enforce the documented policy greps | PR #41 (4c799f8, f282618, 7730363) | scripts/policy_checks.sh runs the em-dash and (placeholder) private-repo greps; Makefile adds the policy target; .github/workflows/ci.yml adds a policy job that calls make policy. The script forces LC_ALL=C.UTF-8 so grep -P '\x{2014}' compiles under any contributor locale, treats grep exit 2 as a policy failure (not a silent pass), and excludes itself from the private-repo scan so a future deny-list entry does not match its own declaration. Verified end-to-end under LC_ALL=C, LC_ALL=POSIX, and the default locale; injected em-dash still caught. |

10.2 Findings that remain open (re-verified against current code)

Most of §3 carries forward. Quick status table:

| Id | Status | Evidence at re-audit |
| --- | --- | --- |
| C2 Flat redaction | Open | src/nous/audit.py redact() still walks only the top-level keys; nested mappings bypass masking. The unit test in tests/unit/test_audit.py still exercises a flat map. |
| N2 Audit sink degraded on the live VM | Open (needs server-side action) | Documented in SECURITY.md lines 84-104 as a kill-switch / triage path; the live-VM remediation is out of scope for this re-audit run. |
| H1 Spine module test coverage | Partial | tests/unit/ carries 32 modules including test_anthropic_client.py, test_state_machine_guards.py, test_policy.py, test_policy_fuzz.py; test_runner.py still missing. |
| H2 mypy strict excludes the test tree | Open | pyproject.toml still scopes mypy to ["src/nous"]. |
| H6 OAuth lock + parent fsync + 0600 chmod | Open | src/nous/auth/oauth.py _Store.save() uses tmp.replace() with no asyncio.Lock, no parent-dir fsync, no chmod(0o600) after write. |
| H7 Refresh-token family revocation | Open | exchange_refresh_token() still pops the consumed token and mints a fresh pair; no issue_id chain, no family-revocation on reuse. |
| H8 Auto-update rollback record | Open | deploy/auto-update.sh still has no known-good capture file and no auto-update-rollback.sh companion. |
| M1 Runner denial path omits exit_code | Open | src/nous/runner.py denial branch still leaves exit_code=None on the denied=True audit record. |
| M10 Profile JSON-schema validation | Open | engine._load_profile still falls back to {"name": name, "source": "default-fallback"} with no schema validation step. |
| H9 Profile constants without inline citations | Open | profiles/jetson-agx-orin.yaml inference_local.tok_per_s_p50 and energy_j_per_tok still uncited. |
| N3 state_get payload | Open | src/nous/server.py state_get still returns {"mode": ..., "tick": ...} only. |
| N4 snapshot/MCP parity test | Open | No test asserts engine.snapshot() keys match device_health output structure. |
| N5 _INSTRUCTIONS advertisement stale | Open | The instructions block still lists 13 of 19 registered tools; six per-subsystem reads (storage_status, comms_status, position_status, sensors_status, biometrics_status, inference_status) are not advertised. |
| N6 Silent profile fallback | Open | No strict_profile_load setting; the fallback is still silent. |
| N7 MISB KLV decode hex strings | Open | src/nous/interop/misb_klv.py decoder still returns hex-only; round-trip parity with the encoder is still lossy. |

10.3 Revised remediation order

The live-VM remediation pack is partially complete. Updated order:

  1. ~~N1 catch-up PR.~~ Done (PR #38 + the catch-up train; origin/main matches HEAD).
  2. N2 audit sink triage. Restore audit JSONL writes on the live VM. Document the failure mode and root cause in SECURITY.md. Still the next live-VM action item.
  3. ~~C3 FastMCP lifespan tick task.~~ Done (PRs #40 + #42).
  4. C2 recursive redaction + nested-key test. Required before BL-014 (scenario YAML loader) lands.
  5. ~~C6 policy greps in CI.~~ Done (PR #41).
  6. H1 runner unit tests. tests/unit/test_runner.py covering the denial path's denied=True audit record, the four tiers, the three policy modes, and the truncation behaviour.
  7. H6 OAuth lock + parent fsync + 0600 chmod on state files.
  8. H7 refresh-token family revocation.
  9. H8 auto-update rollback record + kill-switch documentation.
  10. M1 runner denial exit_code=1.
  11. N3-N7 cleanup.
  12. M10 profile JSON-schema validation at load time.
  13. H2 mypy strict for the test tree.
  14. H9 profile inline placeholder comments.

Three of the original fourteen items are closed (N1, C3, C6). The next single action with the largest leverage is N2 (server-side), followed by C2 (required before BL-014 lands).

10.4 New (light) findings from this re-audit

These are opportunistic improvements rather than blockers; recording them here so they are not lost.

  • N8 (Low). docs/showcase/capability-matrix.md deployment note was stale. The note still referenced revision 02f2062 and the pre-catch-up drift. Updated in this PR to point at 43d0db2 and reference §10 of the audit.
  • N9 (Low). test_tick_lifespan_stops_engine_when_tick_task_crashes assertion is broader than its intent. The test asserts that some exception in the surfaced ExceptionGroup is the simulated RuntimeError; an additional unrelated exception (e.g. from a fixture teardown) would still satisfy the assertion. The test's primary subject (the engine landing on Mode.SHUTDOWN) is asserted separately, so the impact is bounded; tighten by asserting len(excinfo.value.exceptions) == 1 when the suite next touches that file (tests/integration/test_server_lifespan.py:115-128).
  • N10 (Low). AGENTS.md "Boundaries" and LIMITATIONS.md L17 did not yet reference scripts/policy_checks.sh. Both surfaces talked about the standalone-repo rule without naming the enforcement seam. Updated in this PR.

10.5 Quality gates this run

| Gate | Result |
| --- | --- |
| make lint (ruff) | passes |
| make typecheck (mypy strict, src only) | passes, 58 source files, 0 issues |
| make test (pytest) | 351 passed in 9.43s (up from 345; +6 covering tick_lifespan, sustained-overrun cancellation, and tick-crash shutdown) |
| make policy (em-dash + private-repo greps) | passes; locale-stable under LC_ALL=C, POSIX, and default |
| make docs-build (mkdocs strict) | builds clean |
| Live MCP smoke | Not re-run this round; expected to land on the next auto-update cycle now that main carries the L1 surface. |