AUDIT 2026-05-23
Full system audit of nous covering the local repository and the live
nous-mcp surface that claude.ai talks to. This is a delta audit against
the 2026-05-20 baseline in AUDIT.md
and the in-depth review in
docs/review-2026-05-21.md. It records
which baseline findings have closed, which remain open, and which new
defects this run uncovered.
Conducted: 2026-05-23.
Branch audited: claude/optimistic-gauss-5jpfI (49 commits ahead of
origin/main).
Source revision: 02f2062 (Merge PR #37, BL-011 biometrics subsystem).
Live MCP audited: nous-mcp over the configured claude.ai connector
(see §4 below).
1. Executive summary
The local repository has closed the majority of the baseline audit's critical and high findings. Quality gates are green (ruff clean, mypy strict clean across 58 source files, 345 pytest tests pass, mkdocs strict build succeeds). The L1 subsystem suite that the baseline called out as "wired but stub-flavoured" is now implemented end to end: ten subsystems carry physics, nine estimators run live, the FSM has SC-2 thermal and low-power guards, the interop adapters emit standards-shaped output, and the daily-cap fsync race is closed.
The single most material finding from this audit run is deployment
drift. The local branch is forty-nine commits ahead of origin/main,
and the auto-update timer on the live VM tracks origin/main. The
result, which I confirmed directly through the live MCP connector, is
that the live nous-mcp exposes the v0.1 stub surface (eleven
representative tools, all subsystem reads stubbed to null, audit
sink reported as degraded, engine never ticked past boot) while the
repository under src/nous/ carries the full L1 surface. Anyone
reading STATUS.md and then querying the live MCP will see a different
device than the one the docs describe. This is the deployment-side
twin of the architecture-vs-implementation drift the 2026-05-21 review
flagged, and it is the highest-leverage thing to fix this week.
Three baseline-critical findings remain open in code: audit.py
redaction is still single-level (C2), the FastMCP server still has no
lifespan task that drives engine.tick() (C3), and CI still does not
enforce the documented em-dash and private-repo greps (C6). Two high
findings remain in auth/oauth.py: there is no inter-handler lock
around the read-modify-write cycle (H6), and refresh-token rotation
still revokes one token at a time rather than the whole family on
detected reuse (H7). The auto-update script asserts post-restart
service health but still has no commit-SHA rollback record (H8).
Otherwise the repository is in noticeably better shape than the baseline. The MISB KLV adapter (C4), CoT adapter (H3), NMEA adapter (H5), SensorThings adapter (H4), the audit chmod 0600 (part of H6), and the systemd hardening (M4) are all closed. The estimator stubs that the baseline flagged for returning misleading covariance (C5) are gone: thermal, compute, storage, sensors, and biometrics now run Kalman filters with covariance that actually shrinks under observation. The anthropic_client daily-cap fsync race (C1) is closed and covered by unit tests that exercise the cap exhaustion, UTC rollover, corrupted state, and concurrent locking paths.
2. What closed since the 2026-05-20 baseline
| Baseline id | Module | Status | Evidence |
|---|---|---|---|
| C1 | anthropic_client.py | Closed | CallCap.increment now fh.flush() + os.fsync() while still inside the flock, and a fsync failure raises CapExhausted (src/nous/anthropic_client.py:100-110). Covered by tests/unit/test_anthropic_client.py (9 tests, includes multiprocessing flock test). |
| C4 | interop/misb_klv.py | Closed | Full BER short and long form encoding with explicit overflow refusal (misb_klv.py:108-128). Key range [1, 255] validated; values > max_value_len raise rather than truncate (misb_klv.py:61-71). |
| C5 (thermal) | estimators/thermal.py | Closed | Two-state Kalman with shrinking covariance (estimators/thermal.py). Covered by tests/unit/test_thermal_estimator.py. |
| C5 (compute) | estimators/compute.py | Closed | Two-state Kalman over (load_pct, draw_w). Covered by tests/unit/test_compute_estimator.py. |
| C5 (storage / sensors / biometrics) | estimators/*.py | Closed | Each ships a live Kalman with bounds validation and a rejected_updates counter (estimators/storage.py, sensors.py, biometrics.py). |
| C5 (self_model zeros) | self_model/*.py | Partially closed | The placeholder is still typed-stub (self_model/assess.py:29-44) but documented as [planned] in STATUS.md. Returns capabilities dict rather than misleading numeric zeros. |
| H3 | interop/cot.py | Closed | time, start, stale, how="m-g" written on every event (cot.py:67-78). XXE-safe parser refuses DOCTYPE / ENTITY declarations. |
| H4 | interop/sensorthings.py | Closed | phenomenonTime normalised to UTC ISO-8601 with Z suffix (sensorthings.py:41-45). max_payload_len cap added. |
| H5 | interop/nmea0183.py | Closed | Full 14-field GGA sentence, XOR checksum, ASCII enforcement, lat/lon range checks (nmea0183.py:38-64). |
| H9 | profiles/jetson-agx-orin.yaml | Open (still uncited) | inference_local.tok_per_s_p50: 200 and energy_j_per_tok: 0.12 still carry no inline placeholder comment. |
| M2 | deploy/Caddyfile.example | Not re-verified | Out of audit scope this round. |
| M4 | deploy/systemd/nous.service | Closed | ProtectClock=true, ProcSubset=pid, SystemCallFilter=@system-service with negated set, MemoryDenyWriteExecute=true, RestrictAddressFamilies (nous.service:40-54). |
| Audit chmod 0600 (part of H6) | audit.py | Closed | target.chmod(0o600) applied after open (audit.py:188-189). The file handler is a WatchedFileHandler with explicit fsync after every emit (_FsyncingFileHandler.emit). |
The L1 subsystem suite (BL-003 power, BL-005 thermal, BL-005a APU,
BL-007 compute, BL-008 storage, BL-009 sensors, BL-010 position, BL-011
biometrics, BL-012 comms, BL-013 inference) all moved from planned or
stub to in-progress since the baseline, with matching estimators,
matching MCP tools (tier T0), and matching integration tests under
tests/integration/test_*_through_engine.py and
tests/integration/test_*_drives_*.py.
3. Open findings carried from the baseline
C2. Argument redaction is still flat
src/nous/audit.py:52-68. redact() walks only the top-level keys of
the argument mapping. A caller that passes
{"context": {"headers": {"Authorization": "Bearer ..."}}} would still
write the secret verbatim. The current MCP tool surface only takes
top-level scalar arguments, so the blast radius is theoretical today,
but the redaction allowlist promises depth that the implementation
does not deliver, and the only redaction test in
tests/unit/test_audit.py:12-18 exercises a flat mapping.
Recommendation: implement the recursive walk() from the baseline,
add a nested-key test, and lift the test to cover lists of dicts too.
The fix is required before any tool surface accepts structured
payloads (BL-014 scenario loader is the next surface that will).
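A minimal sketch of the recursive walk, assuming a simple case-insensitive key match. The key set and the mask string are placeholders for illustration; the real list lives in src/nous/audit.py and was not copied here.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical key set and mask; not taken from src/nous/audit.py.
SENSITIVE_KEYS = frozenset({"authorization", "token", "password", "api_key"})
MASK = "***"


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive mapping keys masked at any depth."""
    if isinstance(value, Mapping):
        return {
            key: MASK if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Lists of dicts are walked too; tuples come back as lists, which is
        # what JSON serialisation would produce anyway.
        return [redact(item) for item in value]
    return value


nested = {"context": {"headers": {"Authorization": "Bearer abc"}}}
print(redact(nested))
```

The function builds a new structure rather than mutating its input, so the caller's arguments are untouched when the audit record is written.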
C3. Engine starts but the FastMCP server still never ticks it
src/nous/server.py:48 calls self.engine.start() once in Nous.__init__
but no FastMCP lifespan hook schedules tick_loop. The live MCP
confirms the symptom directly: device_health returns
tick=0, ts_s=0.0, mode="boot", and state_history returns the single
stowed -> boot transition that fires at construction. A controller
calling any of the new *_status tools against the live server reads
boot-time truth values, not anything that has evolved through the
physics. The CLI tick subcommand can drive engine.tick() manually,
but the public HTTP server cannot.
Recommendation: add a FastMCP lifespan async context manager that
creates an asyncio.Task running tick_loop(engine, hz=tick_hz) and
cancels it on shutdown, calling engine.stop() on the way out. Wire
the cancellation through anyio so SIGTERM under systemd lands
cleanly. This is the single change that flips every L1 subsystem
read from "boot snapshot" to "live state" on the deployed server.
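The shape of that lifespan can be sketched with the standard library alone. This is an illustration under assumptions, not the server's code: CountingEngine stands in for the real engine, run_ticks is a hypothetical name, and it uses plain asyncio rather than anyio. How the context manager is handed to FastMCP is left out.

```python
import asyncio
import contextlib
from collections.abc import AsyncIterator


class CountingEngine:
    """Stand-in for the real engine; only tick() and stop() are assumed."""

    def __init__(self) -> None:
        self.ticks = 0
        self.stopped = False

    def tick(self) -> None:
        self.ticks += 1

    def stop(self) -> None:
        self.stopped = True


async def tick_loop(engine: CountingEngine, hz: float) -> None:
    """Call engine.tick() at roughly hz until the task is cancelled."""
    period = 1.0 / hz
    while True:
        engine.tick()
        await asyncio.sleep(period)


@contextlib.asynccontextmanager
async def run_ticks(engine: CountingEngine, hz: float) -> AsyncIterator[None]:
    task = asyncio.create_task(tick_loop(engine, hz))
    try:
        yield
    finally:
        task.cancel()
        try:
            # A cancelled task raises CancelledError here; a crashed task
            # re-raises its own exception, which is allowed to propagate.
            with contextlib.suppress(asyncio.CancelledError):
                await task
        finally:
            engine.stop()  # runs on clean shutdown and on tick-task crash


async def demo() -> CountingEngine:
    engine = CountingEngine()
    async with run_ticks(engine, hz=200.0):
        await asyncio.sleep(0.05)  # the server would serve requests here
    return engine


if __name__ == "__main__":
    finished = asyncio.run(demo())
    print(finished.ticks, finished.stopped)
```

The nested try/finally is the point of the sketch: engine.stop() is reached whether the tick task was cancelled normally or had already died with an exception.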
C6. CI still does not enforce the documented policy greps
.github/workflows/ci.yml:38-39 runs make check and nothing else.
scripts/ has gen_*.py but no policy_checks.sh. CLAUDE.md still
states "the CI grep checks both" (em-dashes and private-repo
references). The repository continues to pass by authorial discipline
only.
Recommendation: write scripts/policy_checks.sh with two greps
(! grep -rPn '\x{2014}' --include='*.md' . and the private-repo
allowlist) and add a policy workflow job. Land it alongside an explicit
entry in scripts/README.md so the contract is documented in code,
not only in CLAUDE.md.
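The recommendation above is a shell script. For illustration only, the em-dash half of the check can be rendered in Python as below; the function names are hypothetical and the private-repo scan is omitted because its allowlist is not part of this audit.

```python
from pathlib import Path

EM_DASH = "\u2014"  # the code point the grep pattern \x{2014} targets


def em_dash_lines(text: str) -> list[int]:
    """Return the 1-based line numbers in text that contain an em-dash."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if EM_DASH in line
    ]


def scan_tree(root: Path, pattern: str = "*.md") -> list[tuple[Path, int]]:
    """Collect (file, line number) hits for every matching file under root."""
    hits: list[tuple[Path, int]] = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        hits.extend((path, lineno) for lineno in em_dash_lines(text))
    return hits


if __name__ == "__main__":
    found = scan_tree(Path("."))
    for path, lineno in found:
        print(f"{path}:{lineno}: em-dash")
    raise SystemExit(1 if found else 0)
```

A non-zero exit on any hit mirrors the negated grep: the CI job fails when a match exists.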
H1. Spine module test coverage is partly remediated
tests/unit/test_anthropic_client.py lands with 9 cases. tests/unit/test_state_machine_guards.py
lands with 8 cases. tests/unit/test_policy.py and tests/unit/test_policy_fuzz.py
both exist. The remaining gap is runner.py: there is no
tests/unit/test_runner.py. The runner is the audited execution
wrapper that every tool call passes through; the four tiers, three
policy modes, denial-path audit record, exception-to-body mapping, and
truncation behaviour are exercised only through integration tests.
Recommendation: write tests/unit/test_runner.py parametrised over the
four tiers and three policy modes; assert denied=True audit lines on
the denial path; assert exception.__class__.__name__ ends up in the
body envelope; assert truncation kicks in at max_output. Roughly
two hours of work.
H2. mypy strict still excludes the test tree
pyproject.toml sets files = ["src/nous"] for mypy. The test tree
has grown to 38 files and would benefit from strict typing; any
fixture decorator drift will silently slip past make check.
H6. OAuth file store still has no inter-handler lock
src/nous/auth/oauth.py has no asyncio.Lock. _Store.load() ...
_Store.save() sequences in register_client,
exchange_authorization_code, _issue, load_access_token, and
exchange_refresh_token race under concurrent FastMCP requests. The
single-client lockdown configuration limits the practical impact, but
the read-modify-write cycle is still wrong on its merits. The
directory chmod is correct (the install script sets 0750 on the
state directory) but _Store.save() writes via tmp.replace without
fsyncing the parent directory, so a hard reboot can lose an
already-rotated refresh token.
Recommendation: an asyncio.Lock field on FileOAuthProvider wrapping
every load/save sequence, plus an os.fsync on the parent directory
fd after Path.replace. A chmod(0o600) after tmp.replace would
also bring the state files to parity with the audit log.
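A sketch of what that could look like, assuming a POSIX host and a JSON state file. LockedStore and atomic_write_json are hypothetical names, not the FileOAuthProvider API. The sketch sets the mode on the temp file before the rename rather than after it, so the final path never exists with a looser mode.

```python
import asyncio
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, data: dict[str, Any]) -> None:
    """Durably replace path: 0600 temp file, fsync, rename, fsync the directory."""
    tmp = path.with_name(path.name + ".tmp")
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.chmod(tmp, 0o600)  # covers a stale temp file left with a looser mode
    os.replace(tmp, path)
    dir_fd = os.open(path.parent, os.O_RDONLY)  # POSIX: open the directory itself
    try:
        os.fsync(dir_fd)  # persist the rename, not just the file contents
    finally:
        os.close(dir_fd)


class LockedStore:
    """Hypothetical wrapper: one lock around every load/modify/save cycle."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = asyncio.Lock()

    def _load(self) -> dict[str, Any]:
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text(encoding="utf-8"))

    async def update(self, key: str, value: Any) -> None:
        # Holding the lock keeps the cycle atomic even if a handler later
        # gains an await between the load and the save.
        async with self._lock:
            state = self._load()
            state[key] = value
            atomic_write_json(self._path, state)


async def demo() -> tuple[dict[str, Any], int]:
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / "tokens.json"
        store = LockedStore(path)
        await asyncio.gather(*(store.update(f"k{i}", i) for i in range(5)))
        return json.loads(path.read_text(encoding="utf-8")), path.stat().st_mode & 0o777
```

demo() runs five concurrent updates against one store and returns the final state and file mode, which is a cheap way to check that no update was lost.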
H7. Refresh-token rotation still has no family revocation
src/nous/auth/oauth.py:247-257. exchange_refresh_token() deletes
the consumed refresh token and mints a fresh pair. OAuth 2.1 BCP §4.13
requires that on detected reuse of a rotated refresh token, the entire
chain be revoked. The current implementation has no chain id, so a
captured-then-replayed refresh token continues to mint access tokens
silently in parallel with the rightful client.
Recommendation: add an issue_id field set at first issuance and
copied across each rotation. On reuse or collision wipe every token
record carrying that id. Add an integration test under
tests/integration/test_oauth_rotation.py.
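The family mechanism can be sketched in memory as follows. RefreshStore and ReuseDetected are illustrative names only; a real implementation would persist the records and revoke the matching access tokens as well.

```python
import secrets
from dataclasses import dataclass, field


class ReuseDetected(Exception):
    """Raised when an already-rotated refresh token is presented again."""


@dataclass
class RefreshStore:
    # token -> issue_id for refresh tokens that are currently valid
    active: dict[str, str] = field(default_factory=dict)
    # token -> issue_id for refresh tokens that were rotated out
    consumed: dict[str, str] = field(default_factory=dict)

    def issue(self) -> str:
        """First issuance: mint a token and start a new family."""
        token = secrets.token_urlsafe(32)
        self.active[token] = secrets.token_urlsafe(8)
        return token

    def rotate(self, presented: str) -> str:
        """Swap a valid token for a fresh one that carries the same issue_id."""
        if presented in self.consumed:
            self._revoke_family(self.consumed[presented])
            raise ReuseDetected("rotated refresh token replayed; family revoked")
        issue_id = self.active.pop(presented)  # KeyError for unknown tokens
        self.consumed[presented] = issue_id
        fresh = secrets.token_urlsafe(32)
        self.active[fresh] = issue_id
        return fresh

    def _revoke_family(self, issue_id: str) -> None:
        """Drop every record, active or consumed, that carries issue_id."""
        self.active = {t: i for t, i in self.active.items() if i != issue_id}
        self.consumed = {t: i for t, i in self.consumed.items() if i != issue_id}
```

Keeping consumed tokens on record is what makes reuse detectable; the current implementation deletes them, so a replay is indistinguishable from an unknown token.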
H8. Auto-update has post-restart health check but no rollback record
deploy/auto-update.sh:41-56 does git reset --hard "${REMOTE}",
runs install.sh, restarts nous.service, and asserts
systemctl is-active. That closes the worst-case "broken service
never noticed" gap from the baseline, but the script still does not
record the prior known-good commit SHA, so a controller wanting to
roll the live host back to the prior commit needs to read the
auto-update log and manually git reset --hard <prev>. The systemd
unit also has no hardening even though it runs as root.
Recommendation: capture LOCAL to
/var/lib/nous/auto-update.last_ok only on success; add an
auto-update-rollback.sh companion that reads that file and reverses
the update; document the kill switch
(systemctl disable --now nous-auto-update.timer) in SECURITY.md.
M1. Runner denial path still omits exit_code
src/nous/runner.py:54-69. The denial branch writes
denied=True, decision_reason=... but leaves exit_code=None.
The audit record has an exit_code: int | None = None field but it
is never populated on the denial path. Setting it to 1 would let
forensic queries count denials per tier per day without parsing the
body string.
M7, M8, M9, M10, L1, L2, L4, L5, L6, L7, L8, L9, L10
Not re-verified this round; the 2026-05-20 baseline remains
authoritative. The largest remaining item is M10 (no JSON-schema
validation at profile load time): now that ten subsystems consume
profile keys, a misspelled key still silently degrades to a default.
Wire scripts/gen_schemas.py output into engine._load_profile and
fail fast on validation error before the next profile schema change.
4. Live MCP audit
I exercised every advertised tool on the configured nous-mcp claude.ai
connector. Outputs below are verbatim modulo formatting.
4.1 Tool surface and posture
device_info reports:
version: 0.1.0
profile: jetson-agx-orin
transport: http
policy: open
tick_hz: 2.0
audit.path: /var/log/nous/audit.jsonl
audit.degraded: true
Findings:
- audit.degraded is true. The audit sink cannot write to /var/log/nous/audit.jsonl on the live host. The path is set by Settings.resolved_audit_path() and the install script creates /var/log/nous/ owned by the nous user; the most likely cause is that the deployed binary is running under a user that does not own the directory, or that systemd's ProtectSystem=strict plus a stale ReadWritePaths is denying the write. Either way the live server is logging every tool call to stderr only, with no append-only fsync guarantee. This is a regression from the audit invariants documented in ADR-0002 and LIMITATIONS.md.
- policy: open is the default. The server admits every tool regardless of tier. Acceptable in a single-tenant claude.ai integration but worth pinning explicitly in SECURITY.md so the posture is documented rather than implicit.
- tick_hz: 2.0 is set but the engine is not actually ticking (see 4.2).
4.2 Engine state and tick activity
device_health returns tick=0, ts_s=0.0, mode="boot",
operator_state="nominal", comms_state="connected".
state_history(limit=256) returns one entry: stowed -> boot.
state_get returns mode="boot", tick=0.
This is the live confirmation of baseline finding C3. The engine has been booted but no tick has ever run. Every subsystem read served by the live server returns the boot snapshot, not evolving physics.
4.3 Subsystem reads return v0.1 stubs
| Tool | Live response | Repository response |
|---|---|---|
| power_status | {"soc_pct": null, "draw_w": null, "endurance_min_p50": null, "note": "power subsystem ships as a typed stub in v0.1"} | Full SoC, voltage, current, load, charge offered/accepted, endurance, plus Kalman estimate (server.py:192-223). |
| apu_status | {"solar_w": null, "fuelcell_w": null, "fuelcell_fuel_pct": null, "note": "APU subsystem ships as a typed stub in v0.1"} | Per-source watts (solar, fuelcell, vehicle, USB-C), fuel mass, connected flags, estimator (server.py:225-252). |
| comms_state | {"state": "connected", "links": [], "note": "links emit through the comms estimator in L1"} | Per-link beliefs, derive_state label + reason (server.py:348-363). |
| self_estimator_status | {"estimators": [], "note": "estimator framework lands in L1"} | Nine estimators with point + covariance + ts (server.py:511-541). |
| interop_formats | Lists six adapters with the v0.1 stub note. | Same shape (matches). |
| self_model_assess | {"capabilities": {}, "note": "self-model layer lands in L1"} | Same shape (still a placeholder by design until BL-018). |
| inference_local | Returns the synthetic response with model: nous-local-mock. | Same shape, plus latency / energy / capacity metrics from the inference subsystem (server.py:543-565). |
The live server is missing eight of the seventeen MCP tools the
repository registers: thermal_status, compute_status,
storage_status, comms_status, position_status, sensors_status,
biometrics_status, and inference_status. The advertised
instructions block also lists only the v0.1 tool roster.
4.4 Diagnosis: deployment drift
The cause is the deployment loop. deploy/auto-update.sh:28-30
fetches origin/main and fast-forwards. The most recent commit on
origin/main is 09728d3 ("Merge pull request #18 from
rmednitzer/claude/auto-update-main"). The local branch has
02f2062 (Merge PR #37) at the head; git log origin/main..HEAD
contains forty-nine commits including PRs #29 through #37 (the L1
subsystem rollout). Those merges never landed on main. The live VM
is therefore correctly tracking main; the bug is upstream of the
deployer.
This is the deployment-side restatement of the architecture-vs-code drift the 2026-05-21 review called out: documentation describes the L1 surface, the L1 surface exists in code, but the live deployment is still v0.1.
Recommendation: open and merge a single squash PR (claude/main-catchup
or similar) that brings main up to 02f2062. Watch the
nous-auto-update.timer log on the live host (journalctl -u
nous-auto-update.service --since "10 minutes ago") to confirm the
pull and the post-restart systemctl is-active check. Then re-run the
live MCP audit checklist in skills/nous-getting-started.md to confirm
the seventeen-tool surface lands. After the live MCP catches up,
investigate the audit.degraded: true reading and remediate before
the next ADR or release commit lands.
4.5 Other live-MCP observations
The OAuth issuer is reachable through the claude.ai connector (the
session authenticated successfully). No introspection endpoint is
exposed by FastMCP, so I could not verify the access-token TTL or
the single-client lockdown state remotely; this needs a server-side
check from the operator (ls -la $NOUS_HOME/auth/clients.json and
cat $NOUS_HOME/auth/tokens.json | jq 'keys | length').
The interop_formats adapter list matches the repository, so the
adapter discovery surface is at parity even though the encoding
surface (interop_encode / interop_decode) is not yet wired into
the server.
5. New findings (introduced or first noticed in this audit)
N1 (High). Deployment drift between main and the development line
Covered in detail in §4.4. The local branch is forty-nine commits
ahead of origin/main; the live VM serves origin/main. The audit
discipline says STATUS.md describes the deployed device; today it
does not.
N2 (High). Live audit sink is degraded
device_info on the live MCP reports audit.degraded: true. The
audit invariants (ADR-0002, LIMITATIONS.md audit section) assume
the JSONL sink is authoritative. A degraded sink means the live host
is currently in an unaudited state for every tool call served by the
live MCP. Triage path: SSH to the host, check
/var/log/nous/audit.jsonl ownership and mode, check the systemd
unit's ReadWritePaths, check whether the nous user can open(...,
O_APPEND | O_CREAT) the path. The audit.py constructor falls back
to stderr on OSError, so the cause is almost certainly a filesystem
permission denial rather than a logic bug.
N3 (Medium). state_get returns a less informative payload than device_health
state_get returns {"mode": "boot", "tick": 0}. The richer payload
(device_health) includes operator_state, comms_state, profile,
scenario. For consistency, either deprecate state_get in favour of
device_health (they overlap) or extend it with the FSM refusal
counter from StateMachine.refusals() so a controller has a one-shot
path to ask "what refused, why".
N4 (Medium). No engine.snapshot() parity test against device_health
The snapshot keys (engine.py:239-306) and the device_health MCP
tool payload (server.py:158-164) drift each time a subsystem lands;
the recent biometrics merge added biometrics: {...} to the
snapshot but the snapshot dict was updated by hand. A snapshot/schema
parity test would catch the next contributor who forgets to add a
field. One small test under tests/unit/test_engine_restart.py could
do it.
N5 (Low). The _INSTRUCTIONS advertisement is stale
src/nous/server.py:611-626 advertises device_info / device_health
/ state_get / state_history / power_status / apu_status /
thermal_status / compute_status / comms_state / self_model_assess /
self_estimator_status / inference_local / interop_formats. The
server actually registers storage_status, comms_status,
position_status, sensors_status, biometrics_status, and
inference_status too. The instructions= field FastMCP advertises
to a controller therefore under-promises six tools. Add them, or
generate the list from the registration loop.
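Generating the list could be as small as the sketch below. It assumes the registration loop can expose a name-to-summary mapping; build_instructions and the example entries are hypothetical, not the contents of server.py.

```python
def build_instructions(tools: dict[str, str]) -> str:
    """Render the advertised tool list from the mapping used for registration."""
    lines = ["Available tools:"]
    lines.extend(f"- {name}: {summary}" for name, summary in sorted(tools.items()))
    return "\n".join(lines)


# Example entries are placeholders, not the server's real summaries.
registered = {
    "storage_status": "storage subsystem read",
    "device_info": "static device description",
}
print(build_instructions(registered))
```

With one source of truth, a newly registered tool cannot be left out of the advertisement.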
N6 (Low). engine._load_profile still has a silent fallback
src/nous/engine.py:309-317. A missing or non-dict profile silently
returns {"name": name, "source": "default-fallback"}. The 2026-05-21
review flagged this and the recommendation still stands: fail fast in
non-test mode, expose a strict_profile_load setting toggle, default
to true in production.
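A minimal sketch of the toggle, assuming the YAML has already been parsed into a Python object. resolve_profile and ProfileError are hypothetical names, and the strict flag stands in for the proposed strict_profile_load setting.

```python
from typing import Any


class ProfileError(ValueError):
    """Raised when a profile is missing or is not a mapping."""


def resolve_profile(name: str, raw: object, *, strict: bool = True) -> dict[str, Any]:
    """Return the parsed profile, or fail fast instead of falling back silently."""
    if isinstance(raw, dict):
        return raw
    if strict:
        raise ProfileError(f"profile {name!r} is missing or not a mapping")
    # Non-strict mode keeps today's behaviour for tests.
    return {"name": name, "source": "default-fallback"}
```

Defaulting strict to True means a test has to opt in to the fallback explicitly, which keeps the permissive path out of production by construction.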
N7 (Low). MISB KLV decode emits hex strings, not typed values
src/nous/interop/misb_klv.py:101. The decoder returns
{"items": {k: v.hex() for k, v in items.items()}}. For round-trip
parity with the encoder's stringified UTF-8 values (line 65), the
decoder could attempt UTF-8 decode and fall back to hex on
UnicodeDecodeError. Today an encode -> decode round trip is
lossy: the encoder writes b"foo" and the decoder yields "666f6f".
Confirm that docs/conformance/misb-klv.md (not re-verified) documents
this, or fix it.
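One way to shape the fallback, as a sketch. decode_value is a hypothetical helper, and the tagged return shape is my assumption: it lets a consumer tell decoded text apart from a hex fallback, which a bare string would not.

```python
def decode_value(raw: bytes) -> dict[str, str]:
    """Decode a KLV value as UTF-8 text, falling back to hex for binary data."""
    try:
        return {"text": raw.decode("utf-8")}
    except UnicodeDecodeError:
        return {"hex": raw.hex()}


print(decode_value(b"foo"))
print(decode_value(b"\xff\xfe"))
```

Binary values that happen to be valid UTF-8 would still come back as text, so this narrows the round-trip gap rather than closing it.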
6. Quality gates this run
| Gate | Result |
|---|---|
| make lint (ruff) | passes |
| make typecheck (mypy strict, src only) | passes, 58 source files, 0 issues |
| make test (pytest) | 345 passed in 8.26s |
| make docs-build (mkdocs strict) | builds; eight markdown files exist under docs/ but are not in the nav (docs/review-2026-05-21.md, ADR 0018, seven showcase scenarios). Mermaid plugin warns about the MkDocs 2.0 contribution-model change, informational only. |
| Em-dash grep (grep -rPn '\x{2014}' --include='*.md' .) | clean |
| Live MCP smoke (claude.ai connector) | seventeen-tool surface advertised; eleven tools reachable on the live host (see §4.3) |
7. Recommended remediation order
1. N1 catch-up PR. Merge the development line to main; let the auto-update timer pick it up; verify the seventeen-tool surface lands on the live host.
2. N2 audit sink triage. Restore audit JSONL writes on the live VM. Document the failure mode and root cause in SECURITY.md.
3. C3 FastMCP lifespan tick task. One PR, no ADR (this fits the "tool wiring" low-blast-radius zone of CLAUDE.md). Land before the catch-up PR if possible, or in the same merge train.
4. C2 recursive redaction + nested-key test. Required before BL-014 (scenario YAML loader) lands.
5. C6 policy greps in CI. Two-line script + workflow job.
6. H1 runner unit tests. tests/unit/test_runner.py covering the denial path's denied=True audit record, the four tiers, the three policy modes, and the truncation behaviour.
7. H6 OAuth lock + parent fsync + 0600 chmod on state files.
8. H7 refresh-token family revocation.
9. H8 auto-update rollback record + kill-switch documentation.
10. M1 runner denial exit_code=1.
11. N3-N7 cleanup.
12. M10 profile JSON-schema validation at load time.
13. H2 mypy strict for the test tree.
14. H9 profile inline placeholder comments.
Items 1 through 3 are the live-VM remediation pack. Items 4 through 6 close the remaining baseline-critical findings. Items 7 through 10 close the remaining baseline-high findings. The rest is opportunistic.
8. Out of scope
Items the 2026-05-20 baseline explicitly excluded remain out of scope
here: stub adapters with note: lands with BL-NNN, mesh / DTN
absence (L7, L12), parametric biometrics by design (L6), absence of a
real local model (L9), FSM raising on unknown trigger (ADR-0004
design choice), audit.write swallowing internal exceptions
(correctness requirement, not a bug). The 2026-05-21 in-depth review
recommendations on assurance depth, calibration harnesses, and
EU-sovereignty-oriented architecture options are tracked in
docs/review-2026-05-21.md and not re-litigated here.
9. Cross-references
- 2026-05-20 baseline: AUDIT.md
- 2026-05-21 in-depth review: docs/review-2026-05-21.md
- Phase and per-document maturity: STATUS.md
- Scope boundaries: LIMITATIONS.md
- Boundaries / high blast radius: CLAUDE.md, §"Risk posture"
- Decision records: docs/adr/
- Backlog tracker: docs/backlog.md
10. Re-audit at HEAD 43d0db2 (post PR #40 / #41 / #42)
Conducted later on 2026-05-23, after the catch-up PR (#38) brought
origin/main up to the L1 development line and three remediation PRs
landed: PR #40 (FastMCP lifespan), PR #41 (CI policy greps), PR #42
(tick overrun + try/finally). Quality gates at re-audit time: ruff
clean, mypy strict clean across 58 source files, 351 pytest tests
pass (up from 345), make policy passes, mkdocs build --strict
builds clean.
10.1 Findings that closed since §2
| Id | Closed by | Evidence |
|---|---|---|
| N1 Deployment drift | PR #38 (84d2e06 docs sweep + the catch-up merge train) | git log origin/main..HEAD is empty; origin/main matches HEAD at 43d0db2. The L1 surface is now on main and the auto-update timer will land it on the live VM on the next poll. |
| C3 Engine starts but the FastMCP server never ticks it | PR #40 (9a9cb85) + PR #42 (e8c2c3d, 4762496) | src/nous/server.py:59-82 introduces tick_lifespan, an async context manager that spawns tick_loop(engine, tick_hz, stop) in an anyio.create_task_group() and calls engine.stop() in a finally so a tick-task crash still surrenders the engine cleanly. build_server wires the lifespan into FastMCP at server.py:90-93. Covered by tests/integration/test_server_lifespan.py (6 cases including a sustained-overrun cancellation guard and a tick-crash shutdown guard). |
| C3 follow-up Overrun starvation | PR #42 (e8c2c3d) | src/nous/tick.py:32-33 calls await anyio.lowlevel.checkpoint() after the overrun counter bump, so the loop yields control even when every tick exceeds its budget; the cancellation regression test is test_tick_loop_yields_on_sustained_overrun. |
| C6 CI does not enforce the documented policy greps | PR #41 (4c799f8, f282618, 7730363) | scripts/policy_checks.sh runs the em-dash and (placeholder) private-repo greps; Makefile adds the policy target; .github/workflows/ci.yml adds a policy job that calls make policy. The script forces LC_ALL=C.UTF-8 so grep -P '\x{2014}' compiles under any contributor locale, treats grep exit 2 as a policy failure (not a silent pass), and excludes itself from the private-repo scan so a future deny-list entry does not match its own declaration. Verified end-to-end under LC_ALL=C, LC_ALL=POSIX, and the default locale; injected em-dash still caught. |
10.2 Findings that remain open (re-verified against current code)
Most of §3 carries forward. Quick status table:
| Id | Status | Evidence at re-audit |
|---|---|---|
| C2 Flat redaction | Open | src/nous/audit.py redact() still walks only the top-level keys; nested mappings bypass masking. The unit test in tests/unit/test_audit.py still exercises a flat map. |
| N2 Audit sink degraded on the live VM | Open (needs server-side action) | Documented in SECURITY.md lines 84-104 as a kill-switch / triage path; the live-VM remediation is out of scope for this re-audit run. |
| H1 Spine module test coverage | Partial | tests/unit/ carries 32 modules including test_anthropic_client.py, test_state_machine_guards.py, test_policy.py, test_policy_fuzz.py; test_runner.py still missing. |
| H2 mypy strict excludes the test tree | Open | pyproject.toml still scopes mypy to ["src/nous"]. |
| H6 OAuth lock + parent fsync + 0600 chmod | Open | src/nous/auth/oauth.py _Store.save() uses tmp.replace() with no asyncio.Lock, no parent-dir fsync, no chmod(0o600) after write. |
| H7 Refresh-token family revocation | Open | exchange_refresh_token() still pops the consumed token and mints a fresh pair; no issue_id chain, no family-revocation on reuse. |
| H8 Auto-update rollback record | Open | deploy/auto-update.sh still has no known-good capture file and no auto-update-rollback.sh companion. |
| M1 Runner denial path omits exit_code | Open | src/nous/runner.py denial branch still leaves exit_code=None on the denied=True audit record. |
| M10 Profile JSON-schema validation | Open | engine._load_profile still falls back to {"name": name, "source": "default-fallback"} with no schema validation step. |
| H9 Profile constants without inline citations | Open | profiles/jetson-agx-orin.yaml inference_local.tok_per_s_p50 and energy_j_per_tok still uncited. |
| N3 state_get payload | Open | src/nous/server.py state_get still returns {"mode": ..., "tick": ...} only. |
| N4 snapshot/MCP parity test | Open | No test asserts engine.snapshot() keys match device_health output structure. |
| N5 _INSTRUCTIONS advertisement stale | Open | The instructions block still lists 13 of 19 registered tools; six per-subsystem reads (storage_status, comms_status, position_status, sensors_status, biometrics_status, inference_status) are not advertised. |
| N6 Silent profile fallback | Open | No strict_profile_load setting; the fallback is still silent. |
| N7 MISB KLV decode hex strings | Open | src/nous/interop/misb_klv.py decoder still returns hex-only; round-trip parity with the encoder is still lossy. |
10.3 Revised remediation order
The live-VM remediation pack is partially complete. Updated order:
1. ~~N1 catch-up PR.~~ Done (PR #38 + the catch-up train; origin/main matches HEAD).
2. N2 audit sink triage. Restore audit JSONL writes on the live VM. Document the failure mode and root cause in SECURITY.md. Still the next live-VM action item.
3. ~~C3 FastMCP lifespan tick task.~~ Done (PRs #40 + #42).
4. C2 recursive redaction + nested-key test. Required before BL-014 (scenario YAML loader) lands.
5. ~~C6 policy greps in CI.~~ Done (PR #41).
6. H1 runner unit tests. tests/unit/test_runner.py covering the denial path's denied=True audit record, the four tiers, the three policy modes, and the truncation behaviour.
7. H6 OAuth lock + parent fsync + 0600 chmod on state files.
8. H7 refresh-token family revocation.
9. H8 auto-update rollback record + kill-switch documentation.
10. M1 runner denial exit_code=1.
11. N3-N7 cleanup.
12. M10 profile JSON-schema validation at load time.
13. H2 mypy strict for the test tree.
14. H9 profile inline placeholder comments.
Three of the original fourteen items are closed (N1, C3, C6). The next single action with the largest leverage is N2 (server-side), followed by C2 (required before BL-014 lands).
10.4 New (light) findings from this re-audit
These are opportunistic improvements rather than blockers; recording them here so they are not lost.
- N8 (Low). docs/showcase/capability-matrix.md deployment note was stale. The note still referenced revision 02f2062 and the pre-catch-up drift. Updated in this PR to point at 43d0db2 and reference §10 of the audit.
- N9 (Low). test_tick_lifespan_stops_engine_when_tick_task_crashes assertion is broader than its intent. The test asserts that some exception in the surfaced ExceptionGroup is the simulated RuntimeError; an additional unrelated exception (e.g. from a fixture teardown) would still satisfy the assertion. The test's primary subject (the engine landing on Mode.SHUTDOWN) is asserted separately, so the impact is bounded; tighten by asserting len(excinfo.value.exceptions) == 1 when the suite next touches that file (tests/integration/test_server_lifespan.py:115-128).
- N10 (Low). AGENTS.md "Boundaries" and LIMITATIONS.md L17 did not yet reference scripts/policy_checks.sh. Both surfaces talked about the standalone-repo rule without naming the enforcement seam. Updated in this PR.
10.5 Quality gates this run
| Gate | Result |
|---|---|
| make lint (ruff) | passes |
| make typecheck (mypy strict, src only) | passes, 58 source files, 0 issues |
| make test (pytest) | 351 passed in 9.43s (up from 345; +6 covering tick_lifespan, sustained-overrun cancellation, and tick-crash shutdown) |
| make policy (em-dash + private-repo greps) | passes; locale-stable under LC_ALL=C, POSIX, and default |
| make docs-build (mkdocs strict) | builds clean |
| Live MCP smoke | Not re-run this round; expected to land on the next auto-update cycle now that main carries the L1 surface. |