ADR-0012: Internal deep audit (2026-06) and remediation wave¶

Field	Value
Status	Accepted
Date	2026-06-07
Authors	Roman Mednitzer

Context¶

A deep internal audit read every source file, ran the gates, and added a dynamic adversarial harness that executed the real code against hostile inputs (the harness lived outside the repo; nothing was modified during the audit). Six parallel static reviews (execution, store, actuation, MCP surface, audit/skills, docs/deploy/CI) were cross-checked, and every Critical and High claim was either reproduced by execution or verified against the exact source.

The audit confirmed the architecture and the SQLite store are sound (append-only triggers block DELETE, value-mutation, actor-mutation, and a second active row; the hash chain detects edit, reorder, and head-truncation; the Merkle tree is RFC 6962 correct). It also found that several load-bearing security properties do not hold as claimed. The most serious, all reproduced:

The tamper-evidence layer is forgeable and not fail-closed: a checkpoint with tree_size=0 (plus a forgeable LocalStamper token over sha256(b"")) verifies as ok=True on a non-empty log, and malformed evidence raises instead of returning a negative result.
The production Postgres append-only trigger omits the identity columns the SQLite trigger guards, so fact_type or predicate is mutable on an active row, and both backends allow a t_invalid only mutation that supersedes a fact without an actor or reason.
Tier and deny classification holes: chmod -R 777 / classifies T0 and is not denied, and writes under /etc/ via cp/tee/ln/truncate classify T0.
The SSRF egress filter has numeric and trailing-dot bypasses and treats every DNS name as allowed, and its entry point has no callers.
Redaction misses space-separated credential flags (--password VALUE) and URL or DSN embedded credentials, and the stdio server returns exception text to the client without redaction.
The consent registry specified in ADR-0006 Decision 4 and presented as a control in SECURITY.md was never built (zero references in code).

Decision¶

Adopt the internal deep audit (static fan-out plus dynamic adversarial harness plus per-claim verification) as a recurring practice alongside the external audit cadence (ADR-0011). This ADR is the first internal wave.
Accept the findings as backlog items BL-037 to BL-061, each mapped to a security constraint or invariant and citing this ADR.
Remediate the reproduced P0 set and the reproduced P1 set in the change that accompanies this ADR (BL-037 to BL-045 resolved); the remainder stay open and tracked. Confirmed-correct controls are not reopened.
Correct the documentation that over-stated delivered guarantees. ADR-0006 is immutable, so its consent gap is recorded as an appended audit note there, not a rewrite.

Findings¶

Verification: R = reproduced by executing the code, V = verified against the exact source, S = static review (credible, not independently executed).

BL	Finding	Constraint	Sev	Verify	Status
037	`verify_evidence` accepts a `tree_size=0` checkpoint on a non-empty log and raises on malformed evidence (not fail-closed); `LocalStamper` tokens are forgeable	SEC-9, INV 3	Critical	R	resolved
038	Postgres append-only trigger omits `predicate`/`fact_type`/`fact_id`/`reason` (and edge identity columns); docstring claims exact SQLite parity	SEC-10, INV 4	Critical	V	resolved
039	Both backends allow a `t_invalid` or `superseded_actor` only mutation on an active row, superseding without actor or reason	SEC-10, INV 4	High	V	resolved
040	Deny and tier holes: `chmod -R 777 /` is T0 and undenied; `/etc/` writes via `cp`/`tee`/`ln`/`truncate` are T0	SEC-1, SEC-3, INV 2	High	R	resolved
041	Redaction misses `--password VALUE` style flags and URL/DSN credentials; the stdio server error path does not redact	SEC-9, INV 3	High	R	resolved
042	SSRF filter bypasses (decimal/hex/octal/trailing-dot IP) and allows all DNS names; `assert_egress_allowed` fail-open on non-IP host	SEC-7, INV 7	High	R	resolved
043	OpenTofu DRY_RUN is `plan -refresh-only` but apply is `apply -auto-approve`: preview scope does not match execute scope	SEC-6, INV 6	High	V	resolved
044	`_bounded_error` can raise (broken `__str__`), so `run()` can raise with no audit record	SEC-2, INV 1	Med	V	resolved
045	Docs over-state delivered controls: consent registry (ADR-0006/`SECURITY.md`), invariant 1 universality, STPA `_ssrf.py` path and read-tool audit claim	governance	High	V	resolved
046	SSRF: resolve hostnames and check every resolved IP (rebinding-aware) and wire the filter into the egress path	SEC-7	High	R	open
047	talosctl T3 single-target is checked on `host.name`, not the actual `host.nodes` list, so a T3 reset can wipe multiple nodes	SEC-6	High	V	open
048	talosctl `action.split()` appends attacker-influenced flags (`--insecure`, `--talosconfig`); use a verb allowlist	SEC-8	Med	V	open
049	`CredentialBroker` is never wired into the actuation path; scoped-credential enforcement is latent	INV 9	High	V	open
050	Audit hash chain tail-truncation is undetectable; needs an anchored high-water-mark	SEC-9	Med	R	open
051	Helm NetworkPolicy ingress has no `from:` selector; any pod can reach the MCP port	SEC-7	High	V	open
052	`ci-success` gates only `check`; CodeQL/fuzz/sbom/dependency-review rely on out-of-band branch protection	governance	High	V	open
053	No coverage tooling or gate (`pytest-cov`, `cov-fail-under`)	governance	Med	V	open
054	`_cosine` returns NaN on NaN/Inf input; `seq` is not unique and the `MAX(seq)+1` read races	SEC-10	Med	R/S	open
055	Audit degrade path reopens the file after `_degrade`, overriding the stderr sink and leaking the handle	SEC-8	Med	S	open
056	stdio server: unbounded line read (DoS); JSON-RPC notification and batch edge cases	SEC-7	Med	R/S	open
057	Manifest parser: `---extra` accepted as fence, no size cap, indented-key injection, duplicate-key last-wins	INV 8	Med	R	open
058	Collectors: AIDE empty output reads as clean (false negative); no size caps; non-finite numeric parse (with BL-026)	INV 8	Med	S	open
059	Drift: `UNEXPECTED` security-predicate findings are not escalated; multi-host Ansible output makes one invalid subject	SEC-6	Med	S	open
060	Deploy and config: Helm health probes, systemd drop-in duplication, unpinned `cyclonedx-bom`, whitespace `HTTP_HOST`, compliance-map path citations	governance	Med	S	open
061	Test and fuzz gaps: Postgres parity suite, evidence tamper matrix, host_type refusal per adapter, SSRF bypass tests, fuzz of manifest/merkle/evidence	governance	Med	V	open

BL-017 (read and ingest tools not audited) and BL-018 (trifecta denials not audited) were already open and are confirmed by this audit; they are not duplicated here.

Consequences¶

Positive: the highest-severity, reproduced security gaps are fixed with tests in the accompanying change; the rest are tracked with a verification level so the backlog reflects evidence, not conjecture. The internal-audit method is now repeatable.

Negative: the backlog grows by twenty-five items; several open items (BL-046, BL-049, BL-051, BL-052) are architectural and will need their own changes.

Neutral: this ADR records findings and acceptance; enforcement is the code and tests under each item.

Alternatives considered and rejected¶

Fix everything in one change. Rejected: the architectural items (credential wiring, hostname-resolving SSRF, CI gating) are larger than the surgical security fixes and merit separate, reviewable changes.
Trust the static reviews without reproduction. Rejected: dynamic reproduction confirmed the P0 set and corrected severities, which is the basis for fixing them with confidence.

Revisit triggers¶

The HTTP transport or a concurrent Postgres audit path is implemented (raises BL-050 and BL-046 in urgency).
An open item here is found exploitable in a wired path before it is scheduled.
A later audit contradicts a recorded verdict (append an audit note, never rewrite a resolved row).