# ADR-0012: Internal deep audit (2026-06) and remediation wave
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-06-07 |
| Authors | Roman Mednitzer |
## Context
A deep internal audit read every source file, ran the gates, and added a dynamic adversarial harness that executed the real code against hostile inputs (the harness lived outside the repo; nothing was modified during the audit). Six parallel static reviews (execution, store, actuation, MCP surface, audit/skills, docs/deploy/CI) were cross-checked, and every Critical and High claim was either reproduced by execution or verified against the exact source.
The audit confirmed the architecture and the SQLite store are sound (append-only triggers block DELETE, value-mutation, actor-mutation, and a second active row; the hash chain detects edit, reorder, and head-truncation; the Merkle tree is RFC 6962 correct). It also found that several load-bearing security properties do not hold as claimed. The most serious, all reproduced:
- The tamper-evidence layer is forgeable and not fail-closed. A checkpoint with `tree_size=0` (plus a forgeable `LocalStamper` token over `sha256(b"")`) verifies as `ok=True` on a non-empty log. Malformed evidence raises instead of returning a negative result.
- The production Postgres append-only trigger omits the identity columns that the SQLite trigger guards, so `fact_type` or `predicate` is mutable on an active row. Both backends also allow a `t_invalid`-only mutation that supersedes a fact without an actor or reason.
- Tier and deny classification holes: `chmod -R 777 /` classifies T0 and is not denied, and writes under `/etc/` via `cp`/`tee`/`ln`/`truncate` classify T0.
- The SSRF egress filter has numeric and trailing-dot bypasses and treats every DNS name as allowed. Its entry point has no callers.
- Redaction misses space-separated credential flags (`--password VALUE`) and URL- or DSN-embedded credentials. The stdio server returns exception text to the client without redaction.
- The consent registry specified in ADR-0006 Decision 4 and presented as a control in `SECURITY.md` was never built (zero references in code).
## Decision
- Adopt the internal deep audit (static fan-out plus dynamic adversarial harness plus per-claim verification) as a recurring practice alongside the external audit cadence (ADR-0011). This ADR is the first internal wave.
- Accept the findings as backlog items BL-037 to BL-061, each mapped to a security constraint or invariant and citing this ADR.
- Remediate the reproduced P0 set and the reproduced P1 set in the change that accompanies this ADR (BL-037 to BL-045 resolved); the remainder stay open and tracked. Confirmed-correct controls are not reopened.
- Correct the documentation that over-stated delivered guarantees. ADR-0006 is immutable, so its consent gap is recorded as an appended audit note there, not a rewrite.
## Findings
Verification: R = reproduced by executing the code, V = verified against the exact source, S = static review (credible, not independently executed).
| BL | Finding | Constraint | Sev | Verify | Status |
|---|---|---|---|---|---|
| 037 | `verify_evidence` accepts a `tree_size=0` checkpoint on a non-empty log and raises on malformed evidence (not fail-closed); `LocalStamper` tokens are forgeable | SEC-9, INV 3 | Critical | R | resolved |
| 038 | Postgres append-only trigger omits `predicate`/`fact_type`/`fact_id`/`reason` (and the edge identity columns); its docstring claims exact SQLite parity | SEC-10, INV 4 | Critical | V | resolved |
| 039 | Both backends allow a `t_invalid`-only or `superseded_actor`-only mutation on an active row, superseding it without an actor or reason | SEC-10, INV 4 | High | V | resolved |
| 040 | Deny and tier holes: `chmod -R 777 /` is T0 and not denied; writes under `/etc/` via `cp`/`tee`/`ln`/`truncate` are T0 | SEC-1, SEC-3, INV 2 | High | R | resolved |
| 041 | Redaction misses `--password VALUE`-style flags and URL/DSN credentials; the stdio server error path does not redact | SEC-9, INV 3 | High | R | resolved |
| 042 | SSRF filter bypasses (decimal/hex/octal/trailing-dot IP) and allows all DNS names; `assert_egress_allowed` fails open on a non-IP host | SEC-7, INV 7 | High | R | resolved |
| 043 | OpenTofu `DRY_RUN` runs `plan -refresh-only` but apply runs `apply -auto-approve`: preview scope does not match execute scope | SEC-6, INV 6 | High | V | resolved |
| 044 | `_bounded_error` can raise (broken `__str__`), so `run()` can raise with no audit record | SEC-2, INV 1 | Med | V | resolved |
| 045 | Docs over-state delivered controls: consent registry (ADR-0006/`SECURITY.md`), invariant 1 universality, STPA `_ssrf.py` path and read-tool audit claim | governance | High | V | resolved |
| 046 | SSRF: resolve hostnames, check every resolved IP (rebinding-aware), and wire the filter into the egress path | SEC-7 | High | R | open |
| 047 | `talosctl` T3 single-target check uses `host.name`, not the actual `host.nodes` list, so a T3 reset can wipe multiple nodes | SEC-6 | High | V | open |
| 048 | `talosctl` `action.split()` appends attacker-influenced flags (`--insecure`, `--talosconfig`); use a verb allowlist | SEC-8 | Med | V | open |
| 049 | `CredentialBroker` is never wired into the actuation path; scoped-credential enforcement is latent | INV 9 | High | V | open |
| 050 | Audit hash chain tail-truncation is undetectable; needs an anchored high-water mark | SEC-9 | Med | R | open |
| 051 | Helm NetworkPolicy ingress has no `from:` selector; any pod can reach the MCP port | SEC-7 | High | V | open |
| 052 | `ci-success` gates only `check`; CodeQL/fuzz/sbom/dependency-review rely on out-of-band branch protection | governance | High | V | open |
| 053 | No coverage tooling or gate (`pytest-cov`, `cov-fail-under`) | governance | Med | V | open |
| 054 | `_cosine` returns NaN on NaN/Inf input; `seq` is not unique and the `MAX(seq)+1` read races | SEC-10 | Med | R/S | open |
| 055 | Audit degrade path reopens the file after `_degrade`, overriding the stderr sink and leaking the handle | SEC-8 | Med | S | open |
| 056 | stdio server: unbounded line read (DoS); JSON-RPC notification and batch edge cases | SEC-7 | Med | R/S | open |
| 057 | Manifest parser: `---extra` accepted as a fence, no size cap, indented-key injection, duplicate keys last-wins | INV 8 | Med | R | open |
| 058 | Collectors: empty AIDE output reads as clean (false negative); no size caps; non-finite numeric parse (with BL-026) | INV 8 | Med | S | open |
| 059 | Drift: UNEXPECTED security-predicate findings are not escalated; multi-host Ansible output yields one invalid subject | SEC-6 | Med | S | open |
| 060 | Deploy and config: Helm health probes, systemd drop-in duplication, unpinned `cyclonedx-bom`, whitespace `HTTP_HOST`, compliance-map path citations | governance | Med | S | open |
| 061 | Test and fuzz gaps: Postgres parity suite, evidence tamper matrix, `host_type` refusal per adapter, SSRF bypass tests, fuzz of manifest/merkle/evidence | governance | Med | V | open |
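BL-042's bypasses come from treating the canonical dotted quad as the only IPv4 spelling. `inet_aton`-style parsers also accept `2130706433`, `0x7f000001`, `0177.0.0.1`, and `127.1`, and each of these means `127.0.0.1`.

The following is a minimal sketch under stated assumptions. It normalizes every such spelling, plus a trailing dot, URL brackets, and IPv4-mapped IPv6, before deciding.

- `parse_ip` and `egress_allowed` are hypothetical names, not the project's `assert_egress_allowed`.
- Refusing any DNS name that is not on an explicit allowlist is an assumed stopgap until hostname resolution (BL-046) lands.

```python
from __future__ import annotations

import ipaddress
import string

_HEX = set(string.hexdigits)
_OCT = set("01234567")
_DEC = set(string.digits)


def _legacy_ipv4(host: str) -> ipaddress.IPv4Address | None:
    """Parse inet_aton-style IPv4: 1-4 parts, each decimal, 0x-hex, or 0-octal."""
    parts = host.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    values = []
    for part in parts:
        if part[:2].lower() == "0x":
            digits, base, alphabet = part[2:], 16, _HEX
        elif len(part) > 1 and part[0] == "0":
            digits, base, alphabet = part[1:], 8, _OCT
        else:
            digits, base, alphabet = part, 10, _DEC
        # Validate characters explicitly: int() would also accept "_", "+", etc.
        if not digits or not set(digits) <= alphabet:
            return None
        values.append(int(digits, base))
    *head, last = values
    tail_bits = 8 * (5 - len(values))  # the last part fills the remaining bytes
    if any(v > 0xFF for v in head) or last >= 1 << tail_bits:
        return None
    num = 0
    for v in head:
        num = (num << 8) | v
    return ipaddress.IPv4Address((num << tail_bits) | last)


def parse_ip(host: str) -> ipaddress.IPv4Address | ipaddress.IPv6Address | None:
    """Return the address a host string denotes in any notation, else None."""
    h = host.strip().rstrip(".")  # "127.0.0.1." is the same host
    if h.startswith("[") and h.endswith("]"):
        h = h[1:-1]  # bracketed IPv6 literal as written in a URL
    addr = _legacy_ipv4(h)
    if addr is None:
        try:
            addr = ipaddress.ip_address(h)
        except ValueError:
            return None
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped  # ::ffff:127.0.0.1 is 127.0.0.1
    return addr


def egress_allowed(host: str, allowed_names: frozenset[str]) -> bool:
    addr = parse_ip(host)
    if addr is not None:
        # Only publicly routable addresses pass; loopback, private, link-local
        # (e.g. 169.254.169.254), and reserved ranges are refused.
        return addr.is_global
    # Assumed stopgap: a DNS name passes only if explicitly allowlisted.
    return host.strip().rstrip(".").lower() in allowed_names
```

Parsing the legacy forms first, rather than relying on `ipaddress.ip_address` alone, matters. Recent Python versions reject leading-zero octets outright, so `0177.0.0.1` would otherwise fall through to the name branch. Downstream resolvers, however, may still read that string as octal.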
BL-017 (read and ingest tools not audited) and BL-018 (trifecta denials not audited) were already open and are confirmed by this audit; they are not duplicated here.
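BL-041 is a pattern-coverage gap. A redactor that only matches `--password=VALUE` misses two other shapes:

- the space-separated flag form;
- credentials embedded in a URL's userinfo or in a key=value DSN.

The following is a minimal regex sketch covering those three shapes, not the project's redactor:

- `redact` and the flag list are hypothetical.
- It deliberately over-redacts (for example, everything up to the last `@` in a URL token) rather than risk a leak.
- To close the stdio half of the finding, it would have to run on every client-facing error string.

```python
import re

# Hypothetical list of secret-bearing CLI flag names; extend per wrapped tool.
_SECRET_FLAGS = r"(?:password|passwd|token|secret|api-key)"

# --password=VALUE and --password VALUE (also the single-dash spelling).
_FLAG = re.compile(r"(--?" + _SECRET_FLAGS + r"(?:=|\s+))(\S+)", re.IGNORECASE)

# scheme://user:secret@host; the greedy \S* backtracks to the last '@'
# in the token, so a raw '@' inside the password is still covered.
_URL_USERINFO = re.compile(r"([a-z][a-z0-9+.\-]*://[^:/@\s]*:)(\S*)(@)", re.IGNORECASE)

# key=value DSNs such as libpq "password=..." or ADO-style "Pwd=...;".
_KV = re.compile(r"(\b(?:password|pwd)=)([^\s;&]+)", re.IGNORECASE)

_MASK = "[REDACTED]"


def redact(text: str) -> str:
    """Mask credential values in free text before it leaves the process."""
    text = _FLAG.sub(lambda m: m.group(1) + _MASK, text)
    text = _URL_USERINFO.sub(lambda m: m.group(1) + _MASK + m.group(3), text)
    text = _KV.sub(lambda m: m.group(1) + _MASK, text)
    return text
```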
## Consequences
Positive: the highest-severity, reproduced security gaps are fixed with tests in the accompanying change; the rest are tracked with a verification level so the backlog reflects evidence, not conjecture. The internal-audit method is now repeatable.
Negative: the backlog grows by twenty-five items; several open items (BL-046, BL-049, BL-051, BL-052) are architectural and will need their own changes.
Neutral: this ADR records findings and acceptance; enforcement is the code and tests under each item.
## Alternatives considered and rejected
- Fix everything in one change. Rejected: the architectural items (credential wiring, hostname-resolving SSRF, CI gating) are larger than the surgical security fixes and merit separate, reviewable changes.
- Trust the static reviews without reproduction. Rejected: dynamic reproduction confirmed the P0 set and corrected severities, which is the basis for fixing them with confidence.
## Revisit triggers
- The HTTP transport or a concurrent Postgres audit path is implemented (raises BL-050 and BL-046 in urgency).
- An open item here is found exploitable in a wired path before it is scheduled.
- A later audit contradicts a recorded verdict (append an audit note, never rewrite a resolved row).