ADR-0013: Third audit wave (2026-06): actuation, audit, and untrusted-input hardening¶

Field	Value
Status	Accepted
Date	2026-06-07
Authors	Roman Mednitzer

Context¶

A third review wave read the full source again, re-ran every gate, and validated the architecture against established hardening practice for the surfaces praxis fuses: privileged SSH/subprocess execution (OpenSSH BatchMode and host-key policy, POSIX process groups), append-only audit logging (owner-only file mode, visible-seam chain resume), untrusted-input parsing (finite-or-default numeric coercion, frontmatter fence and size discipline), and secret redaction (structural provider-token anchors, value-complete Authorization redaction). The design was distilled from proven systems and reimplemented natively; this wave closes the gap between the distilled intent and the delivered code. No third-party code or cross-repository reference is introduced: praxis stays self-contained (ADR-0001).

The review confirmed the previously-audited controls still hold (the append-only triggers, the fail-closed evidence verifier, the SSRF numeric-form normalisation, the deny-first policy). It also confirmed a coherent cluster of open backlog items from ADR-0011 and ADR-0012 are real and surgically fixable, and surfaced a small set of new hardening gaps and one self-containment defect:

The SSH adapter built ["ssh", target, action] with no host-key policy and no BatchMode, so it could prompt (and hang a TTY-less MCP call) or accept an unverified host key; a leading-dash target was an ssh-option-injection vector.
The subprocess runner inherited the server environment and stdin, and on timeout killed only the direct child: a wrapped tool could read the MCP stdio stream, hang on a credential prompt, or leak a grandchild process tree.
The talosctl adapter enforced the T3 single-target rule on host.name rather than the actual host.nodes, and tokenised a free-form action into argv.
A trifecta refusal raised out of the tool handler with no audit record.
The audit logger reopened the file after degrading and dropped the sink to stderr on a corrupt tail (losing records); the log was created world/group readable.
A poisoned embedding (NaN) could drag a NaN score into vector ranking; the manifest parser accepted ---- as a fence, had no size cap, and allowed indented and duplicate keys; an empty AIDE report read as a clean host.
A context.py docstring cited an out-of-tree prototype by name as its rationale, a self-containment (ADR-0001) and docs-honesty defect.

Decision¶

Adopt this wave as the third recurring audit (after the external ADR-0011 and the internal ADR-0012), validating delivered code against established hardening practice and reimplementing every adopted technique natively. No sibling repository is named in code or docs; the self-contained invariant is upheld.
Remediate the coherent, surgical, security-relevant cluster in the change that accompanies this ADR, each fix with a regression test: BL-018, BL-020, BL-021, BL-034, BL-047, BL-048, BL-054, BL-055, BL-057, BL-058, BL-059 resolved, plus the new items BL-063 to BL-067. Architectural open items (BL-046 SSRF resolution, BL-049 credential wiring, BL-051/052 CI/deploy gating) stay open and tracked.
Where a resolved item carried a residual beyond the security-critical core, carve the residual into a new tracked item rather than over-claim closure: the store seq cross-connection race (residual of BL-054) becomes BL-068, and the talosctl structured-parameter refactor (beyond the BL-048 verb allowlist) is noted as a future refinement, not a delivered guarantee.

Findings¶

Verification: R = reproduced by executing the code, V = verified against the exact source.

BL	Finding	Constraint	Sev	Verify	Status
020	SSH adapter has no host-key policy or `BatchMode`; a leading-dash target is an option-injection vector	SEC-5, INV 5	High	V	resolved
021	Subprocess runner inherits stdin (can read the MCP stdio stream) and env (can hang on a prompt), and kills only the direct child on timeout (leaks the process tree)	SEC-8	High	R	resolved
047	talosctl T3 single-target rule is enforced on `host.name`, not the `host.nodes` list, so one T3 reset can wipe multiple nodes	SEC-6, INV 6	High	V	resolved
048	talosctl tokenises a free-form `action` into argv; constrain the leading verb to an allowlist	SEC-8	Med	V	resolved
018	A trifecta refusal raises out of the tool handler with no audit record	SEC-4, INV 3	Med	R	resolved
055	Audit logger reopens the file after `_degrade` and drops the sink to stderr on a corrupt tail (losing records) and leaks the handle	SEC-8, INV 3	Med	R	resolved
057	Manifest parser accepts `----` as a fence, has no size cap, and allows indented and duplicate keys; non-UTF-8 bytes crash the loader	INV 8	Med	R	resolved
058	AIDE empty output reads as a clean host (false negative); collected telemetry has no size cap before parsing	INV 8	Med	R	resolved
054	`_cosine`/`similar` propagate a `NaN` from a poisoned or corrupted embedding into the ranking	SEC-10	Med	R	resolved
034	`parse_ansible_check` only reads `changed:`; a `FAILED`/`UNREACHABLE` host during a check is dropped	SEC-6	Med	V	resolved
059	An `UNEXPECTED` security-predicate finding (a rogue port or user) is ranked `INFO`, below a changed/missing one	SEC-6	Med	V	resolved
063	Actuation subprocess does not scrub the env (no `GIT_TERMINAL_PROMPT=0`/`DEBIAN_FRONTEND=noninteractive`) or detach stdin	SEC-8	Med	R	resolved
064	Audit log is created world/group readable; not opened `O_APPEND` at the OS level	SEC-9	Med	R	resolved
065	Redaction misses common provider token shapes (`github_pat_`, `glpat-`, `npm_`, `AIza`, `ya29.`, Stripe, OpenAI scoped) and stops `Authorization` at the first space, leaking a comma-separated SigV4 signature	SEC-9, INV 3	Med	R	resolved
066	`context.py` cites an out-of-tree prototype by name as rationale (self-containment and docs-honesty defect)	governance, INV (self-contained)	Low	V	resolved
067	`PRAXIS_HTTP_HOST` is not whitespace-stripped, so a `"127.0.0.1\n"` value is misread as non-loopback	SEC-7	Low	R	resolved
068	Store `seq` is not unique; the `MAX(seq)+1` read can race across two store instances on one file (residual of BL-054)	SEC-10	Low	V	open

Consequences¶

Positive: the privileged-execution surface (SSH host-key policy and option-injection guard, process-group isolation, stdin detachment, env scrubbing, the talosctl verb allowlist and node-aware T3 gate) is materially hardened with tests; every denial is now audited; the audit log is owner-only and survives a corrupt tail without losing records; untrusted parsing (vectors, manifests, AIDE, telemetry size) is robust; and redaction covers more secret shapes value-completely. The repository is again fully self-contained in code and docs.

Negative: the actuation subprocess path moved from subprocess.run to a Popen plus explicit timeout/kill, which is more code on the trust boundary (covered by a new timeout test). The talosctl verb allowlist must be extended deliberately when a new subcommand is needed.

Neutral: this ADR records the wave and its acceptance; enforcement is the code and tests under each item. The architectural open items from ADR-0012 are unchanged.

Alternatives considered and rejected¶

Resolve every open item from ADR-0011/0012 in one change. Rejected: the architectural items (hostname-resolving SSRF, credential wiring, CI/deploy gating) are larger than this surgical security cluster and merit their own reviewable changes, consistent with ADR-0012's staging.
Default the SSH host-key policy to StrictHostKeyChecking=yes. Rejected for v0: a fleet with no pre-seeded known_hosts would refuse every first connection; accept-new (Trust-On-First-Use, refusing a changed key) is the secure default and is overridable to yes once known_hosts is seeded.
Degrade the audit sink to stderr on a corrupt tail (the prior behaviour). Rejected: losing the audit record is worse than a visible seq-reset seam that the verifier reports; the seam is the security signal.

Revisit triggers¶

The HTTP transport is implemented (raises BL-046 SSRF resolution and the consent registry in urgency).
A concurrent or multi-instance store path is implemented (raises BL-068).
A later audit contradicts a recorded verdict here (append an audit note, never rewrite a resolved row).