Architecture Decision Records

The decisions documented here shape the trust boundary, the runtime, and the operator-facing posture of relay-shell. Read them in order the first time; revisit one when its subject area changes.

The format follows Michael Nygard's ADR template: context, decision, consequences. Each ADR has a Status line at the top. Status values:

Status	Meaning
Proposed	Drafted, not yet adopted.
Accepted	Adopted; the codebase reflects the decision.
Superseded	Replaced by a later ADR; see its `Superseded by` line.
Deprecated	No longer applicable; kept for history.
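
The four status values lend themselves to a small validation helper, for example in a docs lint step. The sketch below is illustrative only: `Status` and `parse_status` are hypothetical names, not part of relay-shell, and the code assumes the `- Status: <value>` header bullet shown under "How to write one".

```python
import re
from enum import Enum


class Status(Enum):
    """The four status values listed in the table above."""

    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    SUPERSEDED = "Superseded"
    DEPRECATED = "Deprecated"


# Matches the "- Status: <value>" bullet from the required ADR header.
_STATUS_LINE = re.compile(r"^- Status:\s*(\w+)\s*$", re.MULTILINE)


def parse_status(adr_text: str) -> Status:
    """Return the Status declared in an ADR; raise ValueError otherwise."""
    match = _STATUS_LINE.search(adr_text)
    if match is None:
        raise ValueError("no '- Status:' line found")
    # Enum lookup by value raises ValueError for an unknown status.
    return Status(match.group(1))
```

Rejecting unknown values keeps a typo such as `Acepted` from slipping into the index unnoticed.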

Index

ADR	Title	Status	Date	Subject
0001	Runtime, SDK, and SSH library	Accepted	2026-05-19	Why Python 3.12 + the official `mcp` SDK + `asyncssh`, with the alternatives that were rejected (`paramiko`, building a transport from scratch).
0002	Unsandboxed, full-access posture	Accepted	2026-05-19	Why the executor runs without a meaningful internal sandbox - the project exists to give an MCP client real administrative power, so the safety story is compensating controls (audit, tier policy, redaction, bounds, deployment discipline) instead.
0003	Tiered authority	Accepted	2026-05-19	The four-tier classification (read-only / reversible / stateful / irreversible) plus the `open` / `guarded` / `readonly` admission modes that consume it. The deny list is enforced first in every mode.
0004	Automated TLS at the edge	Accepted	2026-05-20	Why `deploy/install-edge.sh` provisions Caddy + ACME (Let's Encrypt) for the HTTP transport, and why certbot+cron and native TLS in the Python service were rejected.
0005	Codebase validation against known-good sources	Accepted	2026-05-24	A repeatable validation pass against the upstream `mcp` / `asyncssh` / OAuth surfaces, the audit record schema, and the documented redaction / tier behavior. A running record that appends a dated outcome per pass: 2026-05-24 (three documentation-drift findings), 2026-05-31 (F-004, redaction coverage for bare provider-token shapes), 2026-06-01 (F-005 C-005 runbook drift; the ADR 0007 audit hash-chain landed in the same pass), 2026-06-12 (D-001, the `mcp` 1.27.1 → 1.27.2 pin-drift reconciliation, recorded as a full audit pass under `audit/`), and 2026-06-21 (DOC-1, the runbook §8.18 next-free-ADR marker corrected to match this index), plus a same-day 2026-06-21 full audit pass (scanner battery + steps 1-4 clean; SEC-3 dependency-floor hardening, TOOL-4 CODEOWNERS; deferrals in `audit/2026-06-21-engagement.md`).
0006	Syscall-level audit channel via seccomp-bpf notification mode	Accepted	2026-06-02	An audit-only seccomp-bpf channel (notify-mode, never blocking) that closes the audit gap on the child side of `asyncio.create_subprocess_*` without re-introducing a sandbox. Shipped in `src/relay_shell/seccomp.py` (pure `ctypes`, no new deps): opt-in via `RELAY_SHELL_SECCOMP_NOTIFY` (default off), `CAP_SYS_ADMIN`-gated so set-uid/`sudo` posture is preserved verbatim, Linux/`x86_64`/kernel ≥ 5.5, additive `syscall_notify` / `syscall_notify_overflow` audit lines (extend the ADR 0007 chain) plus two bounded `/metrics` counters. Proposed 2026-05-24; accepted with the implementing PR (runbook §7.5 B-021). Follow-ups landed 2026-06-09 (filter version 2): `prctl` notified for privilege-relevant options via an eq-any predicate (B-024) and coverage extended to local PTY sessions, whose transport adopts the monitor for the session lifetime (B-026).
0007	Tamper-evident audit log via per-record hash chaining	Accepted	2026-06-01	An opt-in (`RELAY_SHELL_AUDIT_CHAIN`, default off), additive per-record hash chain (`seq`/`prev`/`chain`) that makes edits, insertions, reorders, and interior deletions of the on-disk audit log detectable by recomputation; the fail-closed `relay-shell --verify-audit` also rejects a missing / empty / head-truncated log by default (`--segment` for a rotation segment; tail-truncation needs the off-host copy) — closing the integrity gap left by `chattr +a` + off-host shipping against the ADR 0002 residual-risk attacker. `jsonl` only; a CLI verb, not an MCP tool.
0008	Operating-guidance MCP prompt, audited like a resource read	Accepted	2026-06-08	Adds one MCP prompt (`operating_guide`) as the canonical home for detailed "when to use which tool" guidance (one-shot vs PTY session, the spawn+`session_*` workflow, fleet/transfer entry points), beyond the concise `instructions` string and per-tool descriptions. A fetch is a model-context pull, so it is audited (tier 0, stable `tool="prompt:<name>"`) and bounded by the same `max_output` cap, bypassing `Relay.run` exactly as resource reads do; `prompts/list` returns metadata only and does not audit. No audit-record-shape change — only a new `prompt:` `tool` namespace alongside `resource:` / `syscall_notify`.
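
To make the ADR 0007 entry concrete, here is a minimal sketch of per-record hash chaining in general. It is not the relay-shell implementation: the field names `seq` / `prev` / `chain` come from the index entry, but the SHA-256 digest, the canonical-JSON encoding, the all-zero genesis value, and the `append` / `verify` helpers are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "prev" value for the first record


def _digest(body: dict, prev: str) -> str:
    # Hash the previous link together with a canonical JSON form of the
    # record, so changing any record invalidates every later chain value.
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + encoded).encode("utf-8")).hexdigest()


def append(log: list[dict], record: dict) -> None:
    """Append a record, adding seq / prev / chain fields."""
    prev = log[-1]["chain"] if log else GENESIS
    body = {**record, "seq": len(log), "prev": prev}
    log.append({**body, "chain": _digest(body, prev)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; fail closed on an empty log."""
    if not log:
        return False
    prev = GENESIS
    for seq, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "chain"}
        if entry.get("seq") != seq or entry.get("prev") != prev:
            return False  # insertion, reorder, deletion, or head truncation
        if entry.get("chain") != _digest(body, prev):
            return False  # record content was edited
        prev = entry["chain"]
    return True
```

In this sketch a log with its last records removed still verifies, because the remaining prefix is internally consistent. That matches the index entry's note that tail truncation can only be caught by comparing against the off-host copy.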

When to write an ADR

Any of the following needs an ADR before code lands:

A new transport (e.g. unix-socket alongside stdio / streamable-http).
A new auth provider (e.g. JWT static-keys alongside the file-backed OAuth 2.1).
A change to the audit-record shape or to the no-sandbox posture.
A new policy category (not just another verb in the existing TIER2_PATTERN / TIER3_PATTERN; see docs/runbook.md §6.4).

Routine additions - a new tool, a new redaction pattern, a tightened test - do not need an ADR; they go through the normal review loop. Runbook §6 has a recipe for each case.

How to write one

Number sequentially. Next free number is 0009.
Filename pattern: NNNN-short-slug.md.
Required header:

# ADR NNNN: <Title>

- Status: Accepted
- Date: YYYY-MM-DD

Sections: Context, Decision, Consequences, Rejected alternatives (when applicable).
Reference the ADR by number in code or other docs, not by file path - the path is stable, but the number is the canonical handle.
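
The numbering and naming rules above can be sketched as a small scaffolding helper. This is a hypothetical convenience, not a script that ships with relay-shell: `slugify`, `next_adr_filename`, and `header` are illustrative names, and the existing filenames in the usage note are invented for the example.

```python
import re

# NNNN-short-slug.md: four digits, then lowercase words joined by hyphens.
_ADR_NAME = re.compile(r"^(\d{4})-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def next_adr_filename(existing: list[str], title: str) -> str:
    """Pick the next sequential number, ignoring non-ADR filenames."""
    numbers = [
        int(m.group(1)) for name in existing if (m := _ADR_NAME.match(name))
    ]
    return f"{max(numbers, default=0) + 1:04d}-{slugify(title)}.md"


def header(number: int, title: str, status: str, date: str) -> str:
    """Render the required header block for a new ADR."""
    return (
        f"# ADR {number:04d}: {title}\n"
        "\n"
        f"- Status: {status}\n"
        f"- Date: {date}\n"
    )
```

For example, `next_adr_filename(["0001-runtime.md", "0008-operating-guide.md", "README.md"], "Unix-socket transport")` returns `"0009-unix-socket-transport.md"`, which agrees with the next free number stated above.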

Cross-references

docs/architecture.md - request lifecycle, module table, and how the ADRs map onto the runtime.
docs/runbook.md §6 - extension recipes that may require an ADR before code lands.
SECURITY.md - the threat model and how ADRs 0002 / 0003 constrain it.