ADR-0042: Concurrent HTTP serving over a thread-safe store (2026-06-15)
Status
Accepted (resolves BL-110; the ADR-0041 concurrency revisit trigger)
Date
2026-06-15
Authors
praxis maintainers (BL-110)
Context
ADR-0041 delivered the multi-client HTTP transport on a single-threaded stdlib
HTTPServer. That choice was deliberate and conservative: per-session isolation (the
security goal) was complete, but requests were serialised, so a slow actuation on one client
blocked every other client. ADR-0041 recorded the single-threaded server as v1 and named
the follow-up: make the store thread-safe and switch to ThreadingHTTPServer (BL-110).
The reason serving was single-threaded was the store, not the transport. The default
SqliteStore held one sqlite3 connection opened with check_same_thread=True, so a
handler thread other than the one that built the context could not touch it at all. Every
other shared component was already thread-safe or trivially safe: the audit hash chain
appends under its own lock (BL-029); the evidence scheduler holds its own lock and is
count-based, so out-of-order or concurrent on_record calls are correct; the session
manager, and each session's approval registry, are lock-guarded (ADR-0041); per-session
isolation keeps one client's taint latch or pending nonce off another. The remaining gaps
were the store connection and the per-session BudgetTracker counters.
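The store constraint described above can be reproduced with the stdlib sqlite3 module alone. This is a minimal sketch, not the project's code: the helper name use_from_other_thread is hypothetical, and the two connections only stand in for the store's connection before and after the change.

```python
import sqlite3
import threading


def use_from_other_thread(conn):
    """Run a trivial query on conn from a new thread.

    Returns the exception class name if sqlite3 refuses, else None.
    """
    outcome = {}

    def worker():
        try:
            conn.execute("SELECT 1").fetchone()
            outcome["error"] = None
        except sqlite3.ProgrammingError as exc:
            outcome["error"] = type(exc).__name__

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["error"]


# check_same_thread defaults to True: only the creating thread may use it.
strict = sqlite3.connect(":memory:")
# With check_same_thread=False the caller takes over responsibility for
# serialising access (the ADR does this with a lock).
shared = sqlite3.connect(":memory:", check_same_thread=False)

strict_result = use_from_other_thread(strict)
shared_result = use_from_other_thread(shared)
```

Here strict_result names the error a handler thread would have hit under the old setting, while shared_result is None because the second connection accepts calls from any thread.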
Decision
- Serialise every store method on a per-instance re-entrant lock. A synchronized decorator in store/base.py wraps each connection-touching method of SqliteStore and PostgresStore in a with self._lock: block (self._lock is an RLock, because a write method calls a read method on the same instance, e.g. put_fact -> get_active, on one thread). The SQLite connection is opened with check_same_thread=False so a handler thread may use it; the lock makes the single shared connection safe (the documented-safe single-connection pattern). This is per-instance, so the BL-103 two-instance compare-and-set test still exercises real cross-connection concurrency (the FOR UPDATE and unique-seq guarantees are unchanged); the lock only serialises one instance shared across threads, which is the threaded-server case.
- Make BudgetTracker thread-safe. The check-and-charge in charge (and the increment in record_spend) run under a threading.Lock, so two concurrent requests in the SAME session cannot read-modify-write the counters and let an extra action slip past the ceiling. Per-session budgets are otherwise isolated across sessions already (BL-104).
- Switch the transport to ThreadingHTTPServer with daemon_threads = True. Each request runs on its own thread, so a slow actuation no longer blocks other clients. No other transport change: auth, the session lifecycle, the body cap, and the consent ceiling are as ADR-0041 left them.
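The store-locking decision can be sketched as follows. This is an illustration under stated assumptions, not the contents of store/base.py: TinyStore, its facts table, and the method bodies are hypothetical, and only the names synchronized, _lock, put_fact and get_active come from the text above.

```python
import functools
import sqlite3
import threading


def synchronized(method):
    """Run the wrapped method while holding the instance's re-entrant lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class TinyStore:
    """Hypothetical stand-in for SqliteStore: one shared connection, one RLock."""

    def __init__(self):
        self._lock = threading.RLock()
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._conn.execute("CREATE TABLE facts (key TEXT, value TEXT, seq INTEGER)")

    @synchronized
    def get_active(self, key):
        # Latest (value, seq) for key, or None if the key has no facts yet.
        return self._conn.execute(
            "SELECT value, seq FROM facts WHERE key = ? ORDER BY seq DESC LIMIT 1",
            (key,),
        ).fetchone()

    @synchronized
    def put_fact(self, key, value):
        # Calls get_active while already holding the lock; an RLock lets the
        # owning thread re-enter, where a plain Lock would block forever.
        current = self.get_active(key)
        seq = 1 if current is None else current[1] + 1
        self._conn.execute("INSERT INTO facts VALUES (?, ?, ?)", (key, value, seq))
        self._conn.commit()
        return seq


store = TinyStore()
writers = [
    threading.Thread(target=store.put_fact, args=("host", f"v{i}"))
    for i in range(20)
]
for writer in writers:
    writer.start()
for writer in writers:
    writer.join()

final = store.get_active("host")
seqs = [row[0] for row in store._conn.execute("SELECT seq FROM facts ORDER BY seq")]
```

Because each put_fact runs start to finish under the lock, the twenty concurrent writers produce twenty distinct sequence numbers rather than duplicates from interleaved read-then-insert steps.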
The taint latch (SessionTaint) and the kill switch are intentionally left lock-free.
Both are monotonic and fail-safe: the taint latch only ever transitions unset -> set (a
concurrent double-mark is idempotent, and the worst case is over-tainting, which fails
closed), and the kill switch is a boolean plus an idempotent sentinel write whose
read-or-trip races resolve to "tripped" (also fail-safe). Adding locks there would buy no
correctness.
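By contrast with the monotonic latch and kill switch, the budget check-and-charge named in the Decision is a read-modify-write, which is why it takes a lock. A minimal sketch, assuming a single action counter and a hypothetical BudgetExceeded exception (the real BudgetTracker fields and error type are not given in this ADR):

```python
import threading


class BudgetExceeded(Exception):
    """Hypothetical error raised when the ceiling is reached."""


class BudgetTracker:
    """Sketch only: the counter and ceiling names are assumptions."""

    def __init__(self, max_actions):
        self._max_actions = max_actions
        self._used = 0
        self._lock = threading.Lock()

    def charge(self):
        # The check and the increment happen under one lock acquisition, so
        # two threads cannot both observe "one action left" and both proceed.
        with self._lock:
            if self._used >= self._max_actions:
                raise BudgetExceeded("action ceiling reached")
            self._used += 1

    @property
    def used(self):
        with self._lock:
            return self._used


tracker = BudgetTracker(max_actions=5)
outcomes = [None] * 50


def attempt(index):
    try:
        tracker.charge()
        outcomes[index] = True
    except BudgetExceeded:
        outcomes[index] = False


threads = [threading.Thread(target=attempt, args=(i,)) for i in range(50)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

granted = sum(outcomes)
```

However the fifty attempts interleave, granted never exceeds the ceiling, which is the property the lock exists to protect.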
Consequences
Positive: BL-110 is resolved. The HTTP server serves clients in parallel (for example actuating several hosts at once), so one slow call no longer stalls the fleet, while every bitemporal/append-only invariant holds: store mutations are serialised, the audit chain and evidence stay single-writer-correct, and per-session isolation is unchanged. The store lock is a single, auditable mechanism shared by both backends, with no new dependency.
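The parallel-serving behaviour can be illustrated with the stdlib server on its own. This is a hedged sketch: the /slow and /fast paths, the Handler class, and the release event are hypothetical and stand in for a slow actuation and an unrelated client; none of the project's auth or session handling is modelled.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

release = threading.Event()  # set by the fast request, awaited by the slow one


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/slow":
            # Stands in for a slow actuation: it can only finish if another
            # request is served while this one is still in flight.
            body = b"released" if release.wait(timeout=5) else b"timed out"
        else:
            release.set()
            body = b"fast"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


class Server(ThreadingHTTPServer):
    daemon_threads = True  # stated explicitly, matching the Decision


def fetch(port, path):
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
    try:
        conn.request("GET", path)
        return conn.getresponse().read().decode()
    finally:
        conn.close()


server = Server(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

slow_result = {}
slow = threading.Thread(target=lambda: slow_result.update(body=fetch(port, "/slow")))
slow.start()
fast_body = fetch(port, "/fast")
slow.join()

server.shutdown()
server.server_close()
```

On a single-threaded HTTPServer a /slow request that arrived first would hold the only serving thread until its wait timed out; with one thread per request the /fast call is served alongside it and releases it.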
Negative: store operations are serialised process-wide, so two requests never touch the
store literally simultaneously. This is acceptable: store operations are short, and the
work the threaded server actually parallelises (actuation subprocesses, network I/O, the
DRY_RUN -> approve -> execute round trip) runs outside any store method, so it overlaps
freely. A future workload that is store-read-bound could move to per-thread connections or
a WAL read pool; the synchronized seam localises that change.
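The claim that slow work overlaps even though store calls are serialised comes down to lock scope. A small sketch, assuming a hypothetical handle_request function: store_lock stands in for the store's per-instance lock, and the barrier stands in for slow actuation work and exists only to prove that both handlers are outside the lock at the same time.

```python
import threading

store_lock = threading.RLock()                  # stand-in for the store lock
meeting_point = threading.Barrier(2, timeout=5)
log = []


def handle_request(name):
    with store_lock:
        log.append(f"{name}: store write")      # short and serialised
    # The lock is released here. The barrier only opens when both handlers
    # are waiting at once, so passing it shows the slow phases overlap.
    meeting_point.wait()
    with store_lock:
        log.append(f"{name}: store write after actuation")


handlers = [
    threading.Thread(target=handle_request, args=(name,))
    for name in ("client-a", "client-b")
]
for handler in handlers:
    handler.start()
for handler in handlers:
    handler.join()
```

If the barrier wait were moved inside the first with block, the second handler could never reach it and the barrier would time out, which is the stall the Decision avoids by keeping slow work out of store methods.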
Neutral: stdio is unchanged and remains the default. The lock is per store instance, so
behaviour is identical for the single-threaded stdio path (the lock is uncontended). The
SQLite busy_timeout and BEGIN IMMEDIATE cross-connection serialisation (BL-027, BL-068)
remain in force for the multi-instance/multi-process case and are complementary to the new
in-process lock.
Alternatives considered and rejected
- A thread-safe wrapper/proxy around any StoreProtocol backend, locking in one place. Rejected: a universal proxy that exposes the vector and compare-and-set methods would make a backend that lacks them appear to have them, violating the store contract's stated rule that a backend never fakes an unsupported capability (store/base.py); per-backend locking keeps each backend's true capability surface and isinstance checks honest.
- Per-thread store connections via threading.local (true read concurrency, no lock). Rejected for v1: a :memory: database is per-connection, so each thread would get a separate empty database, breaking the in-memory default used widely in tests and by the ephemeral server; connection lifecycle across the thread pool adds complexity. The single connection plus lock is correct for both :memory: and file stores. Per-thread or pooled connections remain the escalation path if read throughput demands it.
- A non-reentrant Lock on the store. Rejected: put_fact calls get_active on the same instance, which would self-deadlock; RLock re-enters on the owning thread.
- Holding the audit lock across the evidence on_record hook to serialise checkpoints. Rejected as unnecessary: the evidence scheduler already holds its own lock and counts records, so concurrent or out-of-order hook calls are already correct.
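The reason per-thread connections break the in-memory default can be shown with two plain sqlite3 connections. This sketch uses a hypothetical facts table; the two connections stand in for what two handler threads would each open under a threading.local scheme.

```python
import sqlite3

# Each connect(":memory:") call creates its own private database.
first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")

first.execute("CREATE TABLE facts (key TEXT, value TEXT)")
first.execute("INSERT INTO facts VALUES ('host', 'up')")
first.commit()

# The second connection sees none of it, not even the table definition.
tables_seen_by_second = second.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
rows_seen_by_first = first.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
```

A single connection shared under a lock avoids this because every thread talks to the same database, whether it lives in memory or in a file.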
Revisit triggers
- A store-read-bound workload wants real read parallelism: move to per-thread connections or a WAL reader pool behind the synchronized seam (SQLite), or a connection pool (Postgres).
- A deployment wants bounded concurrency or backpressure: cap the worker threads (a pool) rather than ThreadingHTTPServer's unbounded thread-per-request.
- Per-distinct-client tokens land (the ADR-0006 multi-operator revisit): concurrent distinct operators make the per-session budget and consent ceiling load-bearing across real principals, not just one operator's sessions.
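One possible shape for the bounded-concurrency trigger is a server that hands each accepted connection to a fixed-size pool. This is a sketch of a direction, not a planned design: PooledHTTPServer, max_workers, and PingHandler are hypothetical names, and the _handle method follows the same finish_request / handle_error / shutdown_request sequence that socketserver's threading mix-in uses per request.

```python
import http.client
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, HTTPServer


class PooledHTTPServer(HTTPServer):
    """Hypothetical: serve requests on at most max_workers threads."""

    def __init__(self, address, handler_class, max_workers=4):
        super().__init__(address, handler_class)
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def process_request(self, request, client_address):
        # Queue the accepted connection instead of spawning a new thread.
        self._pool.submit(self._handle, request, client_address)

    def _handle(self, request, client_address):
        try:
            self.finish_request(request, client_address)
        except Exception:
            self.handle_error(request, client_address)
        finally:
            self.shutdown_request(request)

    def server_close(self):
        super().server_close()
        self._pool.shutdown(wait=True)


class PingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = PooledHTTPServer(("127.0.0.1", 0), PingHandler, max_workers=2)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
conn.request("GET", "/")
response = conn.getresponse()
status, body = response.status, response.read().decode()
conn.close()

server.shutdown()
server.server_close()
```

This caps the number of requests handled at once, but the executor's internal queue is unbounded, so real backpressure would still need a limit on queued connections.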