ADR-0028: CIS-Talos drift baseline implementation (2026-06-14)¶
Status¶
Accepted
Date¶
2026-06-14
Authors¶
praxis maintainers (ratifies ADR-0024; implements BL-099, from ADR-0021)
Context¶
ADR-0024 fixed the CIS fact-predicate schema and was recorded Proposed, to be ratified before BL-099 was implemented (the backlog item required the schema decision first). The maintainer ratified ADR-0024; this wave adopts it as the schema of record and implements the baseline, the suppression policy, and the observing collector.
Decision¶
-
Ratify ADR-0024 as the schema of record, unchanged: subject
host:<name>/cluster:<name>, predicatecis:<benchmark>:<control_id>, the comparable-valueversus documentation-reasonsplit, CIS-aware severity through the engine'sseverity_forhook, and the explicitTALOS_SATISFIED/CIS_SUPPRESSEDsets. -
Implement the baseline in
src/praxis/drift/cis.py: aCisControlrecord, onenormalize_valueapplied identically on both sides (booleans lowercased, scalars trimmed, lists/comma-strings sorted), the vettedCIS_BASELINEacross the kubelet, API-server, controller-manager, scheduler, and cluster control families, pluscis_severity,cis_baseline_facts,cis_drift, andseed_cis_baseline. The engine (diff,default_severity) and the store are untouched. -
Suppression model:
CIS_SUPPRESSEDis a named waiver of a baseline control (the waived control is in the baseline and is dropped from the active set), whileTALOS_SATISFIEDdocuments controls Talos guarantees structurally that are not in the checkable baseline at all. Both are excluded at materialization, so neither the CIS diff nor the genericdrift_scanalerts on them. Excluding at materialization (rather than filtering findings afterwards) also keeps the generic scan'sflag_unexpectedpath from re-surfacing a waived control asUNEXPECTED. -
The observing collector (
src/praxis/collectors/cis.py) is a pure parser of captured CIS evidence (an out-of-process T0 read) intoOBSERVEDfacts under the same schema. It emits only controls in the active baseline, so observed is a subset of desired and the generic scan never manufactures a CISUNEXPECTED; a baselined control with no evidence still surfaces asMISSING. -
Wire CIS through the existing audited surface with no new tool and no new UCA: the collector plugs into
ingest_observation(already audited, marked untrusted, and UCA-covered, BL-085), anddrift_scanpassescis_severity(a tool-level severity choice, the engine stays generic per ADR-0024 decision 5) so a seeded baseline plus ingested evidence reports CIS drift atCRITICALthrough the existing read-only tool. Seeding the baseline is a library helper (seed_cis_baseline), an operator blessing like the other known-good seeds (BL-016), not an actuator. -
Baseline scope: an initial, vetted, must-equal-Y control set, benchmark-namespaced so a CIS Kubernetes Benchmark numeric revision (
k8s-1.x) or further controls extend it additively. Accuracy was favoured over breadth: thetalosnamespace uses stable, descriptive control ids with the CIS Kubernetes reference carried inreason, so an error-prone numeric id is never committed as a key (ADR-0024 decision 3 blesses both id forms).
Consequences¶
Positive: CIS drift drops into the existing engine, store, and tools with no engine
change; findings attach to the real host/cluster vertices and join the same bitemporal
history as other facts; the value/reason split keeps the equality diff reliable;
severity and suppression are explicit and reviewable; there is no new actuation or UCA
surface. The whole pipeline is unit-tested here without a live cluster, because the
collector is a pure parser and the diff is pure.
Negative: the baseline content is an initial set, not the full benchmark; growing it
(and adding a k8s-1.x numeric namespace) is tracked follow-up. A normalization
mismatch between real evidence and the baseline would show as a spurious CHANGED
until reconciled, so the first live collector bring-up must be validated against a
known-compliant node (the risk ADR-0024 named).
Neutral: drift_scan now ranks any cis: predicate CRITICAL; non-CIS predicates
are unchanged because cis_severity delegates to default_severity. The collector
couples to the active baseline keys by design (to keep observed a subset of desired),
so evidence for an unknown or waived control is dropped rather than recorded.
Alternatives considered and rejected¶
- A dedicated
cis_drift_scanMCP tool. Rejected: it would duplicatedrift_scanand add tool and STPA surface; makingdrift_scanCIS-aware via severity reuses the audited read path.cis_driftstays a library entry point for direct use and tests. - Transcribing the full CIS Kubernetes Benchmark by numeric id from memory. Rejected
as error-prone for a security baseline; the
talosdescriptive-id namespace with CIS references inreasonis accurate and extensible. - Post-diff finding suppression. Rejected: it leaves the generic scan's
flag_unexpectedpath re-surfacing waived controls asUNEXPECTED; excluding at materialization is consistent across both the CIS and the generic entry points. - Seeding the baseline via an MCP tool. Rejected: blessing a desired baseline is an operator action with no actuation; a library helper avoids a new state-writing tool and the UCA it would carry.
Revisit triggers¶
- A new CIS Kubernetes Benchmark revision (add a
k8s-1.xbenchmark namespace and a baseline migration). - The first live collector bring-up (validate normalization against a known-compliant
node; reconcile any mismatch into the
valueschema). - The baseline grows enough to warrant splitting the data out of
drift/cis.py. - A need to rank CIS Level 2 controls below Level 1 (the level side table ADR-0024 decision 5 anticipated).