ADR-0043: Kubernetes actuation credential contract: scoped static kubeconfig, or a bastion-host skill (2026-06-15)
Status
Proposed
Date
2026-06-15
Authors
praxis maintainers (design decision before code; files BL-111)
Context
praxis today manages Kubernetes only at the Talos control-plane layer: the
talosctl adapter (src/praxis/actuation/talosctl.py) carries cluster-lifecycle
verbs (bootstrap, upgrade-k8s, kubeconfig, with reset/upgrade at T3) for
HostType.TALOS hosts, and the classifier recognises kubectl
apply|delete|scale|rollout|patch|edit|drain|cordon|uncordon as T2
(execution/patterns.py). There is no first-class kubectl or helm actuation
adapter: a kubectl invocation today only reaches the fleet through the free-form
SSH adapter (floored at T2, ADR-0016), without native dry-run or structured-argument
hardening.
Whether kubectl and helm can become first-class audited actuators (rather than
remaining documented skills run on a bastion host) is decided by one thing: the
credential model. Two invariants bear directly on it. Invariant 9 requires
least-privilege, scoped, revocable credentials and forbids the NOPASSWD: ALL
equivalent. Invariant 8 (lethal-trifecta containment) treats all collected data as
untrusted and keeps a human gate between observation and actuation. A cluster
credential that is broad, ambient, or that runs an arbitrary helper per call cuts
against both.
The mechanism praxis already has:
- CredentialBroker (actuation/credentials.py) holds the authorization record only (role, exact hosts, max tier), independently revocable with kill_all; the secret material is injected out of band and never stored or logged (SEC-8).
- scrubbed_env() (actuation/base.py) forwards a small allowlist of credential references by name: SSH_AUTH_SOCK, TALOSCONFIG, and already KUBECONFIG (BL-080). Everything else in the server environment, including unrelated secrets, never reaches a wrapped tool or its plugins.
- talosctl is the precedent: a TALOSCONFIG provisioned out of band, with --nodes/--endpoints pinned from the trusted inventory (config/inventory.yaml), never from a free-form flag (BL-082).
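The allowlist behaviour described for scrubbed_env() can be pictured with a minimal sketch. This is not the code in actuation/base.py: the function name scrubbed_env_sketch and the constant name are hypothetical, the allowlist contains only the three names listed above, and any baseline variables the real helper keeps (PATH, for example) are not shown.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed allowlist, copied from the three names listed above; the real list
# lives in actuation/base.py and may differ.
CREDENTIAL_REFERENCE_ALLOWLIST = ("SSH_AUTH_SOCK", "TALOSCONFIG", "KUBECONFIG")


def scrubbed_env_sketch(source: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return only the allowlisted variables; everything else is dropped."""
    env = os.environ if source is None else source
    return {
        name: env[name]
        for name in CREDENTIAL_REFERENCE_ALLOWLIST
        if name in env
    }
```

Because the filter is an allowlist rather than a denylist, a newly introduced secret in the server environment stays out of wrapped tools by default.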
Three Kubernetes-specific credential hazards make a naive adapter unsafe:
- Context selection is ambient. KUBECONFIG is one process-wide variable whose current-context is mutable, so a call could silently target the wrong cluster.
- Kubeconfigs are admin by default. A typical kubeconfig (including the one talosctl kubeconfig emits) grants cluster-admin, which is the standing-privilege posture invariant 9 forbids. Tiers gate the verb, not the blast radius of the credential.
- Cloud and exec auth runs an arbitrary binary per call. EKS/GKE/AKS kubeconfigs carry a user.exec credential plugin (aws eks get-token, gke-gcloud-auth-plugin, kubelogin) that kubectl/helm execute on every invocation to mint a token. That plugin needs ambient cloud environment that scrubbed_env() deliberately strips (so it fails closed), and it is itself an unaudited arbitrary-subprocess actuation surface that praxis would be implicitly executing.
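The first and third hazards are both visible in the kubeconfig structure itself, so a check can run before any tool is invoked. The sketch below assumes the kubeconfig has already been parsed into a mapping by some loader and that the context name is supplied by the caller of the check, never read from current-context. The names KubeconfigRejected and reject_exec_auth are hypothetical, and treating the legacy auth-provider stanza like exec is a choice made for this sketch, not something the ADR states.

```python
from typing import Any, Mapping


class KubeconfigRejected(Exception):
    """Raised when a kubeconfig is not admissible (hypothetical name)."""


def _by_name(entries: Any, name: str) -> Mapping[str, Any]:
    """Find the named entry in a kubeconfig list such as 'contexts' or 'users'."""
    for entry in entries or []:
        if isinstance(entry, Mapping) and entry.get("name") == name:
            return entry
    raise KubeconfigRejected(f"no entry named {name!r}")


def reject_exec_auth(kubeconfig: Mapping[str, Any], context_name: str) -> str:
    """Return the user behind context_name, or raise if it needs a helper binary."""
    # The context is looked up by explicit name; 'current-context' is ignored.
    context = _by_name(kubeconfig.get("contexts"), context_name).get("context") or {}
    user_name = context.get("user")
    if not user_name:
        raise KubeconfigRejected(f"context {context_name!r} names no user")
    user = _by_name(kubeconfig.get("users"), user_name).get("user") or {}
    # 'exec' is the credential-plugin stanza; 'auth-provider' is the older
    # plugin mechanism and is refused the same way in this sketch.
    for stanza in ("exec", "auth-provider"):
        if stanza in user:
            raise KubeconfigRejected(f"user {user_name!r} uses {stanza} auth")
    return user_name
```

A structural check like this cannot tell whether a static token is RBAC-scoped or cluster-admin, so the second hazard still has to be handled by how the credential is provisioned.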
ArgoCD was considered alongside kubectl/helm and set aside: it is itself a GitOps
reconciler, overlapping the human-gated drift engine (ADR-0007 rejected continuous
auto-reconcile), and fits better as a future read-only desired-state drift source than
as an actuator.
Decision
- A first-class kubectl/helm actuation adapter is admissible only under the scoped-static-kubeconfig contract below. When that contract cannot be met (notably cloud or exec-plugin auth, or when only an admin kubeconfig exists), Kubernetes and Helm operations remain a bastion-host tool skill: a knowledge skill that documents the gated procedure on a bastion that already holds the cluster auth, where those actions are audited by the bastion rather than by praxis. This dividing line is the decision.
- The scoped-static-kubeconfig contract:
a. Credential. A static kubeconfig referenced out of band (KUBECONFIG is already
allowlisted), holding a bearer token or client certificate for an RBAC-scoped
principal, never cluster-admin. Scoping the kubeconfig is the operator's
responsibility, the same posture as scoped SSH keys and TALOSCONFIG today. The
CredentialBroker holds the cluster-host grant (role, host, max tier).
b. Targeting from trusted inventory. The cluster is an inventory host vertex with a
new HostType.KUBERNETES, carrying the kubeconfig path and the context name. The
adapter pins --kubeconfig <path> --context <ctx> from the trusted inventory,
validated, never from the caller and never from the ambient current-context.
Kubeconfig paths are confined with confine_to_root (as ansible/tofu/runbook
roots are), fail-closed when unset.
c. Refuse exec-stanza kubeconfigs (fail closed). A kubeconfig whose selected
context resolves to a user.exec credential plugin is rejected before any argv
is built, for the two reasons in Context hazard 3. Cloud and exec auth therefore
route to the bastion skill, not the adapter.
d. Tiering and DRY_RUN. Verbs are allowlisted (the talosctl pattern). Reads are T0;
kubectl apply|scale|rollout|patch|cordon|drain and helm install|upgrade|rollback
are T2 (the kubectl mutators are already classified T2 in patterns.py);
helm uninstall, kubectl delete of a namespace/PVC/CRD, and other
hard-to-reverse verbs are T3 (typed token, one target at a time). Every real run
is DRY_RUN then approve then execute, with native preview where the tool offers
one (kubectl --dry-run=server/kubectl diff, helm --dry-run).
e. host_type gate (SEC-5). The kubectl/helm adapters support only
HostType.KUBERNETES; SSH, ansible, and talosctl refuse a KUBERNETES host and
vice versa, as a HARD audited precondition, exactly as the Talos SSH refusal works.
f. No free-form options (BL-082). Option-shaped tokens in the action string are refused; all options are set by the adapter from structured params; resource and namespace names are validated.
- STPA traceability. Two new control actions, act_kubectl and act_helm, are added to docs/stpa/05-ucas.md, each with provide-type UCAs (act without DRY_RUN and approval; accept an exec-plugin or admin kubeconfig; wrong host_type), and each mapped to a covering security constraint in 07-security-constraints.md (SEC-5 host_type, SEC-6 DRY_RUN then approve, SEC-8 scoped credential). They are pre-staged and flagged [planned], like the act_redfish/act_cloud rows (ADR-0022), until the adapter is implemented.
- Default posture unchanged. This ADR builds nothing. With no KUBERNETES host in the inventory and no adapter registered, behaviour and the dependency set are unchanged. kubectl and helm are runtime tools discovered on PATH and wrapped (never vendored), consistent with the actuation-wraps-real-tools rule and the dependency-free core (ADR-0001/0014): no new Python dependency.
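Items b, d and f of the contract combine into a small argv builder, sketched here under stated assumptions. Nothing below exists in praxis yet: ClusterTarget, ActionRefused, KUBECTL_VERB_TIERS and build_kubectl_argv are hypothetical names, the read verbs shown are examples, and the resource-sensitive T3 cases (delete of a namespace, PVC or CRD) are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical verb allowlist following the tiering in item d. Verbs that are
# absent from this table are refused outright.
KUBECTL_VERB_TIERS: Dict[str, int] = {
    "get": 0, "describe": 0,
    "apply": 2, "scale": 2, "rollout": 2, "patch": 2, "cordon": 2, "drain": 2,
}


@dataclass(frozen=True)
class ClusterTarget:
    """Targeting data read from the trusted inventory (field names assumed)."""
    kubeconfig_path: str
    context: str


class ActionRefused(Exception):
    """Raised when the requested action is not admissible."""


def build_kubectl_argv(
    target: ClusterTarget, verb: str, operands: Sequence[str]
) -> List[str]:
    """Build an argv list whose targeting flags come only from the inventory."""
    if verb not in KUBECTL_VERB_TIERS:
        raise ActionRefused(f"verb {verb!r} is not allowlisted")
    for token in operands:
        # Item f: option-shaped tokens are never accepted from the caller.
        if token.startswith("-"):
            raise ActionRefused(f"option-shaped token refused: {token!r}")
    # Item b: --kubeconfig and --context are pinned by the adapter.
    return [
        "kubectl",
        "--kubeconfig", target.kubeconfig_path,
        "--context", target.context,
        verb, *operands,
    ]
```

Returning a list rather than a shell string keeps operands from being reinterpreted by a shell. A helm counterpart would follow the same shape, with the caveat that helm spells the context flag --kube-context.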
Consequences
Positive: a clean, invariant-aligned dividing line between what praxis actuates
directly and what stays a bastion procedure. The scoped-kubeconfig path reuses
existing machinery (the talosctl adapter shape, the credential broker, confine_to_root,
the verb-allowlist and structured-argv hardening) with no new dependency. Cloud-auth
complexity is kept off the audited path by routing it to a skill. ArgoCD is kept out
of the actuation surface, avoiding a second reconciler beside the drift engine.
Negative: cloud-managed clusters (EKS/GKE/AKS) cannot be actuated through praxis directly; their operators use the bastion skill and accept that those actions are audited by the bastion, not by praxis. Multi-cluster requires one inventory host plus one scoped kubeconfig and context per cluster; there is no ambient context switching.
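The multi-cluster consequence amounts to a lookup keyed by inventory host, with the host_type gate applied first. The sketch below is illustrative only: the InventoryHost fields, PreconditionFailed and resolve_cluster are hypothetical, HostType.KUBERNETES is the member this ADR proposes, and LINUX stands in for whatever other members the real enum has.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class HostType(Enum):
    # TALOS exists today and KUBERNETES is proposed by this ADR; LINUX is a
    # placeholder for the other members of the real enum.
    LINUX = "linux"
    TALOS = "talos"
    KUBERNETES = "kubernetes"


@dataclass(frozen=True)
class InventoryHost:
    """One inventory vertex; a cluster carries its own kubeconfig and context."""
    name: str
    host_type: HostType
    kubeconfig_path: Optional[str] = None
    context: Optional[str] = None


class PreconditionFailed(Exception):
    """Raised when a hard precondition is not met."""


def resolve_cluster(inventory: Dict[str, InventoryHost], host_name: str) -> InventoryHost:
    """Return the cluster record for host_name, failing closed otherwise."""
    host = inventory.get(host_name)
    if host is None:
        raise PreconditionFailed(f"{host_name!r} is not in the inventory")
    if host.host_type is not HostType.KUBERNETES:
        raise PreconditionFailed(f"{host_name!r} is not a KUBERNETES host")
    if not host.kubeconfig_path or not host.context:
        raise PreconditionFailed(f"{host_name!r} lacks a kubeconfig path or context")
    return host
```

Two clusters therefore mean two inventory entries, each with its own scoped kubeconfig and context; no call can switch context without naming a different host.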
Neutral: the eventual zero-standing-privilege answer is the roadmap Phase 4
SecretSource (JIT dynamic kubeconfig minting behind the broker,
docs/roadmap/iam-access-and-secrets-expansion.md); until that lands, a scoped static
kubeconfig is the contract. A short-TTL OIDC/exec credential plugin could be
allowlisted deliberately in future if the ambient-environment and arbitrary-subprocess
hazards are separately mitigated.
Alternatives considered and rejected
- Widen scrubbed_env() to pass cloud credentials (AWS_*, GOOGLE_APPLICATION_CREDENTIALS, kubelogin variables) so exec-plugin kubeconfigs work. Rejected: it broadens the secret surface reaching every wrapped tool (against BL-080) and admits an arbitrary per-call subprocess onto the audited path.
- Accept an admin kubeconfig and rely on tiering alone. Rejected: standing cluster-admin is the NOPASSWD: ALL equivalent invariant 9 forbids; tiers gate the verb, not the credential's blast radius.
- Trust the kubeconfig current-context. Rejected: it is ambient and mutable, so a call could silently hit the wrong cluster; the context must come from trusted inventory.
- Run kubectl/helm only through the existing free-form SSH adapter with no dedicated adapter. Rejected as the default for the scoped case (it loses native dry-run, the structured-argument hardening, and the KUBERNETES host_type gate), but retained explicitly as the sanctioned path for the cloud/exec-auth case (the bastion skill).
- Build an ArgoCD actuator as well. Rejected: ArgoCD is itself a GitOps reconciler, overlapping the human-gated drift engine (ADR-0007); it fits a future read-only desired-state drift source, not the actuation surface.
Revisit triggers
- The roadmap Phase 4 SecretSource lands, making short-TTL minted kubeconfigs the preferred path over a static scoped one.
- A vetted, bounded way to run an OIDC/exec credential plugin under an allowlisted environment, without widening the general scrubbed_env(), is designed.
- A requirement to actuate cloud-managed clusters directly (not via a bastion) is raised.
- ArgoCD or Flux desired-state ingestion is wanted as a drift source.