ADR 0005: Anthropic client with a hard daily cap and prompt-cache discipline¶
- Status: Accepted
- Date: 2026-05-20
- Authors: rmednitzer
- Builds on: ADR 0001
Context¶
inference_cloud is the seam through which a nous deployment talks to
Anthropic. Three operational concerns drove the design:
- Cost. An unattended simulator can hit a billing surprise quickly.
- Latency. Repeated calls with the same system prompt should hit the prompt cache.
- Prompt injection. Sensor and operator inputs reach the model; trusted and untrusted content must not share a cache slot.
Decision¶
src/nous/anthropic_client.py:
- Counts every call against a file-locked daily counter at
$NOUS_HOME/.anthropic_daily_count. The default cap is 100 calls per UTC day and is set byNOUS_ANTHROPIC_DAILY_CAP. When the cap is exhausted,inference_cloudraisesCapExhaustedand the caller is expected to fall back toinference_local. - Marks the system prompt and every trusted-context block with
cache_control={"type": "ephemeral"}so repeated calls within the cache window pay the input-token discount. - Reserves the system slot for trusted content (controller instructions, self-model claims, structured engine outputs) and confines untrusted content (sensor text, intercepted radio payloads) to the user slot.
The default model is claude-haiku-4-5-20251001. Operators who need the
larger model set NOUS_ANTHROPIC_MODEL_ADVANCED=claude-sonnet-4-6.
Consequences¶
Easier: budget surprises become a configuration knob, not a billing ticket. Cache discipline is enforced by the client, not by every caller.
Harder: the file-locked counter introduces a small but real I/O cost per call. The slot discipline must be respected by every callsite.
Revisit triggers¶
- Anthropic releases a new model that supersedes the current defaults.
- The cache TTL changes (currently 5 minutes for ephemeral).
- A use case needs multi-tenant per-workspace caps; the file lock is single-host today.