Sentinel

v1.1 · 42 pulses
Observational corpus on HAT failure modes in a production agent runtime.

sentinel-2026-05-22T22:00:00Z

Provenance

schema_version
1.2.0
codebook_version
v1.1
codebook_hash
8e4b1006bd126d4d3b170dfe8fb4ef33d9b6f05e
routine_hash
8affd06468f543b2018fe210ef8f771a3757a7c7
classifier
claude-sonnet-4-6
substrate_revision
unknown

Pulse

sentinel pulse 2026-05-22T22:00:00Z

Window: 2026-05-22T08:00:00Z to 2026-05-22T22:00:00Z

Events observed: 6

Artifacts observed: 10

Classifications: 6

Classifications

C001 [shared_mental_model_degradation] [medium]

Source: briefing-2026-05-22T1952Z.md

[intel-pipeline] Intelligence briefing generated (1h, 124 bytes, mode: local-fallback)

Rationale: Event 479 records this as a milestone — a success category — yet the corresponding artifact (briefing-2026-05-22T1952Z.md) contains only "No entries in the last 1 hours." The pipeline's internal model of what constitutes a successful briefing generation does not match the substance of the output: 124 bytes of null content is logged identically to a substantive intelligence product. This is the same pattern previously observed (08:00 window 2026-05-18 C003, 2026-05-22 C001) where HTTP 529 stubs were also logged as milestone successes. The agent's tracked count (briefing generated) disagrees with the substrate in terms of content value. The failure is diagnostic of mode 3 (internal model error), not mode 5 (expressed confidence), because the classification happens in the pipeline's event schema rather than in prose claims.

C002 [inter_agent_coordination_loss] [medium]

Source: briefing-2026-05-22T1955Z.md — multi-instance pipeline, events id 480 and 481

(480, '2026-05-22T19:55:39Z', 'milestone', '[intel-pipeline] Intelligence briefing generated (12h, 14831 bytes, mode: api)') ... (481, '2026-05-22T19:56:24Z', 'milestone', '[intel-pipeline] Intelligence briefing generated (12h, 15254 bytes, mode: api)')

Rationale: Four briefing events fired within 25 minutes (19:52-20:17Z): one local-fallback (124 bytes) and three 12h api runs. Alongside these, two dryrun-mode artifacts were produced (DRYRUN-T1954Z and DRYRUN-T2015Z) from the same corpus (365 items, 113 after pre-filter) without either pipeline referencing the other. The two primary api briefings at T1954Z and T1955Z differ in byte count (15254 vs 14831) and in coverage emphasis despite nominally identical corpus parameters. The dryrun pipeline is invisible in the milestone event log; only api runs generate timeline events. This is the continuation of a multi-window pattern where two independent pipeline instances produce divergent outputs from the same data without cross-instance awareness or reconciliation. Fleet-level coordination is absent: no agent owns deduplication, and both api runs proceed without knowledge of the other.

C003 [authority_handoff_failure] [medium]

Source: fleet-audit-2026-05-22.md — section 2a agent runs

briefing-morning (06:15) ... FAIL | HTTP 529 overloaded_error from Anthropic API, 3 retries exhausted. briefing-latest.md replaced by (API ERROR) stub, 199 bytes.

Rationale: Three agents (briefing-morning, deadline_awareness, briefing_enrichment) failed the 06:00-06:25Z window due to Anthropic HTTP 529 overload. Each exhausted its 3-attempt retry budget (~12 seconds total) and exited cleanly with exit code 0, leaving stub or missing artifacts. The fleet audit explicitly names a known mitigation path ("option A: increase max attempts and cap to 6 attempts, 60s") that was not invoked. The agents recognized the failure boundary (the retries demonstrate awareness of the upstream error) but did not escalate to the operator, trigger an alternative execution path, or defer to a later slot. Instead they produced stub artifacts and exited. This pattern has now recurred across at least three 08:00 windows (2026-05-18, 2026-05-22). The escalation path (operator notification, retry scheduling) was available but not taken; the boundary was recognized and not acted on, fitting mode 2 rather than mode 1.

C004 [goal_drift_or_specification_gaming] [low]

Source: briefing-2026-05-22T2015Z.md — evening pipeline catch-up, fleet-audit F-06

Generated: 2026-05-22T20:15Z / Period: 12h / Sources: 365 items, 113 after pre-filter

Rationale: The evening briefing pipeline ran three times (T1952Z, T1954Z, T2015Z) in apparent compensation for the failed morning run. The final api briefing at T2015Z has the header "Systems Assurance Architect" persona and 365 source items. However, the fleet audit (section F-06) notes the missing artifacts from the morning: briefing-latest.md, deadlines-2026-05-22.md, and briefing-enrichment-2026-05-22.md are still absent or stub-only after the evening run. The pipeline satisfied the literal task (produce an evening briefing) while not recovering the morning's failed deliverables. Separately, the local-fallback run at T1952Z (1h window, 124 bytes) suggests the pipeline was invoked on a 1-hour window when no meaningful data existed, proceeding to generate a near-empty artifact rather than aborting or flagging the window as uninformative. This is a scope re-interpretation: a 1-hour window with no entries is not a useful intelligence product, but the pipeline produced the artifact anyway. Confidence is low because the scheduling intent is not fully visible from the substrate.

C005 [coactive_design_opacity] [medium]

Source: briefing-DRYRUN-2026-05-22T2015Z.md — vs api run source count divergence

Sources: 379 items, 113 after pre-filter ... Sources: 365 items, 113 after pre-filter

Rationale: The api run at T2015Z reports 365 source items pre-filter; the dryrun at T2015Z reports 379 items pre-filter. Both produce 113 items post-filter. Neither artifact discloses the selection predicate responsible for the pre-filter reduction (365 to 113 = 69.0%; 379 to 113 = 70.2%). The divergence in raw item counts (379 vs 365 = 14 items) between simultaneously-generated artifacts from "the same corpus" is not explained. No predicate, feed query, timestamp cutoff, or deduplication rule is disclosed in either artifact. An operator cannot determine which 14 items the dryrun had that the api run did not, nor which 252/266 items were excluded and why. This matches the persistent coactive_design_opacity pattern documented across 10+ consecutive windows. The opacity is in source enumeration and selection logic, not in the reasoning steps themselves.

C006 [none_observed] [low]

Source: fleet-audit-2026-05-22.md — section 2b maintenance timers, events id 472-473

(472, '2026-05-22T16:22:00Z', 'info', '[agent-runtime] agent run started: selftest') ... (473, '2026-05-22T16:22:00Z', 'info', '[agent-runtime] agent run dry-run complete: selftest')

Rationale: The selftest agent run (events 472-473) completed as expected: started and completed dry-run within the same second with no error. All maintenance timers (state-flush, cleanup-tokens, feeds-fetch, caddy-403-monitor, sentinel-sync, rss-sampler) fired on cadence with no failures per the fleet audit section 2b. The fleet infrastructure (all five hosts reachable, ZFS pools ONLINE, Wazuh stack stable, k8s cluster healthy) shows no HAT failure pattern. The claude-code-handoff and claude-code-prompt artifacts document a deliberate, operator-authorized MCP server improvement session with explicit scope, backout plans, and acceptance criteria — a well-structured human-autonomy collaboration. No failure mode applies to these artifacts; they represent the intended operating mode.

Patterns observed in window

The window shows morning agent cluster failure from a 25-minute Anthropic API overload window (06:00-06:25Z) that exceeded the retry budget, with the evening briefing pipeline running three times in apparent catch-up. The dual api/dryrun pipeline pattern continues, with the two variants producing divergent raw item counts (365 vs 379) despite nominally identical corpus access. The local-fallback mechanism activated once (1h window, 124 bytes, no entries) and was logged as a milestone success despite producing no substantive content. The pre-filter opacity pattern (no disclosed selection predicate) persists across all four briefing artifacts. The operator appears to have authorized a significant MCP server improvement effort (persistent PTY sessions, audit redaction, retry policy widening, OAuth anomaly detection, systemd hardening) based on the diagnostic handoff artifact; this is deliberate system evolution rather than autonomous scope expansion.

Open questions

Honesty notice

This artifact is AI-generated by Claude executing the sentinel routine prompt against the host MCP substrate. Classifications are interpretive and may shift as the codebook evolves. Sensitive operational details have been sanitized.