docs: detail per-agent prompt caching configuration

This commit is contained in:
Peter Steinberger
2026-02-23 18:45:30 +00:00
parent d637fd4801
commit 78e7f41d28
4 changed files with 77 additions and 13 deletions

View File

@@ -88,6 +88,9 @@ Heartbeat can keep the cache **warm** across idle gaps. If your model cache TTL
is `1h`, setting the heartbeat interval just under that (e.g., `55m`) can avoid
re-caching the full prompt, reducing cache write costs.
In multi-agent setups, you can keep one shared model config and tune cache behavior
per agent with `agents.list[].params.cacheRetention`.
For Anthropic API pricing, cache reads are significantly cheaper than input
tokens, while cache writes are billed at a higher multiplier. See Anthropics
prompt caching pricing for the latest rates and TTL multipliers:
@@ -108,6 +111,30 @@ agents:
every: "55m"
```
### Example: mixed traffic with per-agent cache strategy
```yaml
agents:
defaults:
model:
primary: "anthropic/claude-opus-4-6"
models:
"anthropic/claude-opus-4-6":
params:
cacheRetention: "long" # default baseline for most agents
list:
- id: "research"
default: true
heartbeat:
every: "55m" # keep long cache warm for deep sessions
- id: "alerts"
params:
cacheRetention: "none" # avoid cache writes for bursty notifications
```
`agents.list[].params` merges on top of the selected model's `params`, so you can
override only `cacheRetention` and inherit other model defaults unchanged.
### Example: enable Anthropic 1M context beta header
Anthropic's 1M context window is currently beta-gated. OpenClaw can inject the