docs: detail per-agent prompt caching configuration

This commit is contained in:
Peter Steinberger
2026-02-23 18:45:30 +00:00
parent d637fd4801
commit 78e7f41d28
4 changed files with 77 additions and 13 deletions

View File

@@ -15,13 +15,13 @@ Session pruning trims **old tool results** from the in-memory context right befo
- When `mode: "cache-ttl"` is enabled and the last Anthropic call for the session is older than `ttl`.
- Only affects the messages sent to the model for that request.
- Only active for Anthropic API calls (and OpenRouter Anthropic models).
- For best results, match `ttl` to your model `cacheControlTtl`.
- For best results, match `ttl` to your model `cacheRetention` policy (`short` = 5m, `long` = 1h).
- After a prune, the TTL window resets so subsequent requests keep cache until `ttl` expires again.
## Smart defaults (Anthropic)
- **OAuth or setup-token** profiles: enable `cache-ttl` pruning and set heartbeat to `1h`.
- **API key** profiles: enable `cache-ttl` pruning, set heartbeat to `30m`, and default `cacheControlTtl` to `1h` on Anthropic models.
- **API key** profiles: enable `cache-ttl` pruning, set heartbeat to `30m`, and default `cacheRetention: "short"` on Anthropic models.
- If you set any of these values explicitly, OpenClaw does **not** override them.
## What this improves (cost + cache behavior)
@@ -91,9 +91,7 @@ Default (off):
```json5
{
agent: {
contextPruning: { mode: "off" },
},
agents: { defaults: { contextPruning: { mode: "off" } } },
}
```
@@ -101,9 +99,7 @@ Enable TTL-aware pruning:
```json5
{
agent: {
contextPruning: { mode: "cache-ttl", ttl: "5m" },
},
agents: { defaults: { contextPruning: { mode: "cache-ttl", ttl: "5m" } } },
}
```
@@ -111,10 +107,12 @@ Restrict pruning to specific tools:
```json5
{
agent: {
contextPruning: {
mode: "cache-ttl",
tools: { allow: ["exec", "read"], deny: ["*image*"] },
agents: {
defaults: {
contextPruning: {
mode: "cache-ttl",
tools: { allow: ["exec", "read"], deny: ["*image*"] },
},
},
},
}

View File

@@ -725,7 +725,8 @@ Time format in system prompt. Default: `auto` (OS preference).
- Used by the `image` tool path as its vision-model config.
- Also used as fallback routing when the selected/default model cannot accept image input.
- `model.primary`: format `provider/model` (e.g. `anthropic/claude-opus-4-6`). If you omit the provider, OpenClaw assumes `anthropic` (deprecated).
- `models`: the configured model catalog and allowlist for `/model`. Each entry can include `alias` (shortcut) and `params` (provider-specific: `temperature`, `maxTokens`).
- `models`: the configured model catalog and allowlist for `/model`. Each entry can include `alias` (shortcut) and `params` (provider-specific, for example `temperature`, `maxTokens`, `cacheRetention`, `context1m`).
- `params` merge precedence (config): `agents.defaults.models["provider/model"].params` is the base, then `agents.list[].params` (matching agent id) overrides by key.
- Config writers that mutate these fields (for example `/models set`, `/models set-image`, and fallback add/remove commands) save canonical object form and preserve existing fallback lists when possible.
- `maxConcurrent`: max parallel agent runs across sessions (each session still serialized). Default: 1.
@@ -1050,6 +1051,7 @@ scripts/sandbox-browser-setup.sh # optional browser image
workspace: "~/.openclaw/workspace",
agentDir: "~/.openclaw/agents/main/agent",
model: "anthropic/claude-opus-4-6", // or { primary, fallbacks }
params: { cacheRetention: "none" }, // overrides matching defaults.models params by key
identity: {
name: "Samantha",
theme: "helpful sloth",
@@ -1074,6 +1076,7 @@ scripts/sandbox-browser-setup.sh # optional browser image
- `id`: stable agent id (required).
- `default`: when multiple are set, first wins (warning logged). If none set, first list entry is default.
- `model`: string form overrides `primary` only; object form `{ primary, fallbacks }` overrides both (`[]` disables global fallbacks). Cron jobs that only override `primary` still inherit default fallbacks unless you set `fallbacks: []`.
- `params`: per-agent stream params merged over the selected model entry in `agents.defaults.models`. Use this for agent-specific overrides like `cacheRetention`, `temperature`, or `maxTokens` without duplicating the whole model catalog.
- `identity.avatar`: workspace-relative path, `http(s)` URL, or `data:` URI.
- `identity` derives defaults: `ackReaction` from `emoji`, `mentionPatterns` from `name`/`emoji`.
- `subagents.allowAgents`: allowlist of agent ids for `sessions_spawn` (`["*"]` = any; default: same agent only).

View File

@@ -67,6 +67,42 @@ Use the `cacheRetention` parameter in your model config:
When using Anthropic API Key authentication, OpenClaw automatically applies `cacheRetention: "short"` (5-minute cache) for all Anthropic models. You can override this by explicitly setting `cacheRetention` in your config.
### Per-agent cacheRetention overrides
Use model-level params as your baseline, then override specific agents via `agents.list[].params`.
```json5
{
agents: {
defaults: {
model: { primary: "anthropic/claude-opus-4-6" },
models: {
"anthropic/claude-opus-4-6": {
params: { cacheRetention: "long" }, // baseline for most agents
},
},
},
list: [
{ id: "research", default: true },
{ id: "alerts", params: { cacheRetention: "none" } }, // override for this agent only
],
},
}
```
Config merge order for cache-related params:
1. `agents.defaults.models["provider/model"].params`
2. `agents.list[].params` (matching `id`, overrides by key)
This lets one agent keep a long-lived cache while another agent on the same model disables caching to avoid write costs on bursty/low-reuse traffic.
### Bedrock Claude notes
- Anthropic Claude models on Bedrock (`amazon-bedrock/*anthropic.claude*`) accept `cacheRetention` pass-through when configured.
- Non-Anthropic Bedrock models are forced to `cacheRetention: "none"` at runtime.
- Anthropic API-key smart defaults also seed `cacheRetention: "short"` for Claude-on-Bedrock model refs when no explicit value is set.
### Legacy parameter
The older `cacheControlTtl` parameter is still supported for backwards compatibility:

View File

@@ -88,6 +88,9 @@ Heartbeat can keep the cache **warm** across idle gaps. If your model cache TTL
is `1h`, setting the heartbeat interval just under that (e.g., `55m`) can avoid
re-caching the full prompt, reducing cache write costs.
In multi-agent setups, you can keep one shared model config and tune cache behavior
per agent with `agents.list[].params.cacheRetention`.
For Anthropic API pricing, cache reads are significantly cheaper than input
tokens, while cache writes are billed at a higher multiplier. See Anthropics
prompt caching pricing for the latest rates and TTL multipliers:
@@ -108,6 +111,30 @@ agents:
every: "55m"
```
### Example: mixed traffic with per-agent cache strategy
```yaml
agents:
defaults:
model:
primary: "anthropic/claude-opus-4-6"
models:
"anthropic/claude-opus-4-6":
params:
cacheRetention: "long" # default baseline for most agents
list:
- id: "research"
default: true
heartbeat:
every: "55m" # keep long cache warm for deep sessions
- id: "alerts"
params:
cacheRetention: "none" # avoid cache writes for bursty notifications
```
`agents.list[].params` merges on top of the selected model's `params`, so you can
override only `cacheRetention` and inherit other model defaults unchanged.
### Example: enable Anthropic 1M context beta header
Anthropic's 1M context window is currently beta-gated. OpenClaw can inject the