diff --git a/docs/concepts/session-pruning.md b/docs/concepts/session-pruning.md
index 0fcb2b78d0a..ba9f39f37f1 100644
--- a/docs/concepts/session-pruning.md
+++ b/docs/concepts/session-pruning.md
@@ -15,13 +15,13 @@ Session pruning trims **old tool results** from the in-memory context right befo
 - When `mode: "cache-ttl"` is enabled and the last Anthropic call for the session is older than `ttl`.
 - Only affects the messages sent to the model for that request.
 - Only active for Anthropic API calls (and OpenRouter Anthropic models).
-- For best results, match `ttl` to your model `cacheControlTtl`.
+- For best results, match `ttl` to your model `cacheRetention` policy (`short` = 5m, `long` = 1h).
 - After a prune, the TTL window resets so subsequent requests keep cache until `ttl` expires again.
 
 ## Smart defaults (Anthropic)
 
 - **OAuth or setup-token** profiles: enable `cache-ttl` pruning and set heartbeat to `1h`.
-- **API key** profiles: enable `cache-ttl` pruning, set heartbeat to `30m`, and default `cacheControlTtl` to `1h` on Anthropic models.
+- **API key** profiles: enable `cache-ttl` pruning, set heartbeat to `30m`, and default `cacheRetention: "short"` on Anthropic models.
 - If you set any of these values explicitly, OpenClaw does **not** override them.
 
 ## What this improves (cost + cache behavior)
@@ -91,9 +91,7 @@ Default (off):
 
 ```json5
 {
-  agent: {
-    contextPruning: { mode: "off" },
-  },
+  agents: { defaults: { contextPruning: { mode: "off" } } },
 }
 ```
 
@@ -101,9 +99,7 @@ Enable TTL-aware pruning:
 
 ```json5
 {
-  agent: {
-    contextPruning: { mode: "cache-ttl", ttl: "5m" },
-  },
+  agents: { defaults: { contextPruning: { mode: "cache-ttl", ttl: "5m" } } },
 }
 ```
 
@@ -111,10 +107,12 @@ Restrict pruning to specific tools:
 
 ```json5
 {
-  agent: {
-    contextPruning: {
-      mode: "cache-ttl",
-      tools: { allow: ["exec", "read"], deny: ["*image*"] },
+  agents: {
+    defaults: {
+      contextPruning: {
+        mode: "cache-ttl",
+        tools: { allow: ["exec", "read"], deny: ["*image*"] },
+      },
     },
   },
 }
diff --git a/docs/gateway/configuration-reference.md b/docs/gateway/configuration-reference.md
index 209427ca277..8cafb13839c 100644
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -725,7 +725,8 @@ Time format in system prompt. Default: `auto` (OS preference).
 - Used by the `image` tool path as its vision-model config.
 - Also used as fallback routing when the selected/default model cannot accept image input.
 - `model.primary`: format `provider/model` (e.g. `anthropic/claude-opus-4-6`). If you omit the provider, OpenClaw assumes `anthropic` (deprecated).
-- `models`: the configured model catalog and allowlist for `/model`. Each entry can include `alias` (shortcut) and `params` (provider-specific: `temperature`, `maxTokens`).
+- `models`: the configured model catalog and allowlist for `/model`. Each entry can include `alias` (shortcut) and `params` (provider-specific, for example `temperature`, `maxTokens`, `cacheRetention`, `context1m`).
+- `params` merge precedence (config): `agents.defaults.models["provider/model"].params` is the base, then `agents.list[].params` (matching agent id) overrides by key.
 - Config writers that mutate these fields (for example `/models set`, `/models set-image`, and fallback add/remove commands) save canonical object form and preserve existing fallback lists when possible.
 - `maxConcurrent`: max parallel agent runs across sessions (each session still serialized). Default: 1.
 
@@ -1050,6 +1051,7 @@ scripts/sandbox-browser-setup.sh # optional browser image
         workspace: "~/.openclaw/workspace",
         agentDir: "~/.openclaw/agents/main/agent",
         model: "anthropic/claude-opus-4-6", // or { primary, fallbacks }
+        params: { cacheRetention: "none" }, // overrides matching defaults.models params by key
         identity: {
           name: "Samantha",
           theme: "helpful sloth",
@@ -1074,6 +1076,7 @@ scripts/sandbox-browser-setup.sh # optional browser image
 - `id`: stable agent id (required).
 - `default`: when multiple are set, first wins (warning logged). If none set, first list entry is default.
 - `model`: string form overrides `primary` only; object form `{ primary, fallbacks }` overrides both (`[]` disables global fallbacks). Cron jobs that only override `primary` still inherit default fallbacks unless you set `fallbacks: []`.
+- `params`: per-agent stream params merged over the selected model entry in `agents.defaults.models`. Use this for agent-specific overrides like `cacheRetention`, `temperature`, or `maxTokens` without duplicating the whole model catalog.
 - `identity.avatar`: workspace-relative path, `http(s)` URL, or `data:` URI.
 - `identity` derives defaults: `ackReaction` from `emoji`, `mentionPatterns` from `name`/`emoji`.
 - `subagents.allowAgents`: allowlist of agent ids for `sessions_spawn` (`["*"]` = any; default: same agent only).
diff --git a/docs/providers/anthropic.md b/docs/providers/anthropic.md
index b12780ff022..40f86630dba 100644
--- a/docs/providers/anthropic.md
+++ b/docs/providers/anthropic.md
@@ -67,6 +67,42 @@ Use the `cacheRetention` parameter in your model config:
 
 When using Anthropic API Key authentication, OpenClaw automatically applies `cacheRetention: "short"` (5-minute cache) for all Anthropic models. You can override this by explicitly setting `cacheRetention` in your config.
 
+### Per-agent cacheRetention overrides
+
+Use model-level params as your baseline, then override specific agents via `agents.list[].params`.
+
+```json5
+{
+  agents: {
+    defaults: {
+      model: { primary: "anthropic/claude-opus-4-6" },
+      models: {
+        "anthropic/claude-opus-4-6": {
+          params: { cacheRetention: "long" }, // baseline for most agents
+        },
+      },
+    },
+    list: [
+      { id: "research", default: true },
+      { id: "alerts", params: { cacheRetention: "none" } }, // override for this agent only
+    ],
+  },
+}
+```
+
+Config merge order for cache-related params:
+
+1. `agents.defaults.models["provider/model"].params`
+2. `agents.list[].params` (matching `id`, overrides by key)
+
+This lets one agent keep a long-lived cache while another agent on the same model disables caching to avoid write costs on bursty/low-reuse traffic.
+
+### Bedrock Claude notes
+
+- Anthropic Claude models on Bedrock (`amazon-bedrock/*anthropic.claude*`) accept `cacheRetention` pass-through when configured.
+- Non-Anthropic Bedrock models are forced to `cacheRetention: "none"` at runtime.
+- Anthropic API-key smart defaults also seed `cacheRetention: "short"` for Claude-on-Bedrock model refs when no explicit value is set.
+
 ### Legacy parameter
 
 The older `cacheControlTtl` parameter is still supported for backwards compatibility:
diff --git a/docs/reference/token-use.md b/docs/reference/token-use.md
index 8e05d6ba638..5672eb1929f 100644
--- a/docs/reference/token-use.md
+++ b/docs/reference/token-use.md
@@ -88,6 +88,9 @@ Heartbeat can keep the cache **warm** across idle gaps. If your model cache TTL
 is `1h`, setting the heartbeat interval just under that (e.g., `55m`) can avoid
 re-caching the full prompt, reducing cache write costs.
 
+In multi-agent setups, you can keep one shared model config and tune cache behavior
+per agent with `agents.list[].params.cacheRetention`.
+
 For Anthropic API pricing, cache reads are significantly cheaper than input
 tokens, while cache writes are billed at a higher multiplier. See Anthropic’s
 prompt caching pricing for the latest rates and TTL multipliers:
@@ -108,6 +111,30 @@ agents:
       every: "55m"
 ```
 
+### Example: mixed traffic with per-agent cache strategy
+
+```yaml
+agents:
+  defaults:
+    model:
+      primary: "anthropic/claude-opus-4-6"
+    models:
+      "anthropic/claude-opus-4-6":
+        params:
+          cacheRetention: "long" # default baseline for most agents
+  list:
+    - id: "research"
+      default: true
+      heartbeat:
+        every: "55m" # keep long cache warm for deep sessions
+    - id: "alerts"
+      params:
+        cacheRetention: "none" # avoid cache writes for bursty notifications
+```
+
+`agents.list[].params` merges on top of the selected model's `params`, so you can
+override only `cacheRetention` and inherit other model defaults unchanged.
+
 ### Example: enable Anthropic 1M context beta header
 
 Anthropic's 1M context window is currently beta-gated. OpenClaw can inject the
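
Reviewer note: the merge precedence this patch documents (`agents.defaults.models["provider/model"].params` as the base, then `agents.list[].params` for the matching agent id, overriding by key) can be sketched as a shallow object merge. This is a minimal illustration under stated assumptions, not OpenClaw's actual implementation: the type names, the `resolveParams` helper, and the `maxTokens` value are hypothetical, and the real resolver may handle nested values differently.

```typescript
// Hypothetical shapes mirroring the config keys named in the docs above.
type Params = Record<string, unknown>;

interface AgentEntry {
  id: string;
  params?: Params;
}

interface AgentsConfig {
  defaults?: { models?: Record<string, { params?: Params }> };
  list?: AgentEntry[];
}

// Hypothetical helper: model-level params form the base, then the matching
// agent's params override them key by key (shallow merge, assumed).
function resolveParams(cfg: AgentsConfig, modelRef: string, agentId: string): Params {
  const base = cfg.defaults?.models?.[modelRef]?.params ?? {};
  const agent = cfg.list?.find((entry) => entry.id === agentId)?.params ?? {};
  return { ...base, ...agent };
}

const cfg: AgentsConfig = {
  defaults: {
    models: {
      "anthropic/claude-opus-4-6": {
        // maxTokens is an illustrative value, not taken from the patch.
        params: { cacheRetention: "long", maxTokens: 4096 },
      },
    },
  },
  list: [
    { id: "research" },
    { id: "alerts", params: { cacheRetention: "none" } },
  ],
};

const researchParams = resolveParams(cfg, "anthropic/claude-opus-4-6", "research");
const alertsParams = resolveParams(cfg, "anthropic/claude-opus-4-6", "alerts");
```

Under these assumptions, `research` keeps the model baseline, while `alerts` overrides only `cacheRetention` and still inherits the other model-level keys.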