docs: detail per-agent prompt caching configuration

2026-03-08 06:54:24 +00:00 · 2026-02-23 18:45:30 +00:00
parent d637fd4801
commit 78e7f41d28
4 changed files with 77 additions and 13 deletions
--- a/docs/reference/token-use.md
+++ b/docs/reference/token-use.md
@@ -88,6 +88,9 @@ Heartbeat can keep the cache **warm** across idle gaps. If your model cache TTL
 is `1h`, setting the heartbeat interval just under that (e.g., `55m`) can avoid
 re-caching the full prompt, reducing cache write costs.

+In multi-agent setups, you can keep one shared model config and tune cache behavior
+per agent with `agents.list[].params.cacheRetention`.
+
 For Anthropic API pricing, cache reads are significantly cheaper than input
 tokens, while cache writes are billed at a higher multiplier. See Anthropic’s
 prompt caching pricing for the latest rates and TTL multipliers:
@@ -108,6 +111,30 @@ agents:
      every: "55m"
 ```

+### Example: mixed traffic with per-agent cache strategy
+
+```yaml
+agents:
+  defaults:
+    model:
+      primary: "anthropic/claude-opus-4-6"
+    models:
+      "anthropic/claude-opus-4-6":
+        params:
+          cacheRetention: "long" # default baseline for most agents
+  list:
+    - id: "research"
+      default: true
+      heartbeat:
+        every: "55m" # keep long cache warm for deep sessions
+    - id: "alerts"
+      params:
+        cacheRetention: "none" # avoid cache writes for bursty notifications
+```
+
+`agents.list[].params` merges on top of the selected model's `params`, so you can
+override only `cacheRetention` and inherit other model defaults unchanged.
+
 ### Example: enable Anthropic 1M context beta header

 Anthropic's 1M context window is currently beta-gated. OpenClaw can inject the