diff --git a/docs/concepts/session-pruning.md b/docs/concepts/session-pruning.md
index 0fcb2b78d0a..ba9f39f37f1 100644
--- a/docs/concepts/session-pruning.md
+++ b/docs/concepts/session-pruning.md
@@ -15,13 +15,13 @@ Session pruning trims **old tool results** from the in-memory context right befo
 - When `mode: "cache-ttl"` is enabled and the last Anthropic call for the session is older than `ttl`.
 - Only affects the messages sent to the model for that request.
 - Only active for Anthropic API calls (and OpenRouter Anthropic models).
-- For best results, match `ttl` to your model `cacheControlTtl`.
+- For best results, match `ttl` to your model `cacheRetention` policy (`short` = 5m, `long` = 1h).
 - After a prune, the TTL window resets so subsequent requests keep cache until `ttl` expires again.
 
 ## Smart defaults (Anthropic)
 
 - **OAuth or setup-token** profiles: enable `cache-ttl` pruning and set heartbeat to `1h`.
-- **API key** profiles: enable `cache-ttl` pruning, set heartbeat to `30m`, and default `cacheControlTtl` to `1h` on Anthropic models.
+- **API key** profiles: enable `cache-ttl` pruning, set heartbeat to `30m`, and default `cacheRetention: "short"` on Anthropic models.
 - If you set any of these values explicitly, OpenClaw does **not** override them.
 
 ## What this improves (cost + cache behavior)
@@ -91,9 +91,7 @@ Default (off):
 
 ```json5
 {
-  agent: {
-    contextPruning: { mode: "off" },
-  },
+  agents: { defaults: { contextPruning: { mode: "off" } } },
 }
 ```
 
@@ -101,9 +99,7 @@ Enable TTL-aware pruning:
 
 ```json5
 {
-  agent: {
-    contextPruning: { mode: "cache-ttl", ttl: "5m" },
-  },
+  agents: { defaults: { contextPruning: { mode: "cache-ttl", ttl: "5m" } } },
 }
 ```
 
@@ -111,10 +107,12 @@ Restrict pruning to specific tools:
 
 ```json5
 {
-  agent: {
-    contextPruning: {
-      mode: "cache-ttl",
-      tools: { allow: ["exec", "read"], deny: ["*image*"] },
+  agents: {
+    defaults: {
+      contextPruning: {
+        mode: "cache-ttl",
+        tools: { allow: ["exec", "read"], deny: ["*image*"] },
+      },
     },
   },
 }
diff --git a/docs/gateway/configuration-reference.md b/docs/gateway/configuration-reference.md
index 209427ca277..8cafb13839c 100644
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -725,7 +725,8 @@ Time format in system prompt. Default: `auto` (OS preference).
   - Used by the `image` tool path as its vision-model config.
   - Also used as fallback routing when the selected/default model cannot accept image input.
 - `model.primary`: format `provider/model` (e.g. `anthropic/claude-opus-4-6`). If you omit the provider, OpenClaw assumes `anthropic` (deprecated).
-- `models`: the configured model catalog and allowlist for `/model`. Each entry can include `alias` (shortcut) and `params` (provider-specific: `temperature`, `maxTokens`).
+- `models`: the configured model catalog and allowlist for `/model`. Each entry can include `alias` (shortcut) and `params` (provider-specific, for example `temperature`, `maxTokens`, `cacheRetention`, `context1m`).
+- `params` merge precedence (config): `agents.defaults.models["provider/model"].params` is the base, then `agents.list[].params` (matching agent id) overrides by key.
 - Config writers that mutate these fields (for example `/models set`, `/models set-image`, and fallback add/remove commands) save canonical object form and preserve existing fallback lists when possible.
 - `maxConcurrent`: max parallel agent runs across sessions (each session still serialized). Default: 1.
 
@@ -1050,6 +1051,7 @@ scripts/sandbox-browser-setup.sh   # optional browser image
         workspace: "~/.openclaw/workspace",
         agentDir: "~/.openclaw/agents/main/agent",
         model: "anthropic/claude-opus-4-6", // or { primary, fallbacks }
+        params: { cacheRetention: "none" }, // overrides matching defaults.models params by key
         identity: {
           name: "Samantha",
           theme: "helpful sloth",
@@ -1074,6 +1076,7 @@ scripts/sandbox-browser-setup.sh   # optional browser image
 - `id`: stable agent id (required).
 - `default`: when multiple are set, first wins (warning logged). If none set, first list entry is default.
 - `model`: string form overrides `primary` only; object form `{ primary, fallbacks }` overrides both (`[]` disables global fallbacks). Cron jobs that only override `primary` still inherit default fallbacks unless you set `fallbacks: []`.
+- `params`: per-agent stream params merged over the selected model entry in `agents.defaults.models`. Use this for agent-specific overrides like `cacheRetention`, `temperature`, or `maxTokens` without duplicating the whole model catalog.
 - `identity.avatar`: workspace-relative path, `http(s)` URL, or `data:` URI.
 - `identity` derives defaults: `ackReaction` from `emoji`, `mentionPatterns` from `name`/`emoji`.
 - `subagents.allowAgents`: allowlist of agent ids for `sessions_spawn` (`["*"]` = any; default: same agent only).
diff --git a/docs/providers/anthropic.md b/docs/providers/anthropic.md
index b12780ff022..40f86630dba 100644
--- a/docs/providers/anthropic.md
+++ b/docs/providers/anthropic.md
@@ -67,6 +67,42 @@ Use the `cacheRetention` parameter in your model config:
 
 When using Anthropic API Key authentication, OpenClaw automatically applies `cacheRetention: "short"` (5-minute cache) for all Anthropic models. You can override this by explicitly setting `cacheRetention` in your config.
 
+### Per-agent cacheRetention overrides
+
+Use model-level params as your baseline, then override specific agents via `agents.list[].params`.
+
+```json5
+{
+  agents: {
+    defaults: {
+      model: { primary: "anthropic/claude-opus-4-6" },
+      models: {
+        "anthropic/claude-opus-4-6": {
+          params: { cacheRetention: "long" }, // baseline for most agents
+        },
+      },
+    },
+    list: [
+      { id: "research", default: true },
+      { id: "alerts", params: { cacheRetention: "none" } }, // override for this agent only
+    ],
+  },
+}
+```
+
+Config merge order for cache-related params:
+
+1. `agents.defaults.models["provider/model"].params`
+2. `agents.list[].params` (matching `id`, overrides by key)
+
+This lets one agent keep a long-lived cache while another agent on the same model disables caching to avoid write costs on bursty/low-reuse traffic.
+
+### Bedrock Claude notes
+
+- Anthropic Claude models on Bedrock (`amazon-bedrock/*anthropic.claude*`) accept `cacheRetention` pass-through when configured.
+- Non-Anthropic Bedrock models are forced to `cacheRetention: "none"` at runtime.
+- Anthropic API-key smart defaults also seed `cacheRetention: "short"` for Claude-on-Bedrock model refs when no explicit value is set.
+
 ### Legacy parameter
 
 The older `cacheControlTtl` parameter is still supported for backwards compatibility:
diff --git a/docs/reference/token-use.md b/docs/reference/token-use.md
index 8e05d6ba638..5672eb1929f 100644
--- a/docs/reference/token-use.md
+++ b/docs/reference/token-use.md
@@ -88,6 +88,9 @@ Heartbeat can keep the cache **warm** across idle gaps. If your model cache TTL
 is `1h`, setting the heartbeat interval just under that (e.g., `55m`) can avoid
 re-caching the full prompt, reducing cache write costs.
 
+In multi-agent setups, you can keep one shared model config and tune cache behavior
+per agent with `agents.list[].params.cacheRetention`.
+
 For Anthropic API pricing, cache reads are significantly cheaper than input
 tokens, while cache writes are billed at a higher multiplier. See Anthropic’s
 prompt caching pricing for the latest rates and TTL multipliers:
@@ -108,6 +111,30 @@ agents:
       every: "55m"
 ```
 
+### Example: mixed traffic with per-agent cache strategy
+
+```yaml
+agents:
+  defaults:
+    model:
+      primary: "anthropic/claude-opus-4-6"
+    models:
+      "anthropic/claude-opus-4-6":
+        params:
+          cacheRetention: "long" # default baseline for most agents
+  list:
+    - id: "research"
+      default: true
+      heartbeat:
+        every: "55m" # keep long cache warm for deep sessions
+    - id: "alerts"
+      params:
+        cacheRetention: "none" # avoid cache writes for bursty notifications
+```
+
+`agents.list[].params` merges on top of the selected model's `params`, so you can
+override only `cacheRetention` and inherit other model defaults unchanged.
+
 ### Example: enable Anthropic 1M context beta header
 
 Anthropic's 1M context window is currently beta-gated. OpenClaw can inject the