docs: clarify prompt caching intro

Peter Steinberger
2026-02-24 05:21:55 +00:00
parent cafa8226d7
commit 8ea936cdda

@@ -9,6 +9,10 @@ read_when:
# Prompt caching
Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. The first matching request writes cache tokens (`cacheWrite`), and later matching requests can read them back (`cacheRead`).
Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. Without caching, repeated prompts pay the full prompt cost on every turn, even when most of the input did not change.
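To make the cost effect concrete, here is a minimal, hypothetical sketch (not part of this repo) that estimates per-turn prompt cost from the `cacheWrite`/`cacheRead` counters described above. The 1.25x write and 0.1x read multipliers are illustrative Anthropic-style figures and may not match current pricing.

```typescript
// Hypothetical helper: estimate prompt cost from usage counters, assuming
// cache writes bill at ~1.25x the base input rate and cache reads at ~0.1x.
interface PromptUsage {
  input: number;      // uncached input tokens processed this turn
  cacheWrite: number; // tokens written to the prompt cache
  cacheRead: number;  // tokens served from the prompt cache
}

function promptCostUSD(u: PromptUsage, baseInputPerMTok: number): number {
  const perTok = baseInputPerMTok / 1_000_000;
  return (
    u.input * perTok +
    u.cacheWrite * perTok * 1.25 + // cache writes are billed at a premium
    u.cacheRead * perTok * 0.1     // cache reads are heavily discounted
  );
}

// Example: a 20k-token system prompt reused across turns.
// The first turn writes the cache; later turns mostly read from it.
const firstTurn = promptCostUSD({ input: 500, cacheWrite: 20_000, cacheRead: 0 }, 3);
const laterTurn = promptCostUSD({ input: 500, cacheWrite: 0, cacheRead: 20_000 }, 3);
console.log({ firstTurn, laterTurn }); // later turns cost a fraction of the first
```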
This page covers all cache-related knobs that affect prompt reuse and token cost.
For Anthropic pricing details, see: