test: add live cache provider probes

This commit is contained in:
Peter Steinberger
2026-04-04 12:46:00 +09:00
parent efefa5560d
commit ca99ad0af8
5 changed files with 544 additions and 4 deletions

View File

@@ -26,7 +26,7 @@ openclaw models scan
`openclaw models status` shows the resolved default/fallbacks plus an auth overview.
When provider usage snapshots are available, the OAuth/token status section includes
provider usage windows and quota snapshots.
Add `--probe` to run live auth probes against each configured provider profile.
Probes are real requests (may consume tokens and trigger rate limits).
Use `--agent <id>` to inspect a configured agent's model/auth state. When omitted,

View File

@@ -9,14 +9,18 @@ read_when:
# Prompt caching
Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. OpenClaw normalizes provider usage into `cacheRead` and `cacheWrite` where the upstream API exposes those counters directly.
Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. Without caching, repeated prompts pay the full prompt cost on every turn even when most input did not change.
This page covers all cache-related knobs that affect prompt reuse and token cost.
Provider references:
- Anthropic prompt caching: [https://platform.claude.com/docs/en/build-with-claude/prompt-caching](https://platform.claude.com/docs/en/build-with-claude/prompt-caching)
- OpenAI prompt caching: [https://developers.openai.com/api/docs/guides/prompt-caching](https://developers.openai.com/api/docs/guides/prompt-caching)
- OpenAI API headers and request IDs: [https://developers.openai.com/api/reference/overview](https://developers.openai.com/api/reference/overview)
- Anthropic request IDs and errors: [https://platform.claude.com/docs/en/api/errors](https://platform.claude.com/docs/en/api/errors)
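As a rough illustration of the cost effect described above, here is a sketch of the prompt-cost arithmetic for a reused stable prefix. The multipliers follow Anthropic's published pricing for the 5-minute ephemeral cache (writes at 1.25x the base input rate, reads at 0.1x) at time of writing; verify against the current pricing page before relying on them, and note the helper itself is illustrative, not part of OpenClaw.

```typescript
// Rough cost sketch: a stable prefix of `prefixTokens` reused over `turns`.
// Multipliers assume Anthropic's 5-minute cache pricing (write = 1.25x base
// input, read = 0.1x); check current provider docs before relying on them.
function promptCostUnits(turns: number, prefixTokens: number, cached: boolean): number {
  if (!cached) {
    return turns * prefixTokens; // every turn re-pays the full input price
  }
  // First turn writes the cache; later turns read it back at a discount.
  return prefixTokens * 1.25 + (turns - 1) * prefixTokens * 0.1;
}

// 10 turns with a 10_000-token prefix:
// uncached: 100_000 input-token units; cached: about 21_500 units.
```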
## Primary knobs
@@ -100,6 +104,16 @@ Per-agent heartbeat is supported at `agents.list[].heartbeat`.
- `cacheRetention` is supported.
- With Anthropic API-key auth profiles, OpenClaw seeds `cacheRetention: "short"` for Anthropic model refs when unset.
- Anthropic native Messages responses expose both `cache_read_input_tokens` and `cache_creation_input_tokens`, so OpenClaw can show both `cacheRead` and `cacheWrite`.
- For native Anthropic requests, `cacheRetention: "short"` maps to the default 5-minute ephemeral cache, and `cacheRetention: "long"` upgrades to the 1-hour TTL only on direct `api.anthropic.com` hosts.
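The counter mapping in the bullets above can be sketched as follows. The field names come from the Anthropic Messages API usage object; the helper name and shape are illustrative, not OpenClaw internals.

```typescript
// Map Anthropic Messages `usage` counters onto the normalized fields
// this page describes. Sketch only; field names match the Messages API.
type AnthropicUsage = {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
};

function normalizeAnthropicUsage(usage: AnthropicUsage) {
  return {
    input: usage.input_tokens,
    output: usage.output_tokens,
    cacheRead: usage.cache_read_input_tokens ?? 0, // tokens served from cache
    cacheWrite: usage.cache_creation_input_tokens ?? 0, // tokens written to cache
  };
}
```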
### OpenAI (direct API)
- Prompt caching is automatic on supported recent models. OpenClaw does not need to inject block-level cache markers.
- OpenClaw uses `prompt_cache_key` to keep cache routing stable across turns and uses `prompt_cache_retention: "24h"` only when `cacheRetention: "long"` is selected on direct OpenAI hosts.
- OpenAI responses expose cached prompt tokens via `usage.prompt_tokens_details.cached_tokens` (or `input_tokens_details.cached_tokens` on Responses API events). OpenClaw maps that to `cacheRead`.
- OpenAI does not expose a separate cache-write token counter, so `cacheWrite` stays `0` on OpenAI paths even when the provider is warming a cache.
- OpenAI returns useful tracing and rate-limit headers such as `x-request-id`, `openai-processing-ms`, and `x-ratelimit-*`, but cache-hit accounting should come from the usage payload, not from headers.
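Reading the cached-token counter across the two OpenAI response shapes mentioned above can be sketched like this. Chat Completions reports it under `prompt_tokens_details`, the Responses API under `input_tokens_details`; `cacheWrite` has no OpenAI counterpart, so it stays `0`. The helper name is illustrative, not OpenClaw internals.

```typescript
// Sketch: extract OpenAI's cached-token counter from either usage shape.
type OpenAiUsage = {
  prompt_tokens_details?: { cached_tokens?: number }; // Chat Completions
  input_tokens_details?: { cached_tokens?: number }; // Responses API
};

function openAiCacheRead(usage: OpenAiUsage): number {
  return (
    usage.prompt_tokens_details?.cached_tokens ??
    usage.input_tokens_details?.cached_tokens ??
    0
  );
}
```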
### Amazon Bedrock
@@ -180,10 +194,15 @@ Defaults:
- Cache trace events are JSONL and include staged snapshots like `session:loaded`, `prompt:before`, `stream:context`, and `session:after`.
- Per-turn cache token impact is visible in normal usage surfaces via `cacheRead` and `cacheWrite` (for example `/usage full` and session usage summaries).
- For Anthropic, expect both `cacheRead` and `cacheWrite` when caching is active.
- For OpenAI, expect `cacheRead` on cache hits and `cacheWrite` to remain `0`; OpenAI does not publish a separate cache-write token field.
- If you need request tracing, log request IDs and rate-limit headers separately from cache metrics. OpenClaw's current cache-trace output is focused on prompt/session shape and normalized token usage rather than raw provider response headers.
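The per-turn cache-hit rate used by the live probes in this commit reduces to cache reads over total prompt tokens:

```typescript
// Cache-hit rate as computed by the live probes in this commit:
// cacheRead / (input + cacheRead + cacheWrite), clamped to 0 when
// there is no prompt volume or no cache read at all.
function cacheHitRate(usage: { input?: number; cacheRead?: number; cacheWrite?: number }): number {
  const input = usage.input ?? 0;
  const cacheRead = usage.cacheRead ?? 0;
  const cacheWrite = usage.cacheWrite ?? 0;
  const totalPrompt = input + cacheRead + cacheWrite;
  return totalPrompt > 0 && cacheRead > 0 ? cacheRead / totalPrompt : 0;
}

// e.g. { input: 200, cacheRead: 1800 } yields 0.9
```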
## Quick troubleshooting
- High `cacheWrite` on most turns: check for volatile system-prompt inputs and verify model/provider supports your cache settings.
- High `cacheWrite` on Anthropic: often means the cache breakpoint is landing on content that changes every request.
- Low OpenAI `cacheRead`: verify the stable prefix is at the front, the repeated prefix is at least 1024 tokens, and the same `prompt_cache_key` is reused for turns that should share a cache.
- No effect from `cacheRetention`: confirm model key matches `agents.defaults.models["provider/model"]`.
- Bedrock Nova/Mistral requests with cache settings: expected; the runtime forces cache settings to `none` for these models.

View File

@@ -0,0 +1,194 @@
import { completeSimple, type Api, type AssistantMessage, type Model } from "@mariozechner/pi-ai";
import { loadConfig } from "../config/config.js";
import { isTruthyEnvValue } from "../infra/env.js";
import { resolveOpenClawAgentDir } from "./agent-paths.js";
import { collectProviderApiKeys } from "./live-auth-keys.js";
import { isLiveTestEnabled } from "./live-test-helpers.js";
import { getApiKeyForModel, requireApiKey } from "./model-auth.js";
import { normalizeProviderId, parseModelRef } from "./model-selection.js";
import { ensureOpenClawModelsJson } from "./models-config.js";
import { discoverAuthStorage, discoverModels } from "./pi-model-discovery.js";
export const LIVE_CACHE_TEST_ENABLED =
isLiveTestEnabled() && isTruthyEnvValue(process.env.OPENCLAW_LIVE_CACHE_TEST);
const DEFAULT_HEARTBEAT_MS = 20_000;
const DEFAULT_TIMEOUT_MS = 90_000;
type LiveResolvedModel = {
apiKey: string;
model: Model<Api>;
};
function toInt(value: string | undefined, fallback: number): number {
const trimmed = value?.trim();
if (!trimmed) {
return fallback;
}
const parsed = Number.parseInt(trimmed, 10);
return Number.isFinite(parsed) ? parsed : fallback;
}
export function logLiveCache(message: string): void {
process.stderr.write(`[live-cache] ${message}\n`);
}
export async function withLiveCacheHeartbeat<T>(
operation: Promise<T>,
context: string,
): Promise<T> {
const heartbeatMs = Math.max(
1_000,
toInt(process.env.OPENCLAW_LIVE_HEARTBEAT_MS, DEFAULT_HEARTBEAT_MS),
);
const startedAt = Date.now();
let heartbeatCount = 0;
const timer = setInterval(() => {
heartbeatCount += 1;
logLiveCache(
`${context}: still running (${Math.max(1, Math.round((Date.now() - startedAt) / 1_000))}s)`,
);
}, heartbeatMs);
timer.unref?.();
try {
return await operation;
} finally {
clearInterval(timer);
if (heartbeatCount > 0) {
logLiveCache(
`${context}: completed (${Math.max(1, Math.round((Date.now() - startedAt) / 1_000))}s)`,
);
}
}
}
export async function completeSimpleWithLiveTimeout<TApi extends Api>(
model: Model<TApi>,
context: Parameters<typeof completeSimple<TApi>>[1],
options: Parameters<typeof completeSimple<TApi>>[2],
progressContext: string,
timeoutMs = Math.max(
1_000,
toInt(process.env.OPENCLAW_LIVE_MODEL_TIMEOUT_MS, DEFAULT_TIMEOUT_MS),
),
): Promise<AssistantMessage> {
const controller = new AbortController();
const abortTimer = setTimeout(() => controller.abort(), timeoutMs);
abortTimer.unref?.();
let hardTimer: ReturnType<typeof setTimeout> | undefined;
const timeout = new Promise<never>((_, reject) => {
hardTimer = setTimeout(() => {
reject(new Error(`${progressContext} timed out after ${timeoutMs}ms`));
}, timeoutMs);
hardTimer.unref?.();
});
try {
return await withLiveCacheHeartbeat(
Promise.race([
completeSimple(model, context, {
...options,
signal: controller.signal,
}),
timeout,
]),
progressContext,
);
} finally {
clearTimeout(abortTimer);
if (hardTimer) {
clearTimeout(hardTimer);
}
}
}
export function buildStableCachePrefix(tag: string, sections = 160): string {
const lines = [
`Stable cache prefix for ${tag}.`,
"Preserve this prefix byte-for-byte across retries.",
"Return only the requested marker from the final user message.",
];
for (let index = 0; index < sections; index += 1) {
lines.push(
`Section ${index + 1}: deterministic cache prose with repeated lexical material about routing, invariants, transcript stability, prefix locality, provider usage accounting, and session affinity.`,
);
}
return lines.join("\n");
}
export function extractAssistantText(message: AssistantMessage): string {
return message.content
.filter((block) => block.type === "text")
.map((block) => block.text.trim())
.filter(Boolean)
.join(" ");
}
export function computeCacheHitRate(usage: {
input?: number;
cacheRead?: number;
cacheWrite?: number;
}): number {
const input = usage.input ?? 0;
const cacheRead = usage.cacheRead ?? 0;
const cacheWrite = usage.cacheWrite ?? 0;
const totalPrompt = input + cacheRead + cacheWrite;
if (totalPrompt <= 0 || cacheRead <= 0) {
return 0;
}
return cacheRead / totalPrompt;
}
export async function resolveLiveDirectModel(params: {
provider: "anthropic" | "openai";
api: "anthropic-messages" | "openai-responses";
envVar: string;
preferredModelIds: readonly string[];
}): Promise<LiveResolvedModel> {
const cfg = loadConfig();
await ensureOpenClawModelsJson(cfg);
const agentDir = resolveOpenClawAgentDir();
const authStorage = discoverAuthStorage(agentDir);
const models = discoverModels(authStorage, agentDir).getAll();
const rawModel = process.env[params.envVar]?.trim();
const parsed = rawModel ? parseModelRef(rawModel, params.provider) : null;
const candidates = models.filter(
(model) => normalizeProviderId(model.provider) === params.provider && model.api === params.api,
);
let resolvedModel: Model<Api> | undefined;
if (parsed) {
resolvedModel = candidates.find(
(model) =>
normalizeProviderId(model.provider) === parsed.provider && model.id === parsed.model,
);
}
if (!resolvedModel) {
resolvedModel = params.preferredModelIds
.map((id) => candidates.find((model) => model.id === id))
.find(Boolean);
}
if (!resolvedModel) {
throw new Error(
rawModel
? `Model not found for ${params.provider}: ${rawModel}`
: `No ${params.provider} ${params.api} model available in registry.`,
);
}
const liveKeys = collectProviderApiKeys(params.provider);
const apiKey =
liveKeys[0] ??
requireApiKey(
await getApiKeyForModel({
model: resolvedModel,
cfg,
agentDir,
}),
resolvedModel.provider,
);
return {
model: resolvedModel,
apiKey,
};
}

View File

@@ -0,0 +1,234 @@
import type { AssistantMessage } from "@mariozechner/pi-ai";
import { beforeAll, describe, expect, it } from "vitest";
import {
buildStableCachePrefix,
completeSimpleWithLiveTimeout,
computeCacheHitRate,
extractAssistantText,
LIVE_CACHE_TEST_ENABLED,
logLiveCache,
resolveLiveDirectModel,
} from "./live-cache-test-support.js";
const describeCacheLive = LIVE_CACHE_TEST_ENABLED ? describe : describe.skip;
const OPENAI_TIMEOUT_MS = 120_000;
const ANTHROPIC_TIMEOUT_MS = 120_000;
const OPENAI_SESSION_ID = "live-cache-openai-stable-session";
const ANTHROPIC_SESSION_ID = "live-cache-anthropic-stable-session";
const OPENAI_PREFIX = buildStableCachePrefix("openai");
const ANTHROPIC_PREFIX = buildStableCachePrefix("anthropic");
type CacheRun = {
hitRate: number;
suffix: string;
text: string;
usage: AssistantMessage["usage"];
};
async function runOpenAiCacheProbe(params: {
apiKey: string;
model: Awaited<ReturnType<typeof resolveLiveDirectModel>>["model"];
sessionId: string;
suffix: string;
}): Promise<CacheRun> {
const response = await completeSimpleWithLiveTimeout(
params.model,
{
systemPrompt: OPENAI_PREFIX,
messages: [
{
role: "user",
content: `Reply with exactly CACHE-OK ${params.suffix}.`,
timestamp: Date.now(),
},
],
},
{
apiKey: params.apiKey,
cacheRetention: "short",
sessionId: params.sessionId,
maxTokens: 32,
temperature: 0,
reasoning: "none",
},
`openai cache probe ${params.suffix}`,
OPENAI_TIMEOUT_MS,
);
const text = extractAssistantText(response);
expect(text.toLowerCase()).toContain(params.suffix.toLowerCase());
return {
suffix: params.suffix,
text,
usage: response.usage,
hitRate: computeCacheHitRate(response.usage),
};
}
async function runAnthropicCacheProbe(params: {
apiKey: string;
model: Awaited<ReturnType<typeof resolveLiveDirectModel>>["model"];
sessionId: string;
suffix: string;
cacheRetention: "none" | "short" | "long";
}): Promise<CacheRun> {
const response = await completeSimpleWithLiveTimeout(
params.model,
{
systemPrompt: ANTHROPIC_PREFIX,
messages: [
{
role: "user",
content: `Reply with exactly CACHE-OK ${params.suffix}.`,
timestamp: Date.now(),
},
],
},
{
apiKey: params.apiKey,
cacheRetention: params.cacheRetention,
sessionId: params.sessionId,
maxTokens: 32,
temperature: 0,
},
`anthropic cache probe ${params.suffix} (${params.cacheRetention})`,
ANTHROPIC_TIMEOUT_MS,
);
const text = extractAssistantText(response);
expect(text.toLowerCase()).toContain(params.suffix.toLowerCase());
return {
suffix: params.suffix,
text,
usage: response.usage,
hitRate: computeCacheHitRate(response.usage),
};
}
describeCacheLive("pi embedded runner prompt caching (live)", () => {
describe("openai", () => {
let fixture: Awaited<ReturnType<typeof resolveLiveDirectModel>>;
beforeAll(async () => {
fixture = await resolveLiveDirectModel({
provider: "openai",
api: "openai-responses",
envVar: "OPENCLAW_LIVE_OPENAI_CACHE_MODEL",
preferredModelIds: ["gpt-5.4-mini", "gpt-5.4", "gpt-5.2"],
});
logLiveCache(`openai model=${fixture.model.provider}/${fixture.model.id}`);
}, 120_000);
it(
"hits a high cache-read rate on repeated stable prefixes",
async () => {
const warmup = await runOpenAiCacheProbe({
...fixture,
sessionId: OPENAI_SESSION_ID,
suffix: "warmup",
});
logLiveCache(
`openai warmup cacheRead=${warmup.usage.cacheRead} input=${warmup.usage.input} rate=${warmup.hitRate.toFixed(3)}`,
);
const hitRuns = [
await runOpenAiCacheProbe({
...fixture,
sessionId: OPENAI_SESSION_ID,
suffix: "hit-a",
}),
await runOpenAiCacheProbe({
...fixture,
sessionId: OPENAI_SESSION_ID,
suffix: "hit-b",
}),
];
const bestHit = hitRuns.reduce((best, candidate) =>
(candidate.usage.cacheRead ?? 0) > (best.usage.cacheRead ?? 0) ? candidate : best,
);
logLiveCache(
`openai best-hit suffix=${bestHit.suffix} cacheRead=${bestHit.usage.cacheRead} input=${bestHit.usage.input} rate=${bestHit.hitRate.toFixed(3)}`,
);
expect(bestHit.usage.cacheRead ?? 0).toBeGreaterThan(1_024);
expect(bestHit.hitRate).toBeGreaterThanOrEqual(0.7);
},
6 * 60_000,
);
});
describe("anthropic", () => {
let fixture: Awaited<ReturnType<typeof resolveLiveDirectModel>>;
beforeAll(async () => {
fixture = await resolveLiveDirectModel({
provider: "anthropic",
api: "anthropic-messages",
envVar: "OPENCLAW_LIVE_ANTHROPIC_CACHE_MODEL",
preferredModelIds: ["claude-sonnet-4-6", "claude-sonnet-4-5", "claude-haiku-3-5"],
});
logLiveCache(`anthropic model=${fixture.model.provider}/${fixture.model.id}`);
}, 120_000);
it(
"writes cache on warmup and reads it back on repeated stable prefixes",
async () => {
const warmup = await runAnthropicCacheProbe({
...fixture,
sessionId: ANTHROPIC_SESSION_ID,
suffix: "warmup",
cacheRetention: "short",
});
logLiveCache(
`anthropic warmup cacheWrite=${warmup.usage.cacheWrite} cacheRead=${warmup.usage.cacheRead} input=${warmup.usage.input} rate=${warmup.hitRate.toFixed(3)}`,
);
expect(warmup.usage.cacheWrite ?? 0).toBeGreaterThan(0);
const hitRuns = [
await runAnthropicCacheProbe({
...fixture,
sessionId: ANTHROPIC_SESSION_ID,
suffix: "hit-a",
cacheRetention: "short",
}),
await runAnthropicCacheProbe({
...fixture,
sessionId: ANTHROPIC_SESSION_ID,
suffix: "hit-b",
cacheRetention: "short",
}),
];
const bestHit = hitRuns.reduce((best, candidate) =>
(candidate.usage.cacheRead ?? 0) > (best.usage.cacheRead ?? 0) ? candidate : best,
);
logLiveCache(
`anthropic best-hit suffix=${bestHit.suffix} cacheWrite=${bestHit.usage.cacheWrite} cacheRead=${bestHit.usage.cacheRead} input=${bestHit.usage.input} rate=${bestHit.hitRate.toFixed(3)}`,
);
expect(bestHit.usage.cacheRead ?? 0).toBeGreaterThan(1_024);
expect(bestHit.hitRate).toBeGreaterThanOrEqual(0.7);
},
6 * 60_000,
);
it(
"does not report meaningful cache activity when retention is disabled",
async () => {
const disabled = await runAnthropicCacheProbe({
...fixture,
sessionId: `${ANTHROPIC_SESSION_ID}-disabled`,
suffix: "no-cache",
cacheRetention: "none",
});
logLiveCache(
`anthropic none cacheWrite=${disabled.usage.cacheWrite} cacheRead=${disabled.usage.cacheRead} input=${disabled.usage.input}`,
);
expect(disabled.usage.cacheRead ?? 0).toBeLessThanOrEqual(32);
expect(disabled.usage.cacheWrite ?? 0).toBeLessThanOrEqual(32);
},
3 * 60_000,
);
});
});

View File

@@ -0,0 +1,93 @@
import { beforeAll, describe, expect, it } from "vitest";
import {
LIVE_CACHE_TEST_ENABLED,
logLiveCache,
resolveLiveDirectModel,
withLiveCacheHeartbeat,
} from "./live-cache-test-support.js";
const describeLive = LIVE_CACHE_TEST_ENABLED ? describe : describe.skip;
describeLive("provider response headers (live)", () => {
describe("openai", () => {
let fixture: Awaited<ReturnType<typeof resolveLiveDirectModel>>;
beforeAll(async () => {
fixture = await resolveLiveDirectModel({
provider: "openai",
api: "openai-responses",
envVar: "OPENCLAW_LIVE_OPENAI_CACHE_MODEL",
preferredModelIds: ["gpt-5.4-mini", "gpt-5.4", "gpt-5.2"],
});
}, 120_000);
it("returns request-id style headers from Responses", async () => {
const response = await withLiveCacheHeartbeat(
fetch("https://api.openai.com/v1/responses", {
method: "POST",
headers: {
"content-type": "application/json",
authorization: `Bearer ${fixture.apiKey}`,
},
body: JSON.stringify({
model: fixture.model.id,
input: "Reply with OK.",
max_output_tokens: 32,
}),
}),
"openai headers probe",
);
const bodyText = await response.text();
expect(response.ok, bodyText).toBe(true);
const requestId = response.headers.get("x-request-id");
const processingMs = response.headers.get("openai-processing-ms");
const rateLimitHeaders = [...response.headers.entries()]
.filter(([key]) => key.startsWith("x-ratelimit-"))
.map(([key, value]) => `${key}=${value}`);
logLiveCache(
`openai headers x-request-id=${requestId ?? "(missing)"} openai-processing-ms=${processingMs ?? "(missing)"} ${rateLimitHeaders.join(" ")}`.trim(),
);
expect(requestId).toBeTruthy();
}, 120_000);
});
describe("anthropic", () => {
let fixture: Awaited<ReturnType<typeof resolveLiveDirectModel>>;
beforeAll(async () => {
fixture = await resolveLiveDirectModel({
provider: "anthropic",
api: "anthropic-messages",
envVar: "OPENCLAW_LIVE_ANTHROPIC_CACHE_MODEL",
preferredModelIds: ["claude-sonnet-4-6", "claude-sonnet-4-5", "claude-haiku-3-5"],
});
}, 120_000);
it("returns request-id from Messages", async () => {
const response = await withLiveCacheHeartbeat(
fetch("https://api.anthropic.com/v1/messages", {
method: "POST",
headers: {
"content-type": "application/json",
"x-api-key": fixture.apiKey,
"anthropic-version": "2023-06-01",
},
body: JSON.stringify({
model: fixture.model.id,
max_tokens: 32,
messages: [{ role: "user", content: "Reply with OK." }],
}),
}),
"anthropic headers probe",
);
const bodyText = await response.text();
expect(response.ok, bodyText).toBe(true);
const requestId = response.headers.get("request-id");
logLiveCache(`anthropic headers request-id=${requestId ?? "(missing)"}`);
expect(requestId).toBeTruthy();
}, 120_000);
});
});