Align websocket continuity resolution with the HTTP Codex path, make auth-affinity principal keys use a stable string representation, and extract small helpers that remove duplicated continuity and affinity logic without changing the validated cache-hit behavior.
Prompt caching on Codex was not reliably reusable through the proxy because repeated chat-completions requests could reach the upstream without the same continuity envelope. In practice this showed up most clearly with OpenCode, where cache reads worked in the reference client but not through CLIProxyAPI, although the root cause is broader than OpenCode itself.
The proxy was breaking continuity in several ways: executor-layer Codex request preparation stripped prompt_cache_retention, chat-completions translation did not preserve that field, continuity headers used a different shape than the working client behavior, and OpenAI-style Codex requests could be sent without a stable prompt_cache_key. When that happened, session_id fell back to a fresh random value per request, so upstream Codex treated repeated requests as unrelated turns instead of as part of the same cacheable context.
This change fixes that by preserving caller-provided prompt_cache_retention on Codex execution paths, preserving prompt_cache_retention when translating OpenAI chat-completions requests to Codex, aligning Codex continuity headers to session_id, and introducing an explicit Codex continuity policy that derives a stable continuity key from the best available signal. The resolution order prefers an explicit prompt_cache_key, then execution session metadata, then an explicit idempotency key, then stable request-affinity metadata, then a stable client-principal hash, and finally a stable auth-ID hash when no better continuity signal exists.
The same continuity key is applied to both prompt_cache_key in the request body and session_id in the request headers so repeated requests reuse the same upstream cache/session identity. The auth manager also keeps auth selection sticky for repeated request sequences, preventing otherwise-equivalent Codex requests from drifting across different upstream auth contexts and accidentally breaking cache reuse.
To keep the implementation maintainable, the continuity resolution and diagnostics are centralized in a dedicated Codex continuity helper instead of being scattered across executor flow code. Regression coverage now verifies retention preservation, continuity-key precedence, stable auth-ID fallback, websocket parity, translator preservation, and auth-affinity behavior. Manual validation confirmed prompt cache reads now occur through CLIProxyAPI when using Codex via OpenCode, and the fix should also benefit other clients that rely on stable repeated Codex request continuity.
- Introduced a new method `buildRecord` in `usageReporter` to encapsulate record creation, improving code readability and maintainability.
- Added latency tracking to usage records, ensuring accurate reporting of request latencies.
- Updated tests to validate the inclusion of latency in usage records and ensure proper functionality of the new reporting structure.
feat(proxy): centralize proxy handling with `proxyutil` package and enhance test coverage
- Added `proxyutil` package to simplify proxy handling across the codebase.
- Refactored various components (`executor`, `cliproxy`, `auth`, etc.) to use `proxyutil` for consistent and reusable proxy logic.
- Introduced support for "direct" proxy mode to explicitly bypass all proxies.
- Updated tests to validate proxy behavior (e.g., `direct`, HTTP/HTTPS, and SOCKS5).
- Enhanced YAML configuration documentation for proxy options.
- Simplified connection logic by removing `connCreateSent` and related state handling.
- Updated `buildCodexWebsocketRequestBody` to always use `response.create`.
- Added unit tests to validate `response.create` behavior and beta header preservation.
- Dropped unsupported `response.append` and outdated `response.done` event types.
normalizeCacheControlTTL unconditionally re-serializes the entire request
body through json.Unmarshal/json.Marshal even when no TTL normalization
is needed. Go's json.Marshal randomizes map key order and HTML-escapes
<, >, & characters (to \u003c, \u003e, \u0026), producing different raw
bytes on every call.
Anthropic's prompt caching uses byte-prefix matching, so any byte-level
difference causes a cache miss. This means the ~119K system prompt and
tools are re-processed on every request when routed through CPA.
The fix adds a bool return to normalizeTTLForBlock to indicate whether
it actually modified anything, and skips the marshal step in
normalizeCacheControlTTL when no blocks were changed.
fix(gemini): add `deprecated` to unsupported schema keywords
Add `deprecated` to the list of unsupported schema metadata fields in Gemini and update tests to verify its removal.
- Set Accept-Encoding: identity for SSE streams; upstream must not compress
line-delimited SSE bodies that bufio.Scanner reads directly
- Re-enforce identity after ApplyCustomHeadersFromAttrs to prevent auth
attribute injection from re-enabling compression on the stream path
- Add peekableBody type wrapping bufio.Reader for non-consuming magic-byte
inspection of the first 4 bytes without affecting downstream readers
- Detect gzip (0x1f 0x8b) and zstd (0x28 0xb5 0x2f 0xfd) by magic bytes
when Content-Encoding header is absent, covering misbehaving upstreams
- Remove if-Content-Encoding guard on all three error paths (Execute,
ExecuteStream, CountTokens); unconditionally delegate to decodeResponseBody
so magic-byte detection applies consistently to all response paths
- Add 10 tests covering stream identity enforcement, compressed success bodies,
magic-byte detection without headers, error path decoding, and
auth attribute override prevention
Add support for Claude's "adaptive" and "auto" thinking modes using `output_config.effort`. Introduce support for new effort level "max" in adaptive thinking. Update thinking logic, validate model capabilities, and extend converters and handling to ensure compatibility with adaptive modes. Adjust static model data with supported levels and refine handling across translators and executors.
Captured and compared outgoing requests from CLIProxyAPI against real
Claude Code 2.1.63 and fixed all detectable differences:
Headers:
- Update anthropic-beta to match 2.1.63: replace fine-grained-tool-streaming
and prompt-caching-2024-07-31 with context-management-2025-06-27 and
prompt-caching-scope-2026-01-05
- Remove X-Stainless-Helper-Method header (real Claude Code does not send it)
- Update default User-Agent from "claude-cli/2.1.44 (external, sdk-cli)" to
"claude-cli/2.1.63 (external, cli)"
- Force Claude Code User-Agent for non-Claude clients to avoid leaking
real client identity (e.g. curl, OpenAI SDKs) during cloaking
Body:
- Inject x-anthropic-billing-header as system[0] (matches real format)
- Change system prompt identifier from "You are Claude Code..." to
"You are a Claude agent, built on Anthropic's Claude Agent SDK."
- Add cache_control with ttl:"1h" to match real request format
- Fix user_id format: user_[64hex]_account_[uuid]_session_[uuid]
(was missing account UUID)
- Disable tool name prefix (set claudeToolPrefix to empty string)
TLS:
- Switch utls fingerprint from HelloFirefox_Auto to HelloChrome_Auto
(closer to Node.js/OpenSSL used by real Claude Code)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>