Claude's Tool Search feature (advanced-tool-use-2025-11-20 beta) adds a
defer_loading field to tool definitions. When proxying Claude requests
to Codex or Gemini, this unknown field causes 400 errors upstream.
Strip defer_loading (and cache_control, where it was not already being
stripped) in all three Claude-to-upstream translation paths:
- codex/claude: defer_loading + cache_control
- gemini-cli/claude: defer_loading
- gemini/claude: defer_loading
Fixes #1725, Fixes #1375
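A minimal sketch of the stripping step, using only the standard library rather than the project's own JSON helpers. The helper name `stripToolFields` and the sample request body are assumptions made for illustration; the real translators may be structured differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripToolFields removes the named keys from every object in the
// top-level "tools" array of a Claude-style request body.
// Hypothetical helper for illustration, not the project's actual code.
func stripToolFields(body []byte, fields ...string) ([]byte, error) {
	var req map[string]json.RawMessage
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	rawTools, ok := req["tools"]
	if !ok {
		return body, nil // no tools, nothing to strip
	}
	var tools []map[string]json.RawMessage
	if err := json.Unmarshal(rawTools, &tools); err != nil {
		return nil, err
	}
	for _, tool := range tools {
		for _, f := range fields {
			delete(tool, f) // deleting a missing key is a no-op
		}
	}
	cleaned, err := json.Marshal(tools)
	if err != nil {
		return nil, err
	}
	req["tools"] = cleaned
	return json.Marshal(req)
}

func main() {
	in := []byte(`{"model":"m","tools":[{"name":"search","defer_loading":true,"cache_control":{"type":"ephemeral"}}]}`)
	out, err := stripToolFields(in, "defer_loading", "cache_control")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Passing the field list as a parameter lets the Codex path strip both fields while the Gemini paths strip only defer_loading.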
Add support for Claude's "adaptive" and "auto" thinking modes using `output_config.effort`. Introduce support for new effort level "max" in adaptive thinking. Update thinking logic, validate model capabilities, and extend converters and handling to ensure compatibility with adaptive modes. Adjust static model data with supported levels and refine handling across translators and executors.
## Problem
When using Antigravity Claude models through CLIProxyAPI, the thinking
chain (reasoning content) does not display in the Amp client.
## Root Cause
The Amp client sends `thinking: {"type": "auto"}` in its requests,
but `ConvertClaudeRequestToAntigravity` only handled `"enabled"` and
`"adaptive"` types in its switch statement. The `"auto"` type was
silently ignored, resulting in no `thinkingConfig` being set in the
translated Gemini request. Without `thinkingConfig`, the Antigravity
API returns responses without any thinking content.
Additionally, the Antigravity API for Claude models does not support
`thinkingBudget: -1` (auto mode sentinel). It requires a concrete
positive budget value. The fix uses 128000 as the budget for "auto"
mode, which `ApplyThinking` will then normalize to stay within the
model's actual limits (e.g., capped to `maxOutputTokens - 1`).
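The mapping and the later normalization can be sketched roughly as follows. This is an illustration under assumptions: `thinkingFor` and `clampBudget` are hypothetical names, `clampBudget` only stands in for the one capping rule mentioned above (not the full `ApplyThinking` logic), and the `"adaptive"` case is left out because its handling is not described here.

```go
package main

import "fmt"

// thinkingConfig mirrors the two Gemini-style fields named in the text.
type thinkingConfig struct {
	ThinkingBudget  int
	IncludeThoughts bool
}

// Budget used for "auto" mode, as stated in the commit message.
const autoThinkingBudget = 128000

// thinkingFor maps a Claude thinking.type to a thinkingConfig.
// requested is the client-supplied budget, used only for "enabled".
// The second return value is false when no config should be set.
func thinkingFor(thinkingType string, requested int) (thinkingConfig, bool) {
	switch thinkingType {
	case "enabled":
		return thinkingConfig{ThinkingBudget: requested, IncludeThoughts: true}, true
	case "auto":
		// The upstream API rejects the -1 sentinel, so use a concrete value.
		return thinkingConfig{ThinkingBudget: autoThinkingBudget, IncludeThoughts: true}, true
	default:
		return thinkingConfig{}, false
	}
}

// clampBudget is a stand-in for the normalization step: it caps the
// budget at maxOutputTokens - 1 (assumed rule, taken from the example above).
func clampBudget(budget, maxOutputTokens int) int {
	if limit := maxOutputTokens - 1; budget > limit {
		return limit
	}
	return budget
}

func main() {
	cfg, ok := thinkingFor("auto", 0)
	if ok {
		cfg.ThinkingBudget = clampBudget(cfg.ThinkingBudget, 64000)
	}
	fmt.Printf("%+v\n", cfg)
}
```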
## Changes
### internal/translator/antigravity/claude/antigravity_claude_request.go
1. **Add "auto" case** to the thinking type switch statement.
Sets `thinkingBudget: 128000` and `includeThoughts: true`.
The budget is subsequently normalized by `ApplyThinking` based
on model-specific limits.
2. **Add "auto" to hasThinking check** so that interleaved thinking
hints are injected for tool-use scenarios when Amp sends
`thinking.type="auto"`.
### internal/registry/model_definitions_static_data.go
3. **Add Thinking configuration** for `claude-sonnet-4-6`,
`claude-sonnet-4-5`, and `claude-opus-4-6` in
`GetAntigravityModelConfig()` -- these were previously missing,
causing `ApplyThinking` to skip thinking config entirely.
## Testing
- Deployed to Railway test instance (cpa-thinking-test)
- Verified via debug logging that:
  - Amp sends `thinking: {"type": "auto"}`
  - CPA now translates this to `thinkingConfig: {thinkingBudget: 128000, includeThoughts: true}`
  - `ApplyThinking` normalizes the budget to model-specific limits
  - Antigravity API receives the correct thinkingConfig
Amp-Thread-ID: https://ampcode.com/threads/T-019ca511-710d-776d-a07c-4b750f871a93
Co-authored-by: Amp <amp@ampcode.com>
Move base64 image data from Claude tool_result into functionResponse.parts
as inlineData instead of outer sibling parts, preventing context bloat.
Unify all inlineData field naming to camelCase mimeType across Claude,
OpenAI, and Gemini translators. Add comprehensive edge case tests and
Gemini-side regression test for functionResponse.parts preservation.
Non-spark codex models (gpt-5.3-codex, gpt-5.2-codex) stream function call
arguments via multiple delta events followed by a done event. The done handler
unconditionally emitted the full arguments, duplicating what deltas already
streamed. This produced invalid double JSON that Claude Code couldn't parse,
causing tool calls to fail with missing parameters and infinite retry loops.
Add HasReceivedArgumentsDelta flag to track whether delta events were received.
The done handler now only emits arguments when no deltas preceded it (spark
models), while delta-based streaming continues to work for non-spark models.
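The flag logic can be sketched as a small state holder. Only the name HasReceivedArgumentsDelta comes from this change; the `toolCallState` type, the `event` type, and the `replay` driver are hypothetical and exist only to make the example runnable.

```go
package main

import (
	"fmt"
	"strings"
)

// toolCallState tracks one in-flight function call.
type toolCallState struct {
	HasReceivedArgumentsDelta bool
}

// onDelta forwards an argument fragment and records that deltas were seen.
func (s *toolCallState) onDelta(fragment string) string {
	s.HasReceivedArgumentsDelta = true
	return fragment
}

// onDone emits the full arguments only when no delta preceded it,
// so delta-streaming models do not get their arguments twice.
func (s *toolCallState) onDone(full string) string {
	if s.HasReceivedArgumentsDelta {
		return ""
	}
	return full
}

// event is a simplified stand-in for an upstream SSE event.
type event struct{ kind, payload string }

// replay feeds events through a fresh state and returns everything
// that would be emitted downstream, concatenated.
func replay(events []event) string {
	var s toolCallState
	var out strings.Builder
	for _, ev := range events {
		switch ev.kind {
		case "delta":
			out.WriteString(s.onDelta(ev.payload))
		case "done":
			out.WriteString(s.onDone(ev.payload))
		}
	}
	return out.String()
}

func main() {
	nonSpark := []event{{"delta", `{"a":`}, {"delta", `1}`}, {"done", `{"a":1}`}}
	spark := []event{{"done", `{"a":1}`}}
	fmt.Println(replay(nonSpark)) // arguments arrive via deltas only
	fmt.Println(replay(spark))    // arguments arrive via the done event
}
```

In a real translator the flag would need to be reset for each new tool call; the sketch sidesteps this by creating a fresh state per replay.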
Some Codex models (e.g. gpt-5.3-codex-spark) send function call arguments
in a single "done" event without preceding "delta" events. The streaming
translator only handled "delta" events, causing tool call arguments to be
lost — resulting in empty tool inputs and infinite retry loops in clients
like Claude Code.
Emit the full arguments from the "done" event as a single input_json_delta
so downstream clients receive the complete tool input.
sjson.SetRaw with an empty string produces malformed JSON (e.g. "result":}).
This happens when a Claude tool_result block has no content field, which
leaves functionResponseResult.Raw as "". Guard against this: when .Raw is
empty, fall back to sjson.Set with an empty string; otherwise keep using
sjson.SetRaw.
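The failure mode and the guard can be shown without sjson. The sketch below uses plain string splicing as a stand-in for a raw set, so `spliceRaw` and `spliceGuarded` are hypothetical helpers, not sjson calls.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spliceRaw inserts raw verbatim as the value of key, the way a
// "set raw" operation would. An empty raw yields invalid JSON.
func spliceRaw(key, raw string) string {
	return `{"` + key + `":` + raw + `}`
}

// spliceGuarded falls back to an empty JSON string when raw is empty.
func spliceGuarded(key, raw string) string {
	if raw == "" {
		return spliceRaw(key, `""`)
	}
	return spliceRaw(key, raw)
}

func main() {
	bad := spliceRaw("result", "")
	good := spliceGuarded("result", "")
	fmt.Println(bad, json.Valid([]byte(bad)))
	fmt.Println(good, json.Valid([]byte(good)))
}
```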
The OpenAI Chat Completions translator was silently dropping
response.function_call_arguments.delta and
response.function_call_arguments.done Codex SSE events, meaning
tool call arguments were never streamed incrementally to clients.
Add proper handling mirroring the proven Claude translator pattern:
- response.output_item.added: announce tool call (id, name, empty args)
- response.function_call_arguments.delta: stream argument chunks
- response.function_call_arguments.done: emit full args if no deltas
- response.output_item.done: defensive fallback for backward compat
State tracking via HasReceivedArgumentsDelta and HasToolCallAnnounced
ensures no duplicate argument emission and correct behavior for models
like codex-spark that skip delta events entirely.
- Added support for single and array-based `content` cases.
- Enhanced `system_instruction` structure population logic.
- Improved handling of user role assignment for string-based `content`.
The `ConvertOpenAIResponsesRequestToGemini` function had code that attempted
to uppercase JSON Schema type values (e.g. "string" -> "STRING") for Gemini
compatibility. This broke nullable types because when `type` is a JSON array
like `["string", "null"]`:
1. `gjson.Result.String()` returns the raw JSON text `["string","null"]`
2. `strings.ToUpper()` produces `["STRING","NULL"]`
3. `sjson.Set()` stores it as a JSON **string** `"[\"STRING\",\"NULL\"]"`
instead of a JSON array
4. The downstream `CleanJSONSchemaForGemini()` / `flattenTypeArrays()`
cannot detect it (since `IsArray()` returns false on a string)
5. Gemini/Antigravity API rejects it with:
`400 Invalid value at '...type' (Type), "["STRING","NULL"]"`
This was confirmed and tested with Droid Factory (Antigravity) Gemini models
where Claude Code sends tool schemas with nullable parameters.
The fix removes the uppercasing logic entirely and passes the raw schema
through to `parametersJsonSchema`. This is safe because:
- Antigravity executor already runs `CleanJSONSchemaForGemini()` which
properly handles type arrays, nullable fields, and all schema cleanup
- Gemini/Vertex executors use `parametersJsonSchema` which accepts raw
JSON Schema directly (no uppercasing needed)
- The uppercasing code also only iterated top-level properties, missing
nested schemas entirely
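Steps 1 to 4 above can be reproduced with the standard library alone. In this sketch `buggyUpper` imitates the removed logic (stringify, uppercase, store as a string) and `isJSONArray` imitates the array check done by the cleanup stage; both are hypothetical stand-ins and do not use gjson or sjson.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// buggyUpper imitates the removed code path: it takes the raw JSON of a
// schema "type" value, turns it into a Go string, uppercases it, and
// stores the result as a JSON string.
func buggyUpper(typeRaw string) string {
	var s string
	if err := json.Unmarshal([]byte(typeRaw), &s); err != nil {
		// Not a JSON string (e.g. an array): fall back to the raw text.
		s = typeRaw
	}
	out, _ := json.Marshal(strings.ToUpper(s))
	return string(out)
}

// isJSONArray reports whether raw is a JSON array.
func isJSONArray(raw string) bool {
	trimmed := strings.TrimSpace(raw)
	return strings.HasPrefix(trimmed, "[")
}

func main() {
	nullable := `["string","null"]`
	fmt.Println(isJSONArray(nullable)) // the original value is an array

	broken := buggyUpper(nullable)
	fmt.Println(broken)              // now a JSON string containing array text
	fmt.Println(isJSONArray(broken)) // so the array check no longer matches

	fmt.Println(buggyUpper(`"string"`)) // scalar types were unaffected
}
```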
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix(translator): restructure message content handling to support multiple content types
- Consolidated `input_text` and `output_text` handling into a single case.
- Added support for processing `input_image` content with associated URLs.
- Standardized the handling of `stop_reason` and `finish_reason` across Codex and Gemini responses.
- Restricted pass-through of specific reasons (`max_tokens`, `stop`) for consistency.
- Enhanced fallback logic for undefined reasons.
- Added support to extract and include `cachedContentTokenCount` in `usage.prompt_tokens_details`.
- Logged warnings for failures to set cached token count for better debugging.
- Replaced all instances of `bytes.Clone` with direct references to enhance efficiency.
- Simplified payload handling across executors and translators by eliminating unnecessary data duplication.
- Replaced repetitive string operations with a centralized `escapeGJSONPathKey` function.
- Streamlined handling of JSON schema cleaning for Gemini and Antigravity requests.
- Improved payload management by transitioning from byte slices to strings for processing.
- Removed unnecessary cloning of byte slices in several places.
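For context on the centralized key escaping mentioned above, here is one way such a helper could look. It is a sketch under assumptions: the function is named `escapePathKey` to make clear it is not the project's `escapeGJSONPathKey`, and the set of characters treated as special is a guess based on gjson's path syntax, where special characters are escaped with a backslash.

```go
package main

import (
	"fmt"
	"strings"
)

// pathSpecials lists characters assumed to have meaning in a path.
// The real helper may escape a different set.
const pathSpecials = `\.*?|#@`

// escapePathKey prefixes each special character in key with a backslash
// so the key can be used as a single literal path segment.
func escapePathKey(key string) string {
	if !strings.ContainsAny(key, pathSpecials) {
		return key // fast path: nothing to escape
	}
	var b strings.Builder
	b.Grow(len(key) + 4)
	for _, r := range key {
		if strings.ContainsRune(pathSpecials, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	// A schema property named "user.name" must not be read as two segments.
	fmt.Println("properties." + escapePathKey("user.name") + ".type")
}
```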