The Copilot API enforces per-account prompt token limits (128K individual,
168K business) that are lower than the total context window (200K). When
the dynamic /models API fetch fails or returns no capabilities.limits,
the static fallback of 200K exceeds the real enforced limit, causing
intermittent "prompt token count exceeds the limit" errors.
Two complementary fixes:
1. Lower the static ContextLength of Copilot Claude models from 200000
to 128000 (the conservative default matching
defaultCopilotContextLength). Dynamic API limits still override this
value when available.
2. Add context_length and max_completion_tokens to Claude-format model
responses so Claude Code CLI can learn the actual Copilot limit instead
of relying on its built-in 1M context configuration.
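
The fallback order described above can be sketched as follows. This is a minimal illustration, not the project's code: the `modelLimits` type, its `MaxPromptTokens` field, and the `effectiveContextLength` helper are hypothetical names; only the 128000 default and the name `defaultCopilotContextLength` come from this message.

```go
package main

// defaultCopilotContextLength mirrors the conservative 128K default named
// in the commit message. Where the real constant lives is not shown there.
const defaultCopilotContextLength = 128000

// modelLimits is a hypothetical stand-in for the capabilities.limits
// object returned by the /models endpoint.
type modelLimits struct {
	MaxPromptTokens int // assumed field name, for illustration only
}

// effectiveContextLength returns the dynamic limit when the API supplied a
// positive one, and the static fallback otherwise (fetch failed, or the
// response carried no limits).
func effectiveContextLength(limits *modelLimits) int {
	if limits != nil && limits.MaxPromptTokens > 0 {
		return limits.MaxPromptTokens
	}
	return defaultCopilotContextLength
}
```

With this shape, a business account that reports 168000 keeps its higher limit, while a failed fetch yields 128000 rather than an unenforceable 200000.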
This commit addresses three issues with running Claude Code through
GitHub Copilot:
1. **Premium request inflation**: Responses API requests were missing
Openai-Intent headers and proper defaults, causing Copilot to bill
each tool-loop continuation as a new premium request. Fixed by adding
an isAgentInitiated() heuristic (checks for tool_result content or a
preceding assistant tool_use), applying Responses API defaults
(store, include, reasoning.summary), and counting tokens locally
with tiktoken to avoid extra API calls.
2. **Context overflow**: Claude Code's modelSupports1M() hardcodes
opus-4-6 as 1M-capable, but Copilot only supports ~128K-200K.
Fixed by stripping the context-1m-2025-08-07 beta from translated
request bodies. Also forwards response headers in non-streaming
Execute() and registers the GET /copilot-quota management API route.
3. **Thinking not working**: Add ThinkingSupport with level-based
reasoning to Claude models in the static definitions. Normalize
Copilot's non-standard 'reasoning_text' response field to
'reasoning_content' before passing to the SDK translator. Use
caller-provided context in CountTokens instead of Background().
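
The isAgentInitiated() heuristic from item 1 might look roughly like the sketch below. It is one reading of the description ("tool_result content or preceding assistant tool_use"), not the actual implementation: the `message` and `contentBlock` types and the `hasBlock` helper are hypothetical, and the real function's signature is not shown in this message.

```go
package main

// contentBlock and message are simplified, hypothetical stand-ins for
// Claude-format request messages.
type contentBlock struct {
	Type string // e.g. "text", "tool_use", "tool_result"
}

type message struct {
	Role    string // "user" or "assistant"
	Content []contentBlock
}

// hasBlock reports whether the message contains a block of the given type.
func hasBlock(m message, blockType string) bool {
	for _, b := range m.Content {
		if b.Type == blockType {
			return true
		}
	}
	return false
}

// isAgentInitiated guesses whether the latest request is a tool-loop
// continuation rather than a fresh user prompt. It returns true when the
// last message carries a tool_result block, or when the message before it
// is an assistant turn containing a tool_use block.
func isAgentInitiated(msgs []message) bool {
	if len(msgs) == 0 {
		return false
	}
	if hasBlock(msgs[len(msgs)-1], "tool_result") {
		return true
	}
	if len(msgs) >= 2 {
		prev := msgs[len(msgs)-2]
		return prev.Role == "assistant" && hasBlock(prev, "tool_use")
	}
	return false
}
```

A request classified this way can then be sent with the agent-style intent header so that it is not billed as a new premium request.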
- Add model to GetGeminiModels()
- Add model to GetGeminiVertexModels()
- Add model to GetGeminiCLIModels()
- Add model to GetAIStudioModels()
- Add to AntigravityModelConfig with thinking levels
- Update gemini-3-flash-preview description
Registers the new lightweight Gemini model across all provider
endpoints for cost-effective high-volume usage scenarios.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add support for Claude's "adaptive" and "auto" thinking modes using `output_config.effort`, including the new "max" effort level for adaptive thinking. Update the thinking logic to validate model capabilities, extend the converters so they handle adaptive modes, record the supported levels in the static model data, and refine handling across translators and executors.
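
As a rough illustration of the capability check plus `output_config.effort`, the sketch below validates a requested effort against a model's supported levels and emits the matching request fragment. The struct names, the `buildAdaptiveFragment` helper, and the `thinking.type` layout are assumptions made for this example; only `output_config.effort`, the "adaptive" mode, and the "max" level come from the message above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical request fragment types; the real converters may differ.
type thinkingConfig struct {
	Type string `json:"type"`
}

type outputConfig struct {
	Effort string `json:"effort"`
}

type adaptiveFragment struct {
	Thinking     thinkingConfig `json:"thinking"`
	OutputConfig outputConfig   `json:"output_config"`
}

// buildAdaptiveFragment returns the JSON fragment for adaptive thinking at
// the requested effort, or an error when the model does not list that
// effort among its supported levels.
func buildAdaptiveFragment(effort string, supported []string) (string, error) {
	for _, level := range supported {
		if level != effort {
			continue
		}
		raw, err := json.Marshal(adaptiveFragment{
			Thinking:     thinkingConfig{Type: "adaptive"},
			OutputConfig: outputConfig{Effort: effort},
		})
		if err != nil {
			return "", err
		}
		return string(raw), nil
	}
	return "", fmt.Errorf("effort %q is not supported by this model", effort)
}
```

Rejecting an unsupported level up front keeps an invalid effort from reaching the upstream API; a real implementation could instead clamp to the nearest supported level.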