The Copilot API enforces per-account prompt token limits (128K individual,
168K business) that differ from the static 200K context length advertised
by the proxy. This mismatch caused Claude Code to accumulate context
beyond the actual limit, triggering "prompt token count exceeds the limit
of 128000" errors.
Changes:
- Extract max_prompt_tokens and max_output_tokens from the Copilot
/models API response (capabilities.limits) and use them as the
authoritative ContextLength and MaxCompletionTokens values
- Add CopilotModelLimits struct and Limits() helper to parse limits
from the existing Capabilities map
- Fix GitLab Duo context-1m beta header not being set when routing
through the Anthropic gateway (gitlab_duo_force_context_1m attr
was set but only gin headers were checked)
- Fix flaky parallel tests that shared global model registry state
- Strip SSE `data:` prefix before normalizing reasoning_text→reasoning_content
in streaming mode; re-wrap afterward for the translator
- Iterate all choices in normalizeGitHubCopilotReasoningField (not just
choices[0]) to support n>1 requests
- Remove over-broad tool-role fallback in isAgentInitiated that scanned
all messages for role:"tool", aligning with opencode's approach of only
detecting active tool loops — genuine user follow-ups after tool use are
no longer mis-classified as agent-initiated
- Add 5 reasoning normalization tests; update 2 X-Initiator tests to match
refined semantics
This commit addresses three issues with Claude Code through GitHub
Copilot:
1. **Premium request inflation**: Responses API requests were missing
Openai-Intent headers and proper defaults, causing Copilot to bill
each tool-loop continuation as a new premium request. Fixed by adding
isAgentInitiated() heuristic (checks for tool_result content or
preceding assistant tool_use), applying Responses API defaults
(store, include, reasoning.summary), and local tiktoken-based token
counting to avoid extra API calls.
2. **Context overflow**: Claude Code's modelSupports1M() hardcodes
opus-4-6 as 1M-capable, but Copilot only supports ~128K-200K.
Fixed by stripping the context-1m-2025-08-07 beta from translated
request bodies. Also forwards response headers in non-streaming
Execute() and registers the GET /copilot-quota management API route.
3. **Thinking not working**: Add ThinkingSupport with level-based
reasoning to Claude models in the static definitions. Normalize
Copilot's non-standard 'reasoning_text' response field to
'reasoning_content' before passing to the SDK translator. Use
caller-provided context in CountTokens instead of Background().
Follow-up review found two real framing hazards in the handler-layer
framer: it could flush a partial `data:` payload before the JSON was
complete, and it could inject an extra newline before chunks that
already began with `\n`/`\r\n`. This commit tightens the framer so it
only emits undelimited events when the buffered `data:` payload is
already valid JSON (or `[DONE]`), skips newline injection for chunks
that already start with a line break, and avoids the heavier
`bytes.Split` path while scanning SSE fields.
The regression suite now covers split `data:` payload chunks,
newline-prefixed chunks, and dropping incomplete trailing data on
flush, so the original Responses fix remains intact while the review
concerns are explicitly locked down.
Constraint: Keep the follow-up limited to handler-layer framing and tests
Rejected: Ignore the review and rely on current executor chunk shapes | leaves partial data payload corruption possible
Rejected: Build a fully generic SSE parser | wider change than needed for the identified risks
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Do not emit undelimited Responses SSE events unless buffered `data:` content is already complete and valid
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers/openai -count=1
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers -count=1
Tested: /tmp/go1.26.1/go/bin/go vet ./sdk/api/handlers/...
Not-tested: Full repository test suite outside sdk/api/handlers packages
Line-oriented upstream executors can emit `event:` and `data:` as
separate chunks, but the Responses handler had started terminating
each incoming chunk as a full SSE event. That split `response.created`
into an empty event plus a later data block, which broke downstream
clients like OpenClaw.
This keeps the fix in the handler layer: a small stateful framer now
buffers standalone `event:` lines until the matching `data:` arrives,
preserves already-framed events, and ignores delimiter-only leftovers.
The regression suite now covers split event/data framing, full-event
passthrough, terminal errors, and the bootstrap path that forwards
line-oriented openai-response streams from non-Codex executors too.
Constraint: Keep the fix localized to Responses handler framing instead of patching every executor
Rejected: Revert to v6.9.7 chunk writing | would reintroduce data-only framing regressions
Rejected: Patch each line-oriented executor separately | duplicates fragile SSE assembly logic
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Do not assume incoming Responses stream chunks are already complete SSE events; preserve handler-layer reassembly for split `event:`/`data:` inputs
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers/openai -count=1
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers -count=1
Tested: /tmp/go1.26.1/go test ./sdk/api/handlers/... -count=1
Tested: /tmp/go1.26.1/go/bin/go vet ./sdk/api/handlers/...
Tested: Temporary patched server on 127.0.0.1:18317 -> /v1/models 200, /v1/responses non-stream 200, /v1/responses stream emitted combined `event:` + `data:` frames
Not-tested: Full repository test suite outside sdk/api/handlers packages
- Introduced new logging functions for websocket requests, handshakes, errors, and responses in `logging_helpers.go`.
- Updated `CodexWebsocketsExecutor` to utilize the new logging functions for improved clarity and consistency in websocket operations.
- Modified the handling of websocket upgrade rejections to log relevant metadata.
- Changed the request body key to a timeline body key in `openai_responses_websocket.go` to better reflect its purpose.
- Enhanced tests to verify the correct logging of websocket events and responses, including disconnect events and error handling scenarios.