Plan: GitLab Duo Codex Parity
Generated: 2026-03-10
Estimated Complexity: High
Overview
Bring GitLab Duo support from the current "auth + basic executor" stage to the same practical level as codex inside CLIProxyAPI: a user logs in once, points external clients such as Claude Code at CLIProxyAPI, selects GitLab Duo-backed models, and gets stable streaming, multi-turn behavior, tool calling compatibility, and predictable model routing without manual provider-specific workarounds.
The core architectural shift is to stop treating GitLab Duo as only two REST wrappers (/api/v4/chat/completions and /api/v4/code_suggestions/completions) and instead use GitLab's direct_access contract as the primary runtime entrypoint wherever possible. Official GitLab docs confirm that direct_access returns AI gateway connection details, headers, token, and expiry; that contract is the closest path to codex-like provider behavior.
Prerequisites
- Official GitLab Duo API references confirmed during implementation:
- POST /api/v4/code_suggestions/direct_access
- POST /api/v4/code_suggestions/completions
- POST /api/v4/chat/completions
- Access to at least one real GitLab Duo account for manual verification.
- One downstream client target for acceptance testing:
- Claude Code against Claude-compatible endpoint
- OpenAI-compatible client against /v1/chat/completions and /v1/responses
- Existing PR branch as starting point:
- feat/gitlab-duo-auth
- PR #2028
Definition Of Done
- GitLab Duo models can be used via CLIProxyAPI from the same client surfaces that already work for codex.
- Upstream streaming is real passthrough or faithful chunked forwarding, not synthetic whole-response replay.
- Tool/function calling survives translation layers without dropping fields or corrupting names.
- Multi-turn and session semantics are stable across chat/completions, responses, and Claude-compatible routes.
- Model exposure stays current from GitLab metadata or gateway discovery without hardcoded stale model tables.
- go test ./... stays green and at least one real manual end-to-end client flow is documented.
Sprint 1: Contract And Gap Closure
Goal: Replace assumptions with a hard compatibility contract between current codex behavior and what GitLab Duo can actually support.
Demo/Validation:
- Written matrix showing codex features vs current GitLab Duo behavior.
- One checked-in developer note or test fixture for real GitLab Duo payload examples.
Task 1.1: Freeze Codex Parity Checklist
- Location: internal/runtime/executor/codex_executor.go, internal/runtime/executor/codex_websockets_executor.go, sdk/api/handlers/openai/openai_responses_handlers.go, sdk/api/handlers/openai/openai_responses_websocket.go
- Description: Produce a concrete feature matrix for codex: HTTP execute, SSE execute, /v1/responses, websocket downstream path, tool calling, request IDs, session close semantics, and model registration behavior.
- Dependencies: None
- Acceptance Criteria:
- A checklist exists in repo docs or issue notes.
- Each capability is marked required, optional, or not possible for GitLab Duo.
- Validation:
- Review against current codex code paths.
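One possible way to keep the parity checklist machine-checkable is sketched below in Go. The Support values mirror the three markings named in the acceptance criteria; the Matrix type, the Validate method, and the capability keys are hypothetical names for illustration only.

```go
package main

import "fmt"

// Support is the marking a capability receives for GitLab Duo.
type Support string

const (
	Required    Support = "required"
	Optional    Support = "optional"
	NotPossible Support = "not possible"
)

// Matrix maps a codex capability name to its GitLab Duo marking.
type Matrix map[string]Support

// Validate returns an error if any listed capability is missing from the
// matrix or carries a marking other than the three allowed values.
func (m Matrix) Validate(capabilities []string) error {
	for _, c := range capabilities {
		switch m[c] {
		case Required, Optional, NotPossible:
			// marking is valid
		default:
			return fmt.Errorf("capability %q is unmarked or invalid (%q)", c, m[c])
		}
	}
	return nil
}
```

A test could call Validate with the full capability list from the task description so that an unmarked capability fails the build.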
Task 1.2: Lock GitLab Duo Runtime Contract
- Location: internal/auth/gitlab/gitlab.go, internal/runtime/executor/gitlab_executor.go
- Description: Validate the exact upstream contract we can rely on:
- direct_access fields and refresh cadence
- whether AI gateway path is usable directly
- when chat/completions is available vs when fallback is required
- what streaming shape is returned by code_suggestions/completions?stream=true
- Dependencies: Task 1.1
- Acceptance Criteria:
- GitLab transport decision is explicit: gateway-first, REST-first, or hybrid.
- Unknown areas are isolated behind feature flags, not spread across executor logic.
- Validation:
- Official docs + captured real responses from a Duo account.
Task 1.3: Define Client-Facing Compatibility Targets
- Location: README.md, gitlab-duo-codex-parity-plan.md
- Description: Define exactly which external flows must work to call GitLab Duo support "like codex".
- Dependencies: Task 1.2
- Acceptance Criteria:
- Required surfaces are listed:
- Claude-compatible route
- OpenAI chat/completions
- OpenAI responses
- optional downstream websocket path
- Non-goals are explicit if GitLab upstream cannot support them.
- Validation:
- Maintainer review of stated scope.
Sprint 2: Primary Transport Parity
Goal: Move GitLab Duo execution onto a transport that supports codex-like runtime behavior.
Demo/Validation:
- A GitLab Duo model works over real streaming through /v1/chat/completions.
- No synthetic "collect full body then fake stream" path remains on the primary flow.
Task 2.1: Refactor GitLab Executor Into Strategy Layers
- Location: internal/runtime/executor/gitlab_executor.go
- Description: Split current executor into explicit strategies:
- auth refresh/direct access refresh
- gateway transport
- GitLab REST fallback transport
- downstream translation helpers
- Dependencies: Sprint 1
- Acceptance Criteria:
- Executor no longer mixes discovery, refresh, fallback selection, and response synthesis in one path.
- Transport choice is testable in isolation.
- Validation:
- Unit tests for strategy selection and fallback boundaries.
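The strategy split could look roughly like the Go sketch below, which isolates transport choice so it can be unit-tested without any network. The Transport interface, Select, and stubTransport are hypothetical names; the real executor may need a richer request type than a plain string.

```go
package main

import "errors"

// Transport is a hypothetical interface for one upstream path
// (for example the AI gateway or the GitLab REST fallback).
type Transport interface {
	Name() string
	Available() bool
	Do(prompt string) (string, error)
}

// ErrNoTransport is returned when every candidate is unavailable.
var ErrNoTransport = errors.New("gitlab: no transport available")

// Select returns the first available transport in preference order.
func Select(ordered ...Transport) (Transport, error) {
	for _, t := range ordered {
		if t.Available() {
			return t, nil
		}
	}
	return nil, ErrNoTransport
}

// stubTransport is a test double with a fixed availability.
type stubTransport struct {
	name string
	up   bool
}

func (s stubTransport) Name() string    { return s.name }
func (s stubTransport) Available() bool { return s.up }
func (s stubTransport) Do(prompt string) (string, error) {
	return s.name + ":" + prompt, nil
}
```

With this shape, a gateway-first policy is just the argument order passed to Select, and the fallback boundary is covered by a test that marks the gateway stub unavailable.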
Task 2.2: Implement Real Streaming Path
- Location: internal/runtime/executor/gitlab_executor.go, internal/runtime/executor/gitlab_executor_test.go
- Description: Replace synthetic streaming with true upstream incremental forwarding:
- use gateway stream if available
- otherwise consume GitLab Code Suggestions streaming response and map chunks incrementally
- Dependencies: Task 2.1
- Acceptance Criteria:
- ExecuteStream emits chunks before upstream completion.
- error handling preserves status and early failure semantics.
- Validation:
- tests with chunked upstream server
- manual curl check against /v1/chat/completions with stream=true
Task 2.3: Preserve Upstream Auth And Headers Correctly
- Location: internal/runtime/executor/gitlab_executor.go, internal/auth/gitlab/gitlab.go
- Description: Use direct_access connection details as first-class transport state:
- gateway token
- expiry
- mandatory forwarded headers
- model metadata
- Dependencies: Task 2.1
- Acceptance Criteria:
- executor stops ignoring gateway headers/token when transport requires them
- refresh logic never over-fetches direct_access
- Validation:
- tests verifying propagated headers and refresh interval behavior
Sprint 3: Request/Response Semantics Parity
Goal: Make GitLab Duo behave correctly under the same request shapes that current codex consumers send.
Demo/Validation:
- OpenAI and Claude-compatible clients can do non-streaming and streaming conversations without losing structure.
Task 3.1: Normalize Multi-Turn Message Mapping
- Location: internal/runtime/executor/gitlab_executor.go, sdk/translator
- Description: Replace the current "flatten prompt into one instruction" behavior with stable multi-turn mapping:
- preserve system context
- preserve user/assistant ordering
- maintain bounded context truncation
- Dependencies: Sprint 2
- Acceptance Criteria:
- multi-turn requests are not collapsed into a lossy single string unless fallback mode explicitly requires it
- truncation policy is deterministic and tested
- Validation:
- golden tests for request mapping
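A deterministic truncation policy might look like the Go sketch below. It keeps every system message and the newest contiguous run of other turns that fits a byte budget; the budget unit, the Message type, and the Truncate name are assumptions for illustration, and the real policy may count tokens instead.

```go
package main

// Message is a simplified chat turn.
type Message struct {
	Role    string // "system", "user", or "assistant"
	Content string
}

// Truncate keeps all system messages plus the newest non-system messages
// whose combined content length fits in budget, preserving original order.
func Truncate(msgs []Message, budget int) []Message {
	used := 0
	for _, m := range msgs {
		if m.Role == "system" {
			used += len(m.Content) // system context is always retained
		}
	}
	keep := make([]bool, len(msgs))
	full := false
	for i := len(msgs) - 1; i >= 0; i-- {
		m := msgs[i]
		switch {
		case m.Role == "system":
			keep[i] = true
		case !full && used+len(m.Content) <= budget:
			keep[i] = true
			used += len(m.Content)
		default:
			full = true // drop this turn and every older non-system turn
		}
	}
	out := make([]Message, 0, len(msgs))
	for i, m := range msgs {
		if keep[i] {
			out = append(out, m)
		}
	}
	return out
}
```

Because the result depends only on the input slice and the budget, the same request always maps to the same upstream payload, which is what golden tests need.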
Task 3.2: Tool Calling Compatibility Layer
- Location: internal/runtime/executor/gitlab_executor.go, sdk/api/handlers/openai/openai_responses_handlers.go
- Description: Decide and implement one of two paths:
- native pass-through if GitLab gateway supports tool/function structures
- strict downgrade path with explicit unsupported errors instead of silent field loss
- Dependencies: Task 3.1
- Acceptance Criteria:
- tool-related fields are either preserved correctly or rejected explicitly
- no silent corruption of tool names, tool calls, or tool results
- Validation:
- table-driven tests for tool payloads
- one manual client scenario using tools
Task 3.3: Token Counting And Usage Reporting Fidelity
- Location: internal/runtime/executor/gitlab_executor.go, internal/runtime/executor/usage_helpers.go
- Description: Improve token/usage reporting so GitLab models behave like first-class providers in logs and scheduling.
- Dependencies: Sprint 2
- Acceptance Criteria:
- CountTokens uses the closest supported estimation path
- usage logging distinguishes prompt vs completion when possible
- Validation:
- unit tests for token estimation outputs
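Where no upstream token count is available, a fallback estimate could be as simple as the Go sketch below. The four-characters-per-token ratio is a rough rule of thumb assumed here, not a GitLab or model-specific figure, and Usage and EstimateTokens are hypothetical names.

```go
package main

import "unicode/utf8"

// Usage separates prompt and completion counts for logging.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
}

// Total returns the combined token count.
func (u Usage) Total() int { return u.PromptTokens + u.CompletionTokens }

// EstimateTokens approximates a token count as ceil(runes / 4).
// This is a heuristic fallback, not a real tokenizer.
func EstimateTokens(s string) int {
	return (utf8.RuneCountInString(s) + 3) / 4
}
```

Keeping the estimate in one function makes it easy to swap in a real count if the gateway reports usage.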
Sprint 4: Responses And Session Parity
Goal: Reach codex-level support for OpenAI Responses clients and long-lived sessions where GitLab upstream permits it.
Demo/Validation:
- /v1/responses works with GitLab Duo in a realistic client flow.
- If websocket parity is not possible, the code explicitly declines it and keeps HTTP paths stable.
Task 4.1: Make GitLab Compatible With /v1/responses
- Location: sdk/api/handlers/openai/openai_responses_handlers.go, internal/runtime/executor/gitlab_executor.go
- Description: Ensure GitLab transport can safely back the Responses API path, including compact responses if applicable.
- Dependencies: Sprint 3
- Acceptance Criteria:
- GitLab Duo can be selected behind /v1/responses
- response IDs and follow-up semantics are defined
- Validation:
- handler tests analogous to codex/openai responses tests
Task 4.2: Evaluate Downstream Websocket Parity
- Location: sdk/api/handlers/openai/openai_responses_websocket.go, internal/runtime/executor/gitlab_executor.go
- Description: Decide whether GitLab Duo can support downstream websocket sessions like codex:
- if yes, add session-aware execution path
- if no, mark GitLab auth as websocket-ineligible and keep HTTP routes first-class
- Dependencies: Task 4.1
- Acceptance Criteria:
- websocket behavior is explicit, not accidental
- no route claims websocket support when the upstream cannot honor it
- Validation:
- websocket handler tests or explicit capability tests
Task 4.3: Add Session Cleanup And Failure Recovery Semantics
- Location: internal/runtime/executor/gitlab_executor.go, sdk/cliproxy/auth/conductor.go
- Description: Add codex-like session cleanup, retry boundaries, and model suspension/resume behavior for GitLab failures and quota events.
- Dependencies: Sprint 2
- Acceptance Criteria:
- auth/model cooldown behavior is predictable on GitLab 4xx/5xx/quota responses
- executor cleans up per-session resources if any are introduced
- Validation:
- tests for quota and retry behavior
Sprint 5: Client UX, Model UX, And Manual E2E
Goal: Make GitLab Duo feel like a normal built-in provider to operators and downstream clients.
Demo/Validation:
- A documented setup exists for "login once, point Claude Code at CLIProxyAPI, use GitLab Duo-backed model".
Task 5.1: Model Alias And Provider UX Cleanup
- Location: sdk/cliproxy/service.go, README.md
- Description: Normalize what users see:
- stable alias such as gitlab-duo
- discovered upstream model names
- optional prefix behavior
- account labels that clearly distinguish OAuth vs PAT
- Dependencies: Sprint 3
- Acceptance Criteria:
- users can select a stable GitLab alias even when upstream model changes
- dynamic model discovery does not cause confusing model churn
- Validation:
- registry tests and manual /v1/models inspection
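Separating the stable alias from discovered model names could work as in the Go sketch below. The alias value gitlab-duo comes from this plan; Resolve, its parameters, and the model-a and model-b names in the usage are hypothetical.

```go
package main

// StableAlias is the client-facing name that survives upstream changes.
const StableAlias = "gitlab-duo"

// Resolve maps a client-requested model to an upstream model name.
// The alias follows whatever model is currently selected upstream, while
// discovered names must match exactly. ok is false if nothing matches.
func Resolve(requested, current string, discovered []string) (model string, ok bool) {
	if requested == StableAlias {
		return current, current != ""
	}
	for _, m := range discovered {
		if m == requested {
			return m, true
		}
	}
	return "", false
}
```

For example, Resolve("gitlab-duo", "model-b", []string{"model-a", "model-b"}) yields "model-b", so client configuration keeps working when the upstream default changes.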
Task 5.2: Add Real End-To-End Acceptance Tests
- Location: internal/runtime/executor/gitlab_executor_test.go, sdk/api/handlers/openai
- Description: Add higher-level tests covering the actual proxy surfaces:
- OpenAI chat/completions
- OpenAI responses
- Claude-compatible request path if GitLab is routed there
- Dependencies: Sprint 4
- Acceptance Criteria:
- tests fail if streaming regresses into synthetic buffering again
- tests cover at least one tool-related request and one multi-turn request
- Validation:
- go test ./...
Task 5.3: Publish Operator Documentation
- Location: README.md
- Description: Document:
- OAuth setup requirements
- PAT requirements
- current capability matrix
- known limitations if websocket/tool parity is partial
- Dependencies: Task 5.1
- Acceptance Criteria:
- setup instructions are enough for a new user to reproduce the GitLab Duo flow
- limitations are explicit
- Validation:
- dry-run docs review from a clean environment
Testing Strategy
- Keep go test ./... green after every committable task.
- Add table-driven tests first for request mapping, refresh behavior, and dynamic model registration.
- Add transport tests with httptest.Server for:
- real chunked streaming
- header propagation from direct_access
- upstream fallback rules
- Add at least one manual acceptance checklist:
- login via OAuth
- login via PAT
- list models
- run one streaming prompt via OpenAI route
- run one prompt from the target downstream client
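The header-propagation transport test can be sketched with httptest.Server as follows. Sending the gateway token as an Authorization Bearer header is an assumption to verify against the real gateway; NewGatewayRequest and CapturedHeaders are hypothetical helpers, and X-Example-Header in the usage stands in for whatever headers direct_access actually returns.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// NewGatewayRequest builds a request carrying the gateway token and the
// headers returned by direct_access. Bearer auth is an assumption.
func NewGatewayRequest(url, token string, headers map[string]string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

// CapturedHeaders sends the request to a throwaway test server and
// returns the headers the server actually received.
func CapturedHeaders(token string, headers map[string]string) (http.Header, error) {
	seen := make(chan http.Header, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Clone() // hand the headers back to the caller
		w.WriteHeader(http.StatusNoContent)
	}))
	defer srv.Close()

	req, err := NewGatewayRequest(srv.URL, token, headers)
	if err != nil {
		return nil, err
	}
	resp, err := srv.Client().Do(req)
	if err != nil {
		return nil, err
	}
	resp.Body.Close()
	return <-seen, nil
}
```

A test would call CapturedHeaders with a sample token and header map, then assert that each expected header arrived at the server unchanged.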
Potential Risks & Gotchas
- GitLab public docs expose direct_access, but do not fully document every possible AI gateway path. We should isolate any empirically discovered gateway assumptions behind one transport layer and feature flags.
- chat/completions availability differs by GitLab offering and version. The executor must not assume it always exists.
- Code Suggestions is completion-oriented; lossy mapping from rich chat/tool payloads will make GitLab Duo feel worse than codex unless explicitly handled.
- Synthetic streaming is not good enough for codex parity and will cause regressions in interactive clients.
- Dynamic model discovery can create unstable UX if the stable alias and discovered model IDs are not separated cleanly.
- PAT auth may validate successfully while still lacking effective Duo permissions. Error reporting must surface this explicitly.
Rollback Plan
- Keep the current basic GitLab executor behind a fallback mode until the new transport path is stable.
- If parity work destabilizes existing providers, revert only GitLab-specific executor changes and leave auth support intact.
- Preserve the stable gitlab-duo alias so rollback does not break client configuration.