CLIProxyAPIPlus/gitlab-duo-codex-parity-plan.md
2026-03-10 22:19:36 +04:00


Plan: GitLab Duo Codex Parity

Generated: 2026-03-10
Estimated Complexity: High

Overview

Bring GitLab Duo support from the current "auth + basic executor" stage to the same practical level as codex inside CLIProxyAPI: a user logs in once, points external clients such as Claude Code at CLIProxyAPI, selects GitLab Duo-backed models, and gets stable streaming, multi-turn behavior, tool calling compatibility, and predictable model routing without manual provider-specific workarounds.

The core architectural shift is to stop treating GitLab Duo as only two REST wrappers (/api/v4/chat/completions and /api/v4/code_suggestions/completions) and instead use GitLab's direct_access contract as the primary runtime entrypoint wherever possible. Official GitLab docs confirm that direct_access returns AI gateway connection details, headers, token, and expiry; that contract is the closest path to codex-like provider behavior.
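A minimal sketch of holding the direct_access response as typed transport state. It assumes the JSON field names as we read them in GitLab's published example (base_url, token, expires_at in Unix seconds, headers). Those names should be re-checked against captured responses in Task 1.2, and DirectAccess and parseDirectAccess are hypothetical names, not existing CLIProxyAPI code.

```go
package main

import (
	"encoding/json"
	"errors"
	"time"
)

// DirectAccess mirrors the assumed JSON body returned by
// POST /api/v4/code_suggestions/direct_access. Field names are an
// assumption to verify against real responses.
type DirectAccess struct {
	BaseURL   string            `json:"base_url"`
	Token     string            `json:"token"`
	ExpiresAt int64             `json:"expires_at"` // assumed Unix seconds
	Headers   map[string]string `json:"headers"`
}

// parseDirectAccess decodes the body and rejects responses that lack the
// fields a gateway transport cannot work without.
func parseDirectAccess(body []byte) (DirectAccess, error) {
	var da DirectAccess
	if err := json.Unmarshal(body, &da); err != nil {
		return DirectAccess{}, err
	}
	if da.BaseURL == "" || da.Token == "" {
		return DirectAccess{}, errors.New("direct_access: missing base_url or token")
	}
	return da, nil
}

// Expiry converts the assumed Unix-seconds field into a time.Time.
func (d DirectAccess) Expiry() time.Time {
	return time.Unix(d.ExpiresAt, 0)
}
```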

Prerequisites

  • Official GitLab Duo API references confirmed during implementation:
    • POST /api/v4/code_suggestions/direct_access
    • POST /api/v4/code_suggestions/completions
    • POST /api/v4/chat/completions
  • Access to at least one real GitLab Duo account for manual verification.
  • One downstream client target for acceptance testing:
    • Claude Code against Claude-compatible endpoint
    • OpenAI-compatible client against /v1/chat/completions and /v1/responses
  • Existing PR branch as starting point:
    • feat/gitlab-duo-auth
    • PR #2028

Definition Of Done

  • GitLab Duo models can be used via CLIProxyAPI from the same client surfaces that already work for codex.
  • Upstream streaming is real passthrough or faithful chunked forwarding, not synthetic whole-response replay.
  • Tool/function calling survives translation layers without dropping fields or corrupting names.
  • Multi-turn and session semantics are stable across chat/completions, responses, and Claude-compatible routes.
  • Model exposure stays current from GitLab metadata or gateway discovery without hardcoded stale model tables.
  • go test ./... stays green and at least one real manual end-to-end client flow is documented.

Sprint 1: Contract And Gap Closure

Goal: Replace assumptions with a hard compatibility contract between current codex behavior and what GitLab Duo can actually support.

Demo/Validation:

  • Written matrix showing codex features vs current GitLab Duo behavior.
  • One checked-in developer note or test fixture for real GitLab Duo payload examples.

Task 1.1: Freeze Codex Parity Checklist

Task 1.2: Lock GitLab Duo Runtime Contract

  • Location: internal/auth/gitlab/gitlab.go, internal/runtime/executor/gitlab_executor.go
  • Description: Validate the exact upstream contract we can rely on:
    • direct_access fields and refresh cadence
    • whether AI gateway path is usable directly
    • when chat/completions is available vs when fallback is required
    • what streaming shape is returned by code_suggestions/completions?stream=true
  • Dependencies: Task 1.1
  • Acceptance Criteria:
    • GitLab transport decision is explicit: gateway-first, REST-first, or hybrid.
    • Unknown areas are isolated behind feature flags, not spread across executor logic.
  • Validation:
    • Official docs + captured real responses from a Duo account.
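One way to make the transport decision explicit and keep unknowns behind flags, sketched under assumptions: the mode names mirror the three options above, while TransportMode, DuoFlags, ParseTransportMode and the config strings are hypothetical. Defaulting an empty value to REST-first is an assumed conservative choice that matches keeping the current executor as a fallback.

```go
package main

import (
	"fmt"
	"strings"
)

// TransportMode records the explicit decision Task 1.2 must produce.
type TransportMode int

const (
	RESTFirst    TransportMode = iota // current REST executor only
	GatewayFirst                      // direct_access gateway, REST on failure
	Hybrid                            // choose per request from capabilities
)

// DuoFlags gathers every empirically discovered assumption in one struct so
// the executor branches on named flags instead of scattered checks.
type DuoFlags struct {
	Mode              TransportMode
	GatewayStreaming  bool // gateway path observed to stream incrementally
	RESTStreaming     bool // code_suggestions/completions?stream=true observed to stream
	ChatCompletionsOK bool // /api/v4/chat/completions available on this instance
}

// ParseTransportMode maps a config string onto a mode. An empty value falls
// back to RESTFirst (an assumed conservative default); unknown values are
// rejected rather than silently defaulted.
func ParseTransportMode(s string) (TransportMode, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "", "rest-first":
		return RESTFirst, nil
	case "gateway-first":
		return GatewayFirst, nil
	case "hybrid":
		return Hybrid, nil
	default:
		return RESTFirst, fmt.Errorf("unknown gitlab transport mode %q", s)
	}
}
```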

Task 1.3: Define Client-Facing Compatibility Targets

  • Location: README.md, gitlab-duo-codex-parity-plan.md
  • Description: Define exactly which external flows must work to call GitLab Duo support "like codex".
  • Dependencies: Task 1.2
  • Acceptance Criteria:
    • Required surfaces are listed:
      • Claude-compatible route
      • OpenAI chat/completions
      • OpenAI responses
      • optional downstream websocket path
    • Non-goals are explicit if GitLab upstream cannot support them.
  • Validation:
    • Maintainer review of stated scope.

Sprint 2: Primary Transport Parity

Goal: Move GitLab Duo execution onto a transport that supports codex-like runtime behavior.

Demo/Validation:

  • A GitLab Duo model works over real streaming through /v1/chat/completions.
  • No synthetic "collect full body then fake stream" path remains on the primary flow.

Task 2.1: Refactor GitLab Executor Into Strategy Layers

  • Location: internal/runtime/executor/gitlab_executor.go
  • Description: Split current executor into explicit strategies:
    • auth refresh/direct access refresh
    • gateway transport
    • GitLab REST fallback transport
    • downstream translation helpers
  • Dependencies: Sprint 1
  • Acceptance Criteria:
    • Executor no longer mixes discovery, refresh, fallback selection, and response synthesis in one path.
    • Transport choice is testable in isolation.
  • Validation:
    • Unit tests for strategy selection and fallback boundaries.
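A sketch of pure transport selection that can be unit-tested without a network, assuming a simplified two-mode decision and a capability struct filled from direct_access and probes. PlanTransports, Caps and the transport names are hypothetical, and treating Code Suggestions as the last REST fallback is an assumption.

```go
package main

import "errors"

// Mode is a simplified transport decision from Sprint 1 (hypothetical names).
type Mode int

const (
	GatewayFirst Mode = iota
	RESTFirst
)

// Caps describes what the current auth can reach. In a real executor it
// would be filled from direct_access and capability probes.
type Caps struct {
	Gateway         bool // direct_access returned usable gateway details
	ChatCompletions bool // /api/v4/chat/completions is available
}

// PlanTransports returns transport names in the order they should be tried.
// It performs no I/O, so fallback boundaries can be tested in isolation.
func PlanTransports(mode Mode, c Caps) ([]string, error) {
	var rest []string
	if c.ChatCompletions {
		rest = append(rest, "rest-chat")
	}
	// Code Suggestions is assumed to be the last-resort REST endpoint.
	rest = append(rest, "rest-code-suggestions")

	switch mode {
	case GatewayFirst:
		if c.Gateway {
			return append([]string{"gateway"}, rest...), nil
		}
		return rest, nil
	case RESTFirst:
		return rest, nil
	default:
		return nil, errors.New("unsupported transport mode")
	}
}
```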

Task 2.2: Implement Real Streaming Path

  • Location: internal/runtime/executor/gitlab_executor.go, internal/runtime/executor/gitlab_executor_test.go
  • Description: Replace synthetic streaming with true upstream incremental forwarding:
    • use gateway stream if available
    • otherwise consume GitLab Code Suggestions streaming response and map chunks incrementally
  • Dependencies: Task 2.1
  • Acceptance Criteria:
    • ExecuteStream emits chunks before upstream completion.
    • error handling preserves upstream status codes and early-failure semantics.
  • Validation:
    • tests with chunked upstream server
    • manual curl check against /v1/chat/completions with stream=true

Task 2.3: Preserve Upstream Auth And Headers Correctly

  • Location: internal/runtime/executor/gitlab_executor.go, internal/auth/gitlab/gitlab.go
  • Description: Use direct_access connection details as first-class transport state:
    • gateway token
    • expiry
    • mandatory forwarded headers
    • model metadata
  • Dependencies: Task 2.1
  • Acceptance Criteria:
    • executor stops ignoring gateway headers/token when transport requires them
    • refresh logic reuses cached direct_access details until they approach expiry instead of re-fetching them per request
  • Validation:
    • tests verifying propagated headers and refresh interval behavior

Sprint 3: Request/Response Semantics Parity

Goal: Make GitLab Duo behave correctly under the same request shapes that current codex consumers send.

Demo/Validation:

  • OpenAI and Claude-compatible clients can do non-streaming and streaming conversations without losing structure.

Task 3.1: Normalize Multi-Turn Message Mapping

  • Location: internal/runtime/executor/gitlab_executor.go, sdk/translator
  • Description: Replace the current "flatten prompt into one instruction" behavior with stable multi-turn mapping:
    • preserve system context
    • preserve user/assistant ordering
    • maintain bounded context truncation
  • Dependencies: Sprint 2
  • Acceptance Criteria:
    • multi-turn requests are not collapsed into a lossy single string unless fallback mode explicitly requires it
    • truncation policy is deterministic and tested
  • Validation:
    • golden tests for request mapping
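A sketch of a deterministic truncation policy: keep all system messages, then keep the newest turns that fit a budget, and stop at the first turn that does not fit so the kept history has no gaps. Message and truncateHistory are hypothetical, and measuring the budget in bytes rather than tokens is a simplification.

```go
package main

// Message is a hypothetical, provider-neutral chat turn.
type Message struct {
	Role    string // "system", "user" or "assistant"
	Content string
}

// truncateHistory keeps every system message, then the most recent
// non-system turns that fit in budget bytes, preserving original order.
// If system messages alone exceed the budget, no other turns are kept.
func truncateHistory(msgs []Message, budget int) []Message {
	keep := make([]bool, len(msgs))
	used := 0
	for i, m := range msgs {
		if m.Role == "system" {
			keep[i] = true
			used += len(m.Content)
		}
	}
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role == "system" {
			continue
		}
		if used+len(msgs[i].Content) > budget {
			break // stop at the first turn that does not fit; never skip gaps
		}
		used += len(msgs[i].Content)
		keep[i] = true
	}
	out := make([]Message, 0, len(msgs))
	for i, m := range msgs {
		if keep[i] {
			out = append(out, m)
		}
	}
	return out
}
```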

Task 3.2: Tool Calling Compatibility Layer

  • Location: internal/runtime/executor/gitlab_executor.go, sdk/api/handlers/openai/openai_responses_handlers.go
  • Description: Decide and implement one of two paths:
    • native pass-through if GitLab gateway supports tool/function structures
    • strict downgrade path with explicit unsupported errors instead of silent field loss
  • Dependencies: Task 3.1
  • Acceptance Criteria:
    • tool-related fields are either preserved correctly or rejected explicitly
    • no silent corruption of tool names, tool calls, or tool results
  • Validation:
    • table-driven tests for tool payloads
    • one manual client scenario using tools

Task 3.3: Token Counting And Usage Reporting Fidelity

  • Location: internal/runtime/executor/gitlab_executor.go, internal/runtime/executor/usage_helpers.go
  • Description: Improve token/usage reporting so GitLab models behave like first-class providers in logs and scheduling.
  • Dependencies: Sprint 2
  • Acceptance Criteria:
    • CountTokens uses the closest supported estimation path
    • usage logging distinguishes prompt vs completion when possible
  • Validation:
    • unit tests for token estimation outputs
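A sketch of usage reporting that prefers upstream-reported counts and otherwise falls back to a clearly marked estimate. The four-characters-per-token heuristic is an assumption standing in for whatever estimation path CountTokens ends up using; Usage, estimateTokens and usageFor are hypothetical.

```go
package main

import "unicode/utf8"

// Usage separates prompt and completion counts so logs can report both.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	Estimated        bool // true when counts come from the heuristic below
}

// estimateTokens is a rough heuristic (about four characters per token,
// rounded up). It is a placeholder, not GitLab's or any model's tokenizer.
func estimateTokens(s string) int {
	n := utf8.RuneCountInString(s)
	return (n + 3) / 4
}

// usageFor prefers upstream-reported usage and falls back to estimation.
func usageFor(upstream *Usage, prompt, completion string) Usage {
	if upstream != nil {
		return *upstream
	}
	return Usage{
		PromptTokens:     estimateTokens(prompt),
		CompletionTokens: estimateTokens(completion),
		Estimated:        true,
	}
}
```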

Sprint 4: Responses And Session Parity

Goal: Reach codex-level support for OpenAI Responses clients and long-lived sessions where GitLab upstream permits it.

Demo/Validation:

  • /v1/responses works with GitLab Duo in a realistic client flow.
  • If websocket parity is not possible, the code explicitly declines it and keeps HTTP paths stable.

Task 4.1: Make GitLab Compatible With /v1/responses

Task 4.2: Evaluate Downstream Websocket Parity

  • Location: sdk/api/handlers/openai/openai_responses_websocket.go, internal/runtime/executor/gitlab_executor.go
  • Description: Decide whether GitLab Duo can support downstream websocket sessions like codex:
    • if yes, add session-aware execution path
    • if no, mark GitLab auth as websocket-ineligible and keep HTTP routes first-class
  • Dependencies: Task 4.1
  • Acceptance Criteria:
    • websocket behavior is explicit, not accidental
    • no route claims websocket support when the upstream cannot honor it
  • Validation:
    • websocket handler tests or explicit capability tests
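A sketch of an explicit capability check, so websocket support is refused rather than accidentally attempted. ProviderCaps, capsByProvider and requireWebsocket are hypothetical, and the table values are placeholders pending the Task 4.2 decision, not statements about what GitLab Duo supports.

```go
package main

import "fmt"

// ProviderCaps is a hypothetical per-provider capability record that handlers
// consult instead of assuming every provider behaves like codex.
type ProviderCaps struct {
	Streaming bool
	Tools     bool
	Websocket bool
}

// capsByProvider would be populated from the Task 4.2 decision.
// The values here are placeholders.
var capsByProvider = map[string]ProviderCaps{
	"codex":  {Streaming: true, Tools: true, Websocket: true},
	"gitlab": {Streaming: true, Tools: false, Websocket: false},
}

// requireWebsocket fails explicitly when the provider behind an auth cannot
// honor a downstream websocket session; unknown providers are also refused.
func requireWebsocket(provider string) error {
	c, ok := capsByProvider[provider]
	if !ok || !c.Websocket {
		return fmt.Errorf("provider %q is not websocket-eligible; use the HTTP routes", provider)
	}
	return nil
}
```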

Task 4.3: Add Session Cleanup And Failure Recovery Semantics

  • Location: internal/runtime/executor/gitlab_executor.go, sdk/cliproxy/auth/conductor.go
  • Description: Add codex-like session cleanup, retry boundaries, and model suspension/resume behavior for GitLab failures and quota events.
  • Dependencies: Sprint 2
  • Acceptance Criteria:
    • auth/model cooldown behavior is predictable on GitLab 4xx/5xx/quota responses
    • executor cleans up per-session resources if any are introduced
  • Validation:
    • tests for quota and retry behavior

Sprint 5: Client UX, Model UX, And Manual E2E

Goal: Make GitLab Duo feel like a normal built-in provider to operators and downstream clients.

Demo/Validation:

  • A documented setup exists for "login once, point Claude Code at CLIProxyAPI, use GitLab Duo-backed model".

Task 5.1: Model Alias And Provider UX Cleanup

  • Location: sdk/cliproxy/service.go, README.md
  • Description: Normalize what users see:
    • stable alias such as gitlab-duo
    • discovered upstream model names
    • optional prefix behavior
    • account labels that clearly distinguish OAuth vs PAT
  • Dependencies: Sprint 3
  • Acceptance Criteria:
    • users can select a stable GitLab alias even when upstream model changes
    • dynamic model discovery does not cause confusing model churn
  • Validation:
    • registry tests and manual /v1/models inspection
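A sketch of keeping the stable alias separate from discovered model IDs. modelRegistry, the gitlab/ prefix and the model-a/model-b IDs are hypothetical; listing the alias first and sorting the rest keeps /v1/models output stable across discoveries.

```go
package main

import (
	"errors"
	"sort"
)

// modelRegistry keeps the stable alias separate from discovered upstream IDs
// so users who pick the alias are unaffected when discovery changes.
type modelRegistry struct {
	alias      string   // stable name, e.g. "gitlab-duo"
	prefix     string   // optional, e.g. "gitlab/"; empty disables it
	discovered []string // upstream model IDs from the latest discovery
	preferred  string   // discovered ID the alias currently points at
}

// Resolve maps a client-facing name to an upstream model ID.
func (r *modelRegistry) Resolve(name string) (string, error) {
	if name == r.alias {
		if r.preferred == "" {
			return "", errors.New("alias has no discovered target yet")
		}
		return r.preferred, nil
	}
	for _, id := range r.discovered {
		if name == r.prefix+id {
			return id, nil
		}
	}
	return "", errors.New("unknown gitlab model: " + name)
}

// List returns names for /v1/models: the alias first, then sorted discovered
// IDs, so output order does not churn between discoveries.
func (r *modelRegistry) List() []string {
	ids := make([]string, 0, len(r.discovered))
	for _, id := range r.discovered {
		ids = append(ids, r.prefix+id)
	}
	sort.Strings(ids)
	return append([]string{r.alias}, ids...)
}
```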

Task 5.2: Add Real End-To-End Acceptance Tests

  • Location: internal/runtime/executor/gitlab_executor_test.go, sdk/api/handlers/openai
  • Description: Add higher-level tests covering the actual proxy surfaces:
    • OpenAI chat/completions
    • OpenAI responses
    • Claude-compatible request path if GitLab is routed there
  • Dependencies: Sprint 4
  • Acceptance Criteria:
    • tests fail if streaming regresses into synthetic buffering again
    • tests cover at least one tool-related request and one multi-turn request
  • Validation:
    • go test ./...

Task 5.3: Publish Operator Documentation

  • Location: README.md
  • Description: Document:
    • OAuth setup requirements
    • PAT requirements
    • current capability matrix
    • known limitations if websocket/tool parity is partial
  • Dependencies: Task 5.1
  • Acceptance Criteria:
    • setup instructions are enough for a new user to reproduce the GitLab Duo flow
    • limitations are explicit
  • Validation:
    • dry-run docs review from a clean environment

Testing Strategy

  • Keep go test ./... green after every committable task.
  • Add table-driven tests first for request mapping, refresh behavior, and dynamic model registration.
  • Add transport tests with httptest.Server for:
    • real chunked streaming
    • header propagation from direct_access
    • upstream fallback rules
  • Add at least one manual acceptance checklist:
    • login via OAuth
    • login via PAT
    • list models
    • run one streaming prompt via OpenAI route
    • run one prompt from the target downstream client
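A sketch of the header-propagation test from the list above, using a throwaway httptest.Server that records what it receives. newGatewayRequest and propagatedHeaders are hypothetical; sending the direct_access token as a Bearer credential and copying every returned header verbatim are assumptions to confirm in Task 1.2, and the header names in the test are examples.

```go
package main

import (
	"bytes"
	"context"
	"net/http"
	"net/http/httptest"
)

// newGatewayRequest builds an upstream request from direct_access state.
// Headers are copied first, so the token set afterwards wins on conflict.
func newGatewayRequest(ctx context.Context, url, token string, hdrs map[string]string, body []byte) (*http.Request, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	for k, v := range hdrs {
		req.Header.Set(k, v)
	}
	req.Header.Set("Authorization", "Bearer "+token) // assumed auth scheme
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// propagatedHeaders sends one request to a throwaway httptest.Server and
// returns the headers it observed, for use in table-driven tests.
func propagatedHeaders(token string, hdrs map[string]string) (http.Header, error) {
	seen := make(chan http.Header, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Clone()
	}))
	defer srv.Close()

	req, err := newGatewayRequest(context.Background(), srv.URL, token, hdrs, []byte(`{}`))
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	resp.Body.Close()
	return <-seen, nil
}
```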

Potential Risks & Gotchas

  • GitLab public docs expose direct_access, but do not fully document every possible AI gateway path. We should isolate any empirically discovered gateway assumptions behind one transport layer and feature flags.
  • chat/completions availability differs by GitLab offering and version. The executor must not assume it always exists.
  • Code Suggestions is completion-oriented; lossy mapping from rich chat/tool payloads will make GitLab Duo feel worse than codex unless explicitly handled.
  • Synthetic streaming is not good enough for codex parity and will cause regressions in interactive clients.
  • Dynamic model discovery can create unstable UX if the stable alias and discovered model IDs are not separated cleanly.
  • PAT auth may validate successfully while still lacking effective Duo permissions. Error reporting must surface this explicitly.

Rollback Plan

  • Keep the current basic GitLab executor behind a fallback mode until the new transport path is stable.
  • If parity work destabilizes existing providers, revert only GitLab-specific executor changes and leave auth support intact.
  • Preserve the stable gitlab-duo alias so rollback does not break client configuration.