LLM/CLIProxyAPIPlus

Fork 0

mirror of https://github.com/router-for-me/CLIProxyAPIPlus.git synced 2026-03-27 22:27:28 +00:00

Files

LuxVTZ c631df8c3b feat(gitlab): add duo streaming transport

2026-03-10 22:19:36 +04:00

17 KiB

Raw Blame History

Plan: GitLab Duo Codex Parity

Generated: 2026-03-10 Estimated Complexity: High

Overview

Bring GitLab Duo support from the current "auth + basic executor" stage to the same practical level as codex inside CLIProxyAPI: a user logs in once, points external clients such as Claude Code at CLIProxyAPI, selects GitLab Duo-backed models, and gets stable streaming, multi-turn behavior, tool calling compatibility, and predictable model routing without manual provider-specific workarounds.

The core architectural shift is to stop treating GitLab Duo as only two REST wrappers (/api/v4/chat/completions and /api/v4/code_suggestions/completions) and instead use GitLab's direct_access contract as the primary runtime entrypoint wherever possible. Official GitLab docs confirm that direct_access returns AI gateway connection details, headers, token, and expiry; that contract is the closest path to codex-like provider behavior.

Prerequisites

Official GitLab Duo API references confirmed during implementation:
- POST /api/v4/code_suggestions/direct_access
- POST /api/v4/code_suggestions/completions
- POST /api/v4/chat/completions
Access to at least one real GitLab Duo account for manual verification.
One downstream client target for acceptance testing:
- Claude Code against Claude-compatible endpoint
- OpenAI-compatible client against /v1/chat/completions and /v1/responses
Existing PR branch as starting point:
- feat/gitlab-duo-auth
- PR #2028

Definition Of Done

GitLab Duo models can be used via CLIProxyAPI from the same client surfaces that already work for codex.
Upstream streaming is real passthrough or faithful chunked forwarding, not synthetic whole-response replay.
Tool/function calling survives translation layers without dropping fields or corrupting names.
Multi-turn and session semantics are stable across chat/completions, responses, and Claude-compatible routes.
Model exposure stays current from GitLab metadata or gateway discovery without hardcoded stale model tables.
go test ./... stays green and at least one real manual end-to-end client flow is documented.

Sprint 1: Contract And Gap Closure

Goal: Replace assumptions with a hard compatibility contract between current codex behavior and what GitLab Duo can actually support.

Demo/Validation:

Written matrix showing codex features vs current GitLab Duo behavior.
One checked-in developer note or test fixture for real GitLab Duo payload examples.

Task 1.1: Freeze Codex Parity Checklist

Location: internal/runtime/executor/codex_executor.go, internal/runtime/executor/codex_websockets_executor.go, sdk/api/handlers/openai/openai_responses_handlers.go, sdk/api/handlers/openai/openai_responses_websocket.go
Description: Produce a concrete feature matrix for codex: HTTP execute, SSE execute, /v1/responses, websocket downstream path, tool calling, request IDs, session close semantics, and model registration behavior.
Dependencies: None
Acceptance Criteria:
- A checklist exists in repo docs or issue notes.
- Each capability is marked required, optional, or not possible for GitLab Duo.
Validation:
- Review against current codex code paths.

Task 1.2: Lock GitLab Duo Runtime Contract

Location: internal/auth/gitlab/gitlab.go, internal/runtime/executor/gitlab_executor.go
Description: Validate the exact upstream contract we can rely on:
- direct_access fields and refresh cadence
- whether AI gateway path is usable directly
- when chat/completions is available vs when fallback is required
- what streaming shape is returned by code_suggestions/completions?stream=true
Dependencies: Task 1.1
Acceptance Criteria:
- GitLab transport decision is explicit: gateway-first, REST-first, or hybrid.
- Unknown areas are isolated behind feature flags, not spread across executor logic.
Validation:
- Official docs + captured real responses from a Duo account.

Task 1.3: Define Client-Facing Compatibility Targets

Location: README.md, gitlab-duo-codex-parity-plan.md
Description: Define exactly which external flows must work to call GitLab Duo support "like codex".
Dependencies: Task 1.2
Acceptance Criteria:
- Required surfaces are listed:
  - Claude-compatible route
  - OpenAI chat/completions
  - OpenAI responses
  - optional downstream websocket path
- Non-goals are explicit if GitLab upstream cannot support them.
Validation:
- Maintainer review of stated scope.

Sprint 2: Primary Transport Parity

Goal: Move GitLab Duo execution onto a transport that supports codex-like runtime behavior.

Demo/Validation:

A GitLab Duo model works over real streaming through /v1/chat/completions.
No synthetic "collect full body then fake stream" path remains on the primary flow.

Task 2.1: Refactor GitLab Executor Into Strategy Layers

Location: internal/runtime/executor/gitlab_executor.go
Description: Split current executor into explicit strategies:
- auth refresh/direct access refresh
- gateway transport
- GitLab REST fallback transport
- downstream translation helpers
Dependencies: Sprint 1
Acceptance Criteria:
- Executor no longer mixes discovery, refresh, fallback selection, and response synthesis in one path.
- Transport choice is testable in isolation.
Validation:
- Unit tests for strategy selection and fallback boundaries.

Task 2.2: Implement Real Streaming Path

Location: internal/runtime/executor/gitlab_executor.go, internal/runtime/executor/gitlab_executor_test.go
Description: Replace synthetic streaming with true upstream incremental forwarding:
- use gateway stream if available
- otherwise consume GitLab Code Suggestions streaming response and map chunks incrementally
Dependencies: Task 2.1
Acceptance Criteria:
- ExecuteStream emits chunks before upstream completion.
- error handling preserves status and early failure semantics.
Validation:
- tests with chunked upstream server
- manual curl check against /v1/chat/completions with stream=true

Task 2.3: Preserve Upstream Auth And Headers Correctly

Location: internal/runtime/executor/gitlab_executor.go, internal/auth/gitlab/gitlab.go
Description: Use direct_access connection details as first-class transport state:
- gateway token
- expiry
- mandatory forwarded headers
- model metadata
Dependencies: Task 2.1
Acceptance Criteria:
- executor stops ignoring gateway headers/token when transport requires them
- refresh logic never over-fetches direct_access
Validation:
- tests verifying propagated headers and refresh interval behavior

Sprint 3: Request/Response Semantics Parity

Goal: Make GitLab Duo behave correctly under the same request shapes that current codex consumers send.

Demo/Validation:

OpenAI and Claude-compatible clients can do non-streaming and streaming conversations without losing structure.

Task 3.1: Normalize Multi-Turn Message Mapping

Location: internal/runtime/executor/gitlab_executor.go, sdk/translator
Description: Replace the current "flatten prompt into one instruction" behavior with stable multi-turn mapping:
- preserve system context
- preserve user/assistant ordering
- maintain bounded context truncation
Dependencies: Sprint 2
Acceptance Criteria:
- multi-turn requests are not collapsed into a lossy single string unless fallback mode explicitly requires it
- truncation policy is deterministic and tested
Validation:
- golden tests for request mapping

Task 3.2: Tool Calling Compatibility Layer

Location: internal/runtime/executor/gitlab_executor.go, sdk/api/handlers/openai/openai_responses_handlers.go
Description: Decide and implement one of two paths:
- native pass-through if GitLab gateway supports tool/function structures
- strict downgrade path with explicit unsupported errors instead of silent field loss
Dependencies: Task 3.1
Acceptance Criteria:
- tool-related fields are either preserved correctly or rejected explicitly
- no silent corruption of tool names, tool calls, or tool results
Validation:
- table-driven tests for tool payloads
- one manual client scenario using tools

Task 3.3: Token Counting And Usage Reporting Fidelity

Location: internal/runtime/executor/gitlab_executor.go, internal/runtime/executor/usage_helpers.go
Description: Improve token/usage reporting so GitLab models behave like first-class providers in logs and scheduling.
Dependencies: Sprint 2
Acceptance Criteria:
- CountTokens uses the closest supported estimation path
- usage logging distinguishes prompt vs completion when possible
Validation:
- unit tests for token estimation outputs

Sprint 4: Responses And Session Parity

Goal: Reach codex-level support for OpenAI Responses clients and long-lived sessions where GitLab upstream permits it.

Demo/Validation:

/v1/responses works with GitLab Duo in a realistic client flow.
If websocket parity is not possible, the code explicitly declines it and keeps HTTP paths stable.

Task 4.1: Make GitLab Compatible With `/v1/responses`

Location: sdk/api/handlers/openai/openai_responses_handlers.go, internal/runtime/executor/gitlab_executor.go
Description: Ensure GitLab transport can safely back the Responses API path, including compact responses if applicable.
Dependencies: Sprint 3
Acceptance Criteria:
- GitLab Duo can be selected behind /v1/responses
- response IDs and follow-up semantics are defined
Validation:
- handler tests analogous to codex/openai responses tests

Task 4.2: Evaluate Downstream Websocket Parity

Location: sdk/api/handlers/openai/openai_responses_websocket.go, internal/runtime/executor/gitlab_executor.go
Description: Decide whether GitLab Duo can support downstream websocket sessions like codex:
- if yes, add session-aware execution path
- if no, mark GitLab auth as websocket-ineligible and keep HTTP routes first-class
Dependencies: Task 4.1
Acceptance Criteria:
- websocket behavior is explicit, not accidental
- no route claims websocket support when the upstream cannot honor it
Validation:
- websocket handler tests or explicit capability tests

Task 4.3: Add Session Cleanup And Failure Recovery Semantics

Location: internal/runtime/executor/gitlab_executor.go, sdk/cliproxy/auth/conductor.go
Description: Add codex-like session cleanup, retry boundaries, and model suspension/resume behavior for GitLab failures and quota events.
Dependencies: Sprint 2
Acceptance Criteria:
- auth/model cooldown behavior is predictable on GitLab 4xx/5xx/quota responses
- executor cleans up per-session resources if any are introduced
Validation:
- tests for quota and retry behavior

Sprint 5: Client UX, Model UX, And Manual E2E

Goal: Make GitLab Duo feel like a normal built-in provider to operators and downstream clients.

Demo/Validation:

A documented setup exists for "login once, point Claude Code at CLIProxyAPI, use GitLab Duo-backed model".

Task 5.1: Model Alias And Provider UX Cleanup

Location: sdk/cliproxy/service.go, README.md
Description: Normalize what users see:
- stable alias such as gitlab-duo
- discovered upstream model names
- optional prefix behavior
- account labels that clearly distinguish OAuth vs PAT
Dependencies: Sprint 3
Acceptance Criteria:
- users can select a stable GitLab alias even when upstream model changes
- dynamic model discovery does not cause confusing model churn
Validation:
- registry tests and manual /v1/models inspection

Task 5.2: Add Real End-To-End Acceptance Tests

Location: internal/runtime/executor/gitlab_executor_test.go, sdk/api/handlers/openai
Description: Add higher-level tests covering the actual proxy surfaces:
- OpenAI chat/completions
- OpenAI responses
- Claude-compatible request path if GitLab is routed there
Dependencies: Sprint 4
Acceptance Criteria:
- tests fail if streaming regresses into synthetic buffering again
- tests cover at least one tool-related request and one multi-turn request
Validation:
- go test ./...

Task 5.3: Publish Operator Documentation

Location: README.md
Description: Document:
- OAuth setup requirements
- PAT requirements
- current capability matrix
- known limitations if websocket/tool parity is partial
Dependencies: Sprint 5.1
Acceptance Criteria:
- setup instructions are enough for a new user to reproduce the GitLab Duo flow
- limitations are explicit
Validation:
- dry-run docs review from a clean environment

Testing Strategy

Keep go test ./... green after every committable task.
Add table-driven tests first for request mapping, refresh behavior, and dynamic model registration.
Add transport tests with httptest.Server for:
- real chunked streaming
- header propagation from direct_access
- upstream fallback rules
Add at least one manual acceptance checklist:
- login via OAuth
- login via PAT
- list models
- run one streaming prompt via OpenAI route
- run one prompt from the target downstream client

Potential Risks & Gotchas

GitLab public docs expose direct_access, but do not fully document every possible AI gateway path. We should isolate any empirically discovered gateway assumptions behind one transport layer and feature flags.
chat/completions availability differs by GitLab offering and version. The executor must not assume it always exists.
Code Suggestions is completion-oriented; lossy mapping from rich chat/tool payloads will make GitLab Duo feel worse than codex unless explicitly handled.
Synthetic streaming is not good enough for codex parity and will cause regressions in interactive clients.
Dynamic model discovery can create unstable UX if the stable alias and discovered model IDs are not separated cleanly.
PAT auth may validate successfully while still lacking effective Duo permissions. Error reporting must surface this explicitly.

Rollback Plan

Keep the current basic GitLab executor behind a fallback mode until the new transport path is stable.
If parity work destabilizes existing providers, revert only GitLab-specific executor changes and leave auth support intact.
Preserve the stable gitlab-duo alias so rollback does not break client configuration.

17 KiB Raw Blame History

Plan: GitLab Duo Codex Parity

Overview

Prerequisites

Definition Of Done

Sprint 1: Contract And Gap Closure

Task 1.1: Freeze Codex Parity Checklist

Task 1.2: Lock GitLab Duo Runtime Contract

Task 1.3: Define Client-Facing Compatibility Targets

Sprint 2: Primary Transport Parity

Task 2.1: Refactor GitLab Executor Into Strategy Layers

Task 2.2: Implement Real Streaming Path

Task 2.3: Preserve Upstream Auth And Headers Correctly

Sprint 3: Request/Response Semantics Parity

Task 3.1: Normalize Multi-Turn Message Mapping

Task 3.2: Tool Calling Compatibility Layer

Task 3.3: Token Counting And Usage Reporting Fidelity

Sprint 4: Responses And Session Parity

Task 4.1: Make GitLab Compatible With /v1/responses

Task 4.2: Evaluate Downstream Websocket Parity

Task 4.3: Add Session Cleanup And Failure Recovery Semantics

Sprint 5: Client UX, Model UX, And Manual E2E

Task 5.1: Model Alias And Provider UX Cleanup

Task 5.2: Add Real End-To-End Acceptance Tests

Task 5.3: Publish Operator Documentation

Testing Strategy

Potential Risks & Gotchas

Rollback Plan

17 KiB

Raw Blame History

Task 4.1: Make GitLab Compatible With `/v1/responses`