Merge pull request #20 from Ravens2121/master

feat(kiro): 支持思考模型 (Thinking Mode) 并通过多配额故障转移增强稳定性
2026-03-09 15:25:17 +00:00 · 2025-12-12 16:09:28 +08:00
parent 35fdd7bc05 fdeb84db2b
commit 8dc690a638
9 changed files with 998 additions and 148 deletions
--- a/PR_DOCUMENTATION.md
+++ b/PR_DOCUMENTATION.md
@@ -0,0 +1,49 @@
+# PR Title / 拉取请求标题
+
+`feat(kiro): Add Thinking Mode support & enhance reliability with multi-quota failover`
+`feat(kiro): 支持思考模型 (Thinking Mode) 并通过多配额故障转移增强稳定性`
+
+---
+
+# PR Description / 拉取请求描述
+
+## 📝 Summary / 摘要
+
+This PR introduces significant upgrades to the Kiro (AWS CodeWhisperer/Amazon Q) module. It adds native support for **Thinking/Reasoning models** (similar to OpenAI o1/Claude 3.7), implements a robust **Multi-Endpoint Failover** system to handle rate limits (429), and optimizes configuration flexibility.
+
+本次 PR 对 Kiro (AWS CodeWhisperer/Amazon Q) 模块进行了重大升级。它增加了对 **思考/推理模型 (Thinking/Reasoning models)** 的原生支持（类似 OpenAI o1/Claude 3.7），实现了一套健壮的 **多端点故障转移 (Multi-Endpoint Failover)** 系统以应对速率限制 (429)，并优化了配置灵活性。
+
+## ✨ Key Changes / 主要变更
+
+### 1. 🧠 Thinking Mode Support / 思考模式支持
+- **OpenAI Compatibility**: Automatically maps OpenAI's `reasoning_effort` parameter (low/medium/high) to Claude's `budget_tokens` (4k/16k/32k).
+  - **OpenAI 兼容性**：自动将 OpenAI 的 `reasoning_effort` 参数（low/medium/high）映射为 Claude 的 `budget_tokens`（4k/16k/32k）。
+- **Stream Parsing**: Implemented advanced stream parsing logic to detect and extract content within `<thinking>...</thinking>` tags, even across chunk boundaries.
+  - **流式解析**：实现了高级流式解析逻辑，能够检测并提取 `<thinking>...</thinking>` 标签内的内容，即使标签跨越了数据块边界。
+- **Protocol Translation**: Converts Kiro's internal thinking content into OpenAI-compatible `reasoning_content` fields (for non-stream) or `thinking_delta` events (for stream).
+  - **协议转换**：将 Kiro 内部的思考内容转换为兼容 OpenAI 的 `reasoning_content` 字段（非流式）或 `thinking_delta` 事件（流式）。
+
+### 2. 🛡️ Robustness & Failover / 稳健性与故障转移
+- **Dual Quota System**: Explicitly defined `kiroEndpointConfig` to distinguish between **IDE (CodeWhisperer)** and **CLI (Amazon Q)** quotas.
+  - **双配额系统**：显式定义了 `kiroEndpointConfig` 结构，明确区分 **IDE (CodeWhisperer)** 和 **CLI (Amazon Q)** 的配额来源。
+- **Auto Failover**: Implemented automatic failover logic. If one endpoint returns `429 Too Many Requests`, the request seamlessly retries on the next available endpoint/quota.
+  - **自动故障转移**：实现了自动故障转移逻辑。如果一个端点返回 `429 Too Many Requests`，请求将无缝在下一个可用端点/配额上重试。
+- **Strict Protocol Compliance**: Enforced strict matching of `Origin` and `X-Amz-Target` headers for each endpoint to prevent `403 Forbidden` errors due to protocol mismatches.
+  - **严格协议合规**：强制每个端点严格匹配 `Origin` 和 `X-Amz-Target` 头信息，防止因协议不匹配导致的 `403 Forbidden` 错误。
+
+### 3. ⚙️ Configuration & Models / 配置与模型
+- **New Config Options**: Added `KiroPreferredEndpoint` (global) and `PreferredEndpoint` (per-key) settings to allow users to prioritize specific quotas (e.g., "ide" or "cli").
+  - **新配置项**：添加了 `KiroPreferredEndpoint`（全局）和 `PreferredEndpoint`（单 Key）设置，允许用户优先选择特定的配额（如 "ide" 或 "cli"）。
+- **Model Registry**: Normalized model IDs (replaced dots with hyphens) and added `-agentic` variants optimized for large code generation tasks.
+  - **模型注册表**：规范化了模型 ID（将点号替换为连字符），并添加了针对大型代码生成任务优化的 `-agentic` 变体。
+
+### 4. 🔧 Fixes / 修复
+- **AMP Proxy**: Downgraded client-side context cancellation logs from `Error` to `Debug` to reduce log noise.
+  - **AMP 代理**：将客户端上下文取消的日志级别从 `Error` 降级为 `Debug`，减少日志噪音。
+
+## ⚠️ Impact / 影响
+
+- **Authentication**: **No changes** to the login/OAuth process. Existing tokens work as is.
+- **认证**：登录/OAuth 流程 **无变更**。现有 Token 可直接使用。
+- **Compatibility**: Fully backward compatible. The new failover logic is transparent to the user.
+- **兼容性**：完全向后兼容。新的故障转移逻辑对用户是透明的。
--- a/internal/api/modules/amp/proxy.go
+++ b/internal/api/modules/amp/proxy.go
@@ -3,6 +3,8 @@ package amp
 import (
 	"bytes"
 	"compress/gzip"
+	"context"
+	"errors"
 	"fmt"
 	"io"
 	"net/http"
@@ -148,7 +150,13 @@ func createReverseProxy(upstreamURL string, secretSource SecretSource) (*httputi

 	// Error handler for proxy failures
 	proxy.ErrorHandler = func(rw http.ResponseWriter, req *http.Request, err error) {
-		log.Errorf("amp upstream proxy error for %s %s: %v", req.Method, req.URL.Path, err)
+		// Check if this is a client-side cancellation (normal behavior)
+		// Don't log as error for context canceled - it's usually client closing connection
+		if errors.Is(err, context.Canceled) {
+			log.Debugf("amp upstream proxy: client canceled request for %s %s", req.Method, req.URL.Path)
+		} else {
+			log.Errorf("amp upstream proxy error for %s %s: %v", req.Method, req.URL.Path, err)
+		}
 		rw.Header().Set("Content-Type", "application/json")
 		rw.WriteHeader(http.StatusBadGateway)
 		_, _ = rw.Write([]byte(`{"error":"amp_upstream_proxy_error","message":"Failed to reach Amp upstream"}`))
--- a/internal/api/server.go
+++ b/internal/api/server.go
@@ -349,6 +349,12 @@ func (s *Server) setupRoutes() {
 			},
 		})
 	})
+
+	// Event logging endpoint - handles Claude Code telemetry requests
+	// Returns 200 OK to prevent 404 errors in logs
+	s.engine.POST("/api/event_logging/batch", func(c *gin.Context) {
+		c.JSON(http.StatusOK, gin.H{"status": "ok"})
+	})
 	s.engine.POST("/v1internal:method", geminiCLIHandlers.CLIHandler)

 	// OAuth callback endpoints (reuse main server port)
--- a/internal/config/config.go
+++ b/internal/config/config.go
@@ -64,6 +64,10 @@ type Config struct {
 	// KiroKey defines a list of Kiro (AWS CodeWhisperer) configurations.
 	KiroKey []KiroKey `yaml:"kiro" json:"kiro"`

+	// KiroPreferredEndpoint sets the global default preferred endpoint for all Kiro providers.
+	// Values: "ide" (default, CodeWhisperer) or "cli" (Amazon Q).
+	KiroPreferredEndpoint string `yaml:"kiro-preferred-endpoint" json:"kiro-preferred-endpoint"`
+
 	// Codex defines a list of Codex API key configurations as specified in the YAML configuration file.
 	CodexKey []CodexKey `yaml:"codex-api-key" json:"codex-api-key"`

@@ -278,6 +282,10 @@ type KiroKey struct {
 	// AgentTaskType sets the Kiro API task type. Known values: "vibe", "dev", "chat".
 	// Leave empty to let API use defaults. Different values may inject different system prompts.
 	AgentTaskType string `yaml:"agent-task-type,omitempty" json:"agent-task-type,omitempty"`
+
+	// PreferredEndpoint sets the preferred Kiro API endpoint/quota.
+	// Values: "codewhisperer" (default, IDE quota) or "amazonq" (CLI quota).
+	PreferredEndpoint string `yaml:"preferred-endpoint,omitempty" json:"preferred-endpoint,omitempty"`
 }

 // OpenAICompatibility represents the configuration for OpenAI API compatibility
@@ -504,6 +512,7 @@ func (cfg *Config) SanitizeKiroKeys() {
 		entry.ProfileArn = strings.TrimSpace(entry.ProfileArn)
 		entry.Region = strings.TrimSpace(entry.Region)
 		entry.ProxyURL = strings.TrimSpace(entry.ProxyURL)
+		entry.PreferredEndpoint = strings.TrimSpace(entry.PreferredEndpoint)
 	}
 }

--- a/internal/registry/model_definitions.go
+++ b/internal/registry/model_definitions.go
@@ -884,8 +884,9 @@ func GetGitHubCopilotModels() []*ModelInfo {
 // GetKiroModels returns the Kiro (AWS CodeWhisperer) model definitions
 func GetKiroModels() []*ModelInfo {
 	return []*ModelInfo{
+		// --- Base Models ---
 		{
-			ID:                  "kiro-claude-opus-4.5",
+			ID:                  "kiro-claude-opus-4-5",
 			Object:              "model",
 			Created:             1732752000,
 			OwnedBy:             "aws",
@@ -896,7 +897,7 @@ func GetKiroModels() []*ModelInfo {
 			MaxCompletionTokens: 64000,
 		},
 		{
-			ID:                  "kiro-claude-sonnet-4.5",
+			ID:                  "kiro-claude-sonnet-4-5",
 			Object:              "model",
 			Created:             1732752000,
 			OwnedBy:             "aws",
@@ -918,7 +919,7 @@ func GetKiroModels() []*ModelInfo {
 			MaxCompletionTokens: 64000,
 		},
 		{
-			ID:                  "kiro-claude-haiku-4.5",
+			ID:                  "kiro-claude-haiku-4-5",
 			Object:              "model",
 			Created:             1732752000,
 			OwnedBy:             "aws",
@@ -928,21 +929,9 @@ func GetKiroModels() []*ModelInfo {
 			ContextLength:       200000,
 			MaxCompletionTokens: 64000,
 		},
-		// --- Chat Variant (No tool calling, for pure conversation) ---
-		{
-			ID:                  "kiro-claude-opus-4.5-chat",
-			Object:              "model",
-			Created:             1732752000,
-			OwnedBy:             "aws",
-			Type:                "kiro",
-			DisplayName:         "Kiro Claude Opus 4.5 (Chat)",
-			Description:         "Claude Opus 4.5 for chat only (no tool calling)",
-			ContextLength:       200000,
-			MaxCompletionTokens: 64000,
-		},
 		// --- Agentic Variants (Optimized for coding agents with chunked writes) ---
 		{
-			ID:                  "kiro-claude-opus-4.5-agentic",
+			ID:                  "kiro-claude-opus-4-5-agentic",
 			Object:              "model",
 			Created:             1732752000,
 			OwnedBy:             "aws",
@@ -953,7 +942,7 @@ func GetKiroModels() []*ModelInfo {
 			MaxCompletionTokens: 64000,
 		},
 		{
-			ID:                  "kiro-claude-sonnet-4.5-agentic",
+			ID:                  "kiro-claude-sonnet-4-5-agentic",
 			Object:              "model",
 			Created:             1732752000,
 			OwnedBy:             "aws",
@@ -963,6 +952,28 @@ func GetKiroModels() []*ModelInfo {
 			ContextLength:       200000,
 			MaxCompletionTokens: 64000,
 		},
+		{
+			ID:                  "kiro-claude-sonnet-4-agentic",
+			Object:              "model",
+			Created:             1732752000,
+			OwnedBy:             "aws",
+			Type:                "kiro",
+			DisplayName:         "Kiro Claude Sonnet 4 (Agentic)",
+			Description:         "Claude Sonnet 4 optimized for coding agents (chunked writes)",
+			ContextLength:       200000,
+			MaxCompletionTokens: 64000,
+		},
+		{
+			ID:                  "kiro-claude-haiku-4-5-agentic",
+			Object:              "model",
+			Created:             1732752000,
+			OwnedBy:             "aws",
+			Type:                "kiro",
+			DisplayName:         "Kiro Claude Haiku 4.5 (Agentic)",
+			Description:         "Claude Haiku 4.5 optimized for coding agents (chunked writes)",
+			ContextLength:       200000,
+			MaxCompletionTokens: 64000,
+		},
 	}
 }

--- a/internal/runtime/executor/kiro_executor.go
+++ b/internal/runtime/executor/kiro_executor.go
--- a/internal/translator/kiro/openai/chat-completions/kiro_openai_request.go
+++ b/internal/translator/kiro/openai/chat-completions/kiro_openai_request.go
@@ -10,9 +10,18 @@ import (
 	"github.com/tidwall/sjson"
 )

+// reasoningEffortToBudget maps OpenAI reasoning_effort values to Claude thinking budget_tokens.
+// OpenAI uses "low", "medium", "high" while Claude uses numeric budget_tokens.
+var reasoningEffortToBudget = map[string]int{
+	"low":    4000,
+	"medium": 16000,
+	"high":   32000,
+}
+
 // ConvertOpenAIRequestToKiro transforms an OpenAI Chat Completions API request into Kiro (Claude) format.
 // Kiro uses Claude-compatible format internally, so we primarily pass through to Claude format.
 // Supports tool calling: OpenAI tools -> Claude tools, tool_calls -> tool_use, tool messages -> tool_result.
+// Supports reasoning/thinking: OpenAI reasoning_effort -> Claude thinking parameter.
 func ConvertOpenAIRequestToKiro(modelName string, inputRawJSON []byte, stream bool) []byte {
 	rawJSON := bytes.Clone(inputRawJSON)
 	root := gjson.ParseBytes(rawJSON)
@@ -38,6 +47,26 @@ func ConvertOpenAIRequestToKiro(modelName string, inputRawJSON []byte, stream bo
 		out, _ = sjson.Set(out, "top_p", v.Float())
 	}

+	// Handle OpenAI reasoning_effort parameter -> Claude thinking parameter
+	// OpenAI format: {"reasoning_effort": "low"|"medium"|"high"}
+	// Claude format: {"thinking": {"type": "enabled", "budget_tokens": N}}
+	if v := root.Get("reasoning_effort"); v.Exists() {
+		effort := v.String()
+		if budget, ok := reasoningEffortToBudget[effort]; ok {
+			thinking := map[string]interface{}{
+				"type":          "enabled",
+				"budget_tokens": budget,
+			}
+			out, _ = sjson.Set(out, "thinking", thinking)
+		}
+	}
+
+	// Also support direct thinking parameter passthrough (for Claude API compatibility)
+	// Claude format: {"thinking": {"type": "enabled", "budget_tokens": N}}
+	if v := root.Get("thinking"); v.Exists() && v.IsObject() {
+		out, _ = sjson.Set(out, "thinking", v.Value())
+	}
+
 	// Convert OpenAI tools to Claude tools format
 	if tools := root.Get("tools"); tools.Exists() && tools.IsArray() {
 		claudeTools := make([]interface{}, 0)
--- a/internal/translator/kiro/openai/chat-completions/kiro_openai_response.go
+++ b/internal/translator/kiro/openai/chat-completions/kiro_openai_response.go
@@ -134,6 +134,28 @@ func convertClaudeEventToOpenAI(jsonStr string, model string) []string {
 				result, _ := json.Marshal(response)
 				results = append(results, string(result))
 			}
+		} else if deltaType == "thinking_delta" {
+			// Thinking/reasoning content delta - convert to OpenAI reasoning_content format
+			thinkingDelta := root.Get("delta.thinking").String()
+			if thinkingDelta != "" {
+				response := map[string]interface{}{
+					"id":      "chatcmpl-" + uuid.New().String()[:24],
+					"object":  "chat.completion.chunk",
+					"created": time.Now().Unix(),
+					"model":   model,
+					"choices": []map[string]interface{}{
+						{
+							"index": 0,
+							"delta": map[string]interface{}{
+								"reasoning_content": thinkingDelta,
+							},
+							"finish_reason": nil,
+						},
+					},
+				}
+				result, _ := json.Marshal(response)
+				results = append(results, string(result))
+			}
 		} else if deltaType == "input_json_delta" {
 			// Tool input delta (streaming arguments)
 			partialJSON := root.Get("delta.partial_json").String()
@@ -298,6 +320,7 @@ func ConvertKiroResponseToOpenAINonStream(ctx context.Context, model string, ori
 	root := gjson.ParseBytes(rawResponse)

 	var content string
+	var reasoningContent string
 	var toolCalls []map[string]interface{}

 	contentArray := root.Get("content")
@@ -306,6 +329,9 @@ func ConvertKiroResponseToOpenAINonStream(ctx context.Context, model string, ori
 			itemType := item.Get("type").String()
 			if itemType == "text" {
 				content += item.Get("text").String()
+			} else if itemType == "thinking" {
+				// Extract thinking/reasoning content
+				reasoningContent += item.Get("thinking").String()
 			} else if itemType == "tool_use" {
 				// Convert Claude tool_use to OpenAI tool_calls format
 				inputJSON := item.Get("input").String()
@@ -339,6 +365,11 @@ func ConvertKiroResponseToOpenAINonStream(ctx context.Context, model string, ori
 		"content": content,
 	}

+	// Add reasoning_content if present (OpenAI reasoning format)
+	if reasoningContent != "" {
+		message["reasoning_content"] = reasoningContent
+	}
+
 	// Add tool_calls if present
 	if len(toolCalls) > 0 {
 		message["tool_calls"] = toolCalls
--- a/internal/watcher/watcher.go
+++ b/internal/watcher/watcher.go
@@ -1317,6 +1317,12 @@ func (w *Watcher) SnapshotCoreAuths() []*coreauth.Auth {
 			if kk.AgentTaskType != "" {
 				attrs["agent_task_type"] = kk.AgentTaskType
 			}
+			if kk.PreferredEndpoint != "" {
+				attrs["preferred_endpoint"] = kk.PreferredEndpoint
+			} else if cfg.KiroPreferredEndpoint != "" {
+				// Apply global default if not overridden by specific key
+				attrs["preferred_endpoint"] = cfg.KiroPreferredEndpoint
+			}
 			if refreshToken != "" {
 				attrs["refresh_token"] = refreshToken
 			}
@@ -1532,6 +1538,17 @@ func (w *Watcher) SnapshotCoreAuths() []*coreauth.Auth {
 					a.NextRefreshAfter = expiresAt.Add(-30 * time.Minute)
 				}
 			}
+			
+			// Apply global preferred endpoint setting if not present in metadata
+			if cfg.KiroPreferredEndpoint != "" {
+				// Check if already set in metadata (which takes precedence in executor)
+				if _, hasMeta := metadata["preferred_endpoint"]; !hasMeta {
+					if a.Attributes == nil {
+						a.Attributes = make(map[string]string)
+					}
+					a.Attributes["preferred_endpoint"] = cfg.KiroPreferredEndpoint
+				}
+			}
 		}

 		applyAuthExcludedModelsMeta(a, cfg, nil, "oauth")