Commit Graph

274 Commits

Author SHA1 Message Date
Joao
7fd98f3556 feat: add IDC auth support with Kiro IDE headers 2025-12-23 08:18:10 +00:00
Luis Pater
b1aecc2bf1 Merge branch 'router-for-me:main' into main 2025-12-23 02:49:37 +08:00
Luis Pater
83b90e106f refactor(antigravity): add sandbox URL constant and update base URLs routine 2025-12-23 02:47:56 +08:00
Luis Pater
e755e567ea Merge branch 'router-for-me:main' into main 2025-12-21 19:54:13 +08:00
Luis Pater
63908869f6 Merge pull request #611 from soilSpoon/feature/antigravity
feat(antigravity): Improve Claude model compatibility
2025-12-21 16:27:29 +08:00
이대희
4070c9de81 Remove interleaved-thinking header from requests
Removes the addition of the "anthropic-beta: interleaved-thinking-2025-05-14" header for Claude thinking models when building HTTP requests.

This prevents sending an experimental/feature flag header that is no longer required and avoids potential compatibility or routing issues with downstream services. Keeps request headers simpler and more standard.
2025-12-21 15:29:36 +09:00
이대희
1e9e4a86a2 Improve thinking/tool signature handling for Claude and Gemini requests
Prefer cached signatures and avoid injecting dummy thinking blocks; instead remove unsigned thinking blocks and add a skip sentinel for tool calls without a valid signature. Generate stable session IDs from the first user message, apply schema cleaning only for Claude models, and reorder thinking parts so thinking appears first. For Gemini, remove thinking blocks and attach a skip sentinel to function calls. Simplify response handling by passing raw function args through (remove special Bash conversion). Update and add tests to reflect the new behavior.

These changes prevent rejected dummy signatures, improve compatibility with Antigravity’s signature validation, provide more stable session IDs for conversation grouping, and make request/response translation more robust.
2025-12-21 15:15:50 +09:00
Luis Pater
5418bbc338 Merge branch 'router-for-me:main' into main 2025-12-20 23:40:09 +08:00
Ben Vargas
1231dc9cda feat(antigravity): add payload config support to Antigravity executor
Add applyPayloadConfig calls to all Antigravity executor paths (Execute,
executeClaudeNonStream, ExecuteStream) to enable config.yaml payload
overrides for Antigravity/Gemini-Claude models.

This allows users to configure thinking budget and other parameters via
payload.override in config.yaml for models like gemini-claude-opus-4-5*.
2025-12-19 22:30:44 -07:00
Luis Pater
843316ea7a Merge branch 'router-for-me:main' into main 2025-12-19 22:24:26 +08:00
hkfires
2039062845 fix(gemini): add optional skip for gemini3 thinking conversion 2025-12-19 22:07:43 +08:00
이대희
b6ba15fcbd fix(runtime/executor): Antigravity executor schema handling and Claude-specific headers 2025-12-19 10:28:23 +09:00
Luis Pater
0f646800f6 Merge branch 'router-for-me:main' into main 2025-12-18 08:36:59 +08:00
Luis Pater
13eb5268de Merge pull request #582 from ben-vargas/fix-gemini-3-thinking-level
feat: use thinkingLevel for Gemini 3 models per Google documentation
2025-12-18 07:19:37 +08:00
Ben Vargas
598f0af19b fix: apply thinkingLevel from model suffix metadata for Gemini 3
The previous commit added thinkingLevel support but didn't apply it
when the reasoning effort came from model name suffix (e.g., model(minimal)).

This was because ResolveThinkingConfigFromMetadata returns nil for
level-based models, bypassing the metadata application.

Changes:
- Add ApplyGemini3ThinkingLevelFromMetadata for standard Gemini API
- Add ApplyGemini3ThinkingLevelFromMetadataCLI for CLI API format
- Update gemini_cli_executor to apply Gemini 3 thinkingLevel from metadata
- Update antigravity_executor to apply Gemini 3 thinkingLevel from metadata
- Update aistudio_executor to apply Gemini 3 thinkingLevel from metadata
- Add comprehensive test coverage for Gemini 3 thinkingLevel functions
2025-12-17 16:08:38 -07:00
Ben Vargas
a33f5d31fc feat: use thinkingLevel for Gemini 3 models per Google documentation
Per Google's official documentation, Gemini 3 models should use
thinkingLevel (string) instead of thinkingBudget (number) for
optimal performance.

From Google's Gemini Thinking docs:
> Use the thinkingLevel parameter with Gemini 3 models. While
> thinkingBudget is accepted for backwards compatibility, using
> it with Gemini 3 Pro may result in suboptimal performance.

Changes:
- Add model family detection functions (IsGemini3Model, IsGemini25Model,
  IsGemini3ProModel, IsGemini3FlashModel)
- Add ApplyGeminiThinkingLevel and ApplyGeminiCLIThinkingLevel functions
  for applying thinkingLevel config
- Add ValidateGemini3ThinkingLevel for model-specific level validation
- Add ThinkingBudgetToGemini3Level for backward compatibility conversion
- Update NormalizeGeminiThinkingBudget to convert budget to level for
  Gemini 3 models
- Update ApplyDefaultThinkingIfNeeded to not set a default level for
  Gemini 3 (lets API use its dynamic default "high")
- Update ConvertThinkingLevelToBudget to preserve thinkingLevel for
  Gemini 3 models
- Add Levels field to all Gemini 3 model definitions:
  - Gemini 3 Pro: ["low", "high"]
  - Gemini 3 Flash: ["minimal", "low", "medium", "high"]

Backward compatibility:
- Gemini 2.5 models continue to use thinkingBudget as before
- If thinkingBudget is provided for Gemini 3, it's converted to the
  appropriate thinkingLevel
- Existing configurations continue to work
2025-12-17 15:28:20 -07:00
Ravens2121
54acd69e9d Merge branch 'router-for-me:main' into master 2025-12-18 04:39:17 +08:00
Ravens2121
d687ee2777 feat(kiro): implement official reasoningContentEvent and improve metadat 2025-12-18 04:38:22 +08:00
Luis Pater
f7b17ee6ec Merge pull request #36 from rezhajulio/feat/gpt-5.2
Add GPT-5.2 model support for GitHub Copilot
2025-12-18 03:16:25 +08:00
Luis Pater
408614c74c Merge branch 'router-for-me:main' into main 2025-12-18 03:13:48 +08:00
Luis Pater
68a27772b3 feat(antigravity): enable token counting via API with resilient routing
Introduces the capability to count tokens for Antigravity-backed requests. This implementation leverages the `countTokens` endpoint of the Antigravity API, replacing the prior unsupported stub.

Key aspects of this update include:

- **API Integration**: Direct integration with the Antigravity `countTokens` API, including necessary request payload translation and authentication.
- **Resilient Infrastructure**: A fallback mechanism has been established, allowing the system to attempt connections across multiple Antigravity base URLs to ensure request success even in the event of temporary service interruptions.
- **Model Aliasing**: Added mappings for `gemini-3-flash` and `gemini-3-flash-preview` to ensure compatibility with the latest model variants.
- **Robust Error Handling**: Comprehensive error handling and logging are in place to manage failures during API interactions.
2025-12-18 03:12:46 +08:00
Luis Pater
10e0ea1309 Merge main into pr-39 2025-12-18 00:36:51 +08:00
Luis Pater
5fda6f8ef3 feat(antigravity): implement non-streaming execution for Claude model requests 2025-12-17 23:17:11 +08:00
Luis Pater
09923f654c feat(antigravity): add streaming support for Claude model requests 2025-12-17 22:16:57 +08:00
이대희
1b8e538a77 feature: Improves Gemini JSON schema compatibility
Enhances compatibility with the Gemini API by implementing a schema cleaning process.

This includes:
- Centralizing schema cleaning logic for Gemini in a dedicated utility function.
- Converting unsupported schema keywords to hints within the description field.
- Flattening complex schema structures like `anyOf`, `oneOf`, and type arrays to simplify the schema.
- Handling streaming responses with empty tool names, which can occur in subsequent chunks after the initial tool use.
2025-12-17 17:10:53 +09:00
Rezha Julio
92c62bb2fb Add GPT-5.2 model support for GitHub Copilot 2025-12-17 02:15:10 +07:00
Luis Pater
1efade8bdb Merge branch 'main' into plus 2025-12-17 02:50:14 +08:00
hkfires
28a428ae2f fix(thinking): align budget effort mapping across translators
Unify thinking budget-to-effort conversion in a shared helper, handle disabled/default thinking cases in translators, adjust zero-budget mapping, and drop the old OpenAI-specific helper with updated tests.
2025-12-16 18:34:43 +08:00
hkfires
b326ec3641 feat(iflow): add thinking support for iFlow models 2025-12-16 18:34:43 +08:00
Ravens2121
f3d1cc8dc1 chore: change debug logs from INFO to DEBUG level 2025-12-16 05:32:03 +08:00
Ravens2121
0a3a95521c feat: enhance thinking mode support for Kiro translator
Changes:
2025-12-16 05:01:40 +08:00
Luis Pater
407020de0c Merge branch 'router-for-me:main' into main 2025-12-15 10:36:39 +08:00
hkfires
367a05bdf6 refactor(thinking): export thinking helpers
Expose thinking/effort normalization helpers from the executor package
so conversion tests use production code and stay aligned with runtime
validation behavior.
2025-12-15 09:16:15 +08:00
hkfires
712ce9f781 fix(thinking): drop unsupported none effort
When budget 0 maps to "none" for models that use thinking levels
but don't support that effort level, strip thinking fields instead
of setting an invalid reasoning_effort value.
Tests now expect removal for this edge case.
2025-12-15 09:16:14 +08:00
hkfires
27a5ad8ec2 Fixed: #534
fix(aistudio): correct JSON string boundary detection for backslash sequences
2025-12-15 09:00:14 +08:00
Ravens2121
de0ea3ac49 fix(kiro): Always parse thinking tags from Kiro API responses
Amp-Thread-ID: https://ampcode.com/threads/T-019b1c00-17b4-713d-a8cc-813b71181934
Co-authored-by: Amp <amp@ampcode.com>
2025-12-14 16:46:17 +08:00
Ravens2121
12116b018d Merge branch 'router-for-me:main' into master 2025-12-14 16:42:30 +08:00
Ravens2121
c3ed3b40ea feat(kiro): Add token usage cross-validation and simplify thinking mode handling 2025-12-14 16:40:33 +08:00
Luis Pater
b80c2aabb0 Merge branch 'router-for-me:main' into main 2025-12-14 16:19:29 +08:00
Luis Pater
14ce6aebd1 Merge pull request #449 from sususu98/fix/gemini-cli-429-retry-delay-parsing
fix(gemini-cli): enhance 429 retry delay parsing
2025-12-14 14:04:14 +08:00
Ravens2121
9c04c18c04 feat(kiro): enhance request translation and fix streaming issues
English:
- Fix <thinking> tag parsing: only parse at response start, avoid misinterpreting discussion text
- Add token counting support using tiktoken for local estimation
- Support top_p parameter in inference config
- Handle max_tokens=-1 as maximum (32000 tokens)
- Add tool_choice and response_format parameter handling via system prompt hints
- Support multiple thinking mode detection formats (Claude API, OpenAI reasoning_effort, AMP/Cursor)
- Shorten MCP tool names exceeding 64 characters
- Fix duplicate [DONE] marker in OpenAI SSE streaming
- Enhance token usage statistics with multiple event format support
- Add code fence markers to constants

中文:
- 修复 <thinking> 标签解析:仅在响应开头解析,避免误解析讨论文本中的标签
- 使用 tiktoken 实现本地 token 计数功能
- 支持 top_p 推理配置参数
- 处理 max_tokens=-1 转换为最大值(32000 tokens)
- 通过系统提示词注入实现 tool_choice 和 response_format 参数支持
- 支持多种思考模式检测格式(Claude API、OpenAI reasoning_effort、AMP/Cursor)
- 截断超过64字符的 MCP 工具名称
- 修复 OpenAI SSE 流中重复的 [DONE] 标记
- 增强 token 使用量统计,支持多种事件格式
- 添加代码围栏标记常量
2025-12-14 11:57:16 +08:00
Ravens2121
81ae09d0ec Merge branch 'kiro-refactor-backup' 2025-12-14 07:03:24 +08:00
Ravens2121
01cf221167 feat(kiro): 代码优化重构 + OpenAI翻译器实现 2025-12-14 06:58:50 +08:00
Luis Pater
79033aee34 Merge branch 'main' into plus 2025-12-14 00:07:46 +08:00
Ravens2121
1ea0cff3a4 fix: add missing import declarations for net and time packages 2025-12-13 12:57:47 +08:00
Ravens2121
75793a18f0 feat(kiro): Add Kiro OAuth login entry and auth file filter in Web UI
为Kiro供应商添加WEB UI OAuth登录入口和认证文件过滤器

## Changes / 更改内容

### Frontend / 前端 (management.html)
- Add Kiro OAuth card UI with support for AWS Builder ID, Google, and GitHub login methods
- 添加Kiro OAuth卡片UI,支持AWS Builder ID、Google和GitHub三种登录方式
- Add i18n translations for Kiro OAuth (Chinese and English)
- 添加Kiro OAuth的中英文国际化翻译
- Add Kiro filter button in auth files management page
- 在认证文件管理页面添加Kiro过滤按钮
- Implement JavaScript methods: startKiroOAuth(), openKiroLink(), copyKiroLink(), copyKiroDeviceCode(), startKiroOAuthPolling(), resetKiroOAuthUI()
- 实现JavaScript方法:startKiroOAuth()、openKiroLink()、copyKiroLink()、copyKiroDeviceCode()、startKiroOAuthPolling()、resetKiroOAuthUI()

### Backend / 后端
- Add /kiro-auth-url endpoint for Kiro OAuth authentication (auth_files.go)
- 添加/kiro-auth-url端点用于Kiro OAuth认证 (auth_files.go)
- Fix GetAuthStatus() to correctly parse device_code and auth_url status
- 修复GetAuthStatus()以正确解析device_code和auth_url状态
- Change status delimiter from ':' to '|' to avoid URL parsing issues
- 将状态分隔符从':'改为'|'以避免URL解析问题
- Export CreateToken method in social_auth.go
- 在social_auth.go中导出CreateToken方法
- Register Kiro OAuth routes in server.go
- 在server.go中注册Kiro OAuth路由

## Files Modified / 修改的文件
- management.html
- internal/api/handlers/management/auth_files.go
- internal/api/server.go
- internal/auth/kiro/social_auth.go
2025-12-13 11:39:22 +08:00
Ravens2121
58866b21cb feat: optimize connection pooling and improve Kiro executor reliability
## 中文说明

### 连接池优化
- 为 AMP 代理、SOCKS5 代理和 HTTP 代理配置优化的连接池参数
- MaxIdleConnsPerHost 从默认的 2 增加到 20,支持更多并发用户
- MaxConnsPerHost 设为 0(无限制),避免连接瓶颈
- 添加 IdleConnTimeout (90s) 和其他超时配置

### Kiro 执行器增强
- 添加 Event Stream 消息解析的边界保护,防止越界访问
- 实现实时使用量估算(每 5000 字符或 15 秒发送 ping 事件)
- 正确从上游事件中提取并传递 stop_reason
- 改进输入 token 计算,优先使用 Claude 格式解析
- 添加 max_tokens 截断警告日志

### Token 计算改进
- 添加 tokenizer 缓存(sync.Map)避免重复创建
- 为 Claude/Kiro/AmazonQ 模型添加 1.1 调整因子
- 新增 countClaudeChatTokens 函数支持 Claude API 格式
- 支持图像 token 估算(基于尺寸计算)

### 认证刷新优化
- RefreshLead 从 30 分钟改为 5 分钟,与 Antigravity 保持一致
- 修复 NextRefreshAfter 设置,防止频繁刷新检查
- refreshFailureBackoff 从 5 分钟改为 1 分钟,加快失败恢复

---

## English Description

### Connection Pool Optimization
- Configure optimized connection pool parameters for AMP proxy, SOCKS5 proxy, and HTTP proxy
- Increase MaxIdleConnsPerHost from default 2 to 20 to support more concurrent users
- Set MaxConnsPerHost to 0 (unlimited) to avoid connection bottlenecks
- Add IdleConnTimeout (90s) and other timeout configurations

### Kiro Executor Enhancements
- Add boundary protection for Event Stream message parsing to prevent out-of-bounds access
- Implement real-time usage estimation (send ping events every 5000 chars or 15 seconds)
- Correctly extract and pass stop_reason from upstream events
- Improve input token calculation, prioritize Claude format parsing
- Add max_tokens truncation warning logs

### Token Calculation Improvements
- Add tokenizer cache (sync.Map) to avoid repeated creation
- Add 1.1 adjustment factor for Claude/Kiro/AmazonQ models
- Add countClaudeChatTokens function to support Claude API format
- Support image token estimation (calculated based on dimensions)

### Authentication Refresh Optimization
- Change RefreshLead from 30 minutes to 5 minutes, consistent with Antigravity
- Fix NextRefreshAfter setting to prevent frequent refresh checks
- Change refreshFailureBackoff from 5 minutes to 1 minute for faster failure recovery
2025-12-13 10:21:40 +08:00
Luis Pater
660aabc437 fix(executor): add allowCompat support for reasoning effort normalization
Introduced `allowCompat` parameter to improve compatibility handling for reasoning effort in payloads across OpenAI and similar models.
2025-12-13 04:06:02 +08:00
Ravens2121
db80b20bc2 feat(kiro): enhance thinking support and fix truncation issues
- **Thinking Support**:
    - Enabled thinking support for all Kiro Claude models, including Haiku 4.5 and agentic variants.
    - Updated `model_definitions.go` with thinking configuration (Min: 1024, Max: 32000, ZeroAllowed: true).
    - Fixed `extended_thinking` field names in `model_registry.go` (from `min_budget`/`max_budget` to `min`/`max`) to comply with Claude API specs, enabling thinking control in clients like Claude Code.

- **Kiro Executor Fixes**:
    - Fixed `budget_tokens` handling: explicitly disable thinking when budget is 0 or negative.
    - Removed aggressive duplicate content filtering logic that caused truncation/data loss.
    - Enhanced thinking tag parsing with `extractThinkingFromContent` to correctly handle interleaved thinking/text blocks.
    - Added EOF handling to flush pending thinking tag characters, preventing data loss at stream end.

- **Performance**:
    - Optimized Claude stream handler (v6.2) with reduced buffer size (4KB) and faster flush interval (50ms) to minimize latency and prevent timeouts.
2025-12-13 03:57:13 +08:00
Luis Pater
f3f0f1717d Merge branch 'dev' into think 2025-12-12 22:16:44 +08:00