Merge branch 'router-for-me:main' into main

Merge pull request #119 from linlang781/main
Support Kiro SSO IDC
2026-03-12 16:53:18 +00:00 · 2026-01-21 22:12:28 +08:00 · 2026-01-21 22:11:58 +08:00 · 2026-01-21 21:38:47 +08:00 · 2026-01-21 21:09:34 +08:00 · 2026-01-21 21:07:24 +08:00
50 changed files with 7304 additions and 618 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -50,3 +50,4 @@ _bmad-output/*
 # macOS
 .DS_Store
 ._*
+*.bak
--- a/README.md
+++ b/README.md
@@ -13,6 +13,82 @@ The Plus release stays in lockstep with the mainline features.
 - Added GitHub Copilot support (OAuth login), provided by [em4go](https://github.com/em4go/CLIProxyAPI/tree/feature/github-copilot-auth)
 - Added Kiro (AWS CodeWhisperer) support (OAuth login), provided by [fuko2935](https://github.com/fuko2935/CLIProxyAPI/tree/feature/kiro-integration), [Ravens2121](https://github.com/Ravens2121/CLIProxyAPIPlus/)

+## New Features (Plus Enhanced)
+
+- **OAuth Web Authentication**: Browser-based OAuth login for Kiro through a web UI
+- **Rate Limiter**: Built-in request rate limiting to prevent API abuse
+- **Background Token Refresh**: Automatic token refresh 10 minutes before expiration
+- **Metrics & Monitoring**: Request metrics collection for monitoring and debugging
+- **Device Fingerprint**: Device fingerprint generation for enhanced security
+- **Cooldown Management**: Smart cooldown mechanism for API rate limits
+- **Usage Checker**: Real-time usage monitoring and quota management
+- **Model Converter**: Unified model name conversion across providers
+- **UTF-8 Stream Processing**: Improved streaming response handling
+
+## Kiro Authentication
+
+### Web-based OAuth Login
+
+Access the Kiro OAuth web interface at:
+
+```
+http://your-server:8317/v0/oauth/kiro
+```
+
+This provides a browser-based OAuth flow for Kiro (AWS CodeWhisperer) authentication with:
+- AWS Builder ID login
+- AWS Identity Center (IDC) login
+- Token import from Kiro IDE
+
+## Quick Deployment with Docker
+
+### One-Command Deployment
+
+```bash
+# Create deployment directory
+mkdir -p ~/cli-proxy && cd ~/cli-proxy
+
+# Create docker-compose.yml
+cat > docker-compose.yml << 'EOF'
+services:
+  cli-proxy-api:
+    image: 17600006524/cli-proxy-api-plus:latest
+    container_name: cli-proxy-api-plus
+    ports:
+      - "8317:8317"
+    volumes:
+      - ./config.yaml:/CLIProxyAPI/config.yaml
+      - ./auths:/root/.cli-proxy-api
+      - ./logs:/CLIProxyAPI/logs
+    restart: unless-stopped
+EOF
+
+# Download example config
+curl -o config.yaml https://raw.githubusercontent.com/linlang781/CLIProxyAPIPlus/main/config.example.yaml
+
+# Pull and start
+docker compose pull && docker compose up -d
+```
+
+### Configuration
+
+Edit `config.yaml` before starting:
+
+```yaml
+# Basic configuration example
+server:
+  port: 8317
+
+# Add your provider configurations here
+```
+
+### Update to Latest Version
+
+```bash
+cd ~/cli-proxy
+docker compose pull && docker compose up -d
+```
+
 ## Contributing

 This project only accepts pull requests that relate to third-party provider support. Any pull requests unrelated to third-party provider support will be rejected.
--- a/README_CN.md
+++ b/README_CN.md
@@ -13,6 +13,82 @@
 - 新增 GitHub Copilot 支持（OAuth 登录），由[em4go](https://github.com/em4go/CLIProxyAPI/tree/feature/github-copilot-auth)提供
 - 新增 Kiro (AWS CodeWhisperer) 支持 (OAuth 登录), 由[fuko2935](https://github.com/fuko2935/CLIProxyAPI/tree/feature/kiro-integration)、[Ravens2121](https://github.com/Ravens2121/CLIProxyAPIPlus/)提供

+## 新增功能 (Plus 增强版)
+
+- **OAuth Web 认证**: 基于浏览器的 Kiro OAuth 登录，提供美观的 Web UI
+- **请求限流器**: 内置请求限流，防止 API 滥用
+- **后台令牌刷新**: 过期前 10 分钟自动刷新令牌
+- **监控指标**: 请求指标收集，用于监控和调试
+- **设备指纹**: 设备指纹生成，增强安全性
+- **冷却管理**: 智能冷却机制，应对 API 速率限制
+- **用量检查器**: 实时用量监控和配额管理
+- **模型转换器**: 跨供应商的统一模型名称转换
+- **UTF-8 流处理**: 改进的流式响应处理
+
+## Kiro 认证
+
+### 网页端 OAuth 登录
+
+访问 Kiro OAuth 网页认证界面：
+
+```
+http://your-server:8317/v0/oauth/kiro
+```
+
+提供基于浏览器的 Kiro (AWS CodeWhisperer) OAuth 认证流程，支持：
+- AWS Builder ID 登录
+- AWS Identity Center (IDC) 登录
+- 从 Kiro IDE 导入令牌
+
+## Docker 快速部署
+
+### 一键部署
+
+```bash
+# 创建部署目录
+mkdir -p ~/cli-proxy && cd ~/cli-proxy
+
+# 创建 docker-compose.yml
+cat > docker-compose.yml << 'EOF'
+services:
+  cli-proxy-api:
+    image: 17600006524/cli-proxy-api-plus:latest
+    container_name: cli-proxy-api-plus
+    ports:
+      - "8317:8317"
+    volumes:
+      - ./config.yaml:/CLIProxyAPI/config.yaml
+      - ./auths:/root/.cli-proxy-api
+      - ./logs:/CLIProxyAPI/logs
+    restart: unless-stopped
+EOF
+
+# 下载示例配置
+curl -o config.yaml https://raw.githubusercontent.com/linlang781/CLIProxyAPIPlus/main/config.example.yaml
+
+# 拉取并启动
+docker compose pull && docker compose up -d
+```
+
+### 配置说明
+
+启动前请编辑 `config.yaml`：
+
+```yaml
+# 基本配置示例
+server:
+  port: 8317
+
+# 在此添加你的供应商配置
+```
+
+### 更新到最新版本
+
+```bash
+cd ~/cli-proxy
+docker compose pull && docker compose up -d
+```
+
 ## 贡献

 该项目仅接受第三方供应商支持的 Pull Request。任何非第三方供应商支持的 Pull Request 都将被拒绝。
--- a/cmd/server/main.go
+++ b/cmd/server/main.go
@@ -17,6 +17,7 @@ import (

 	"github.com/joho/godotenv"
 	configaccess "github.com/router-for-me/CLIProxyAPI/v6/internal/access/config_access"
+	"github.com/router-for-me/CLIProxyAPI/v6/internal/auth/kiro"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/buildinfo"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/cmd"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/config"
@@ -533,6 +534,13 @@ func main() {
 		}
 		// Start the main proxy service
 		managementasset.StartAutoUpdater(context.Background(), configFilePath)
+
+		// Initialize and start the Kiro token background refresh
+		if cfg.AuthDir != "" {
+			kiro.InitializeAndStart(cfg.AuthDir, cfg)
+			defer kiro.StopGlobalRefreshManager()
+		}
+
 		cmd.StartService(cfg, configFilePath, password)
 	}
 }
--- a/internal/api/server.go
+++ b/internal/api/server.go
@@ -23,6 +23,7 @@ import (
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/api/middleware"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/api/modules"
 	ampmodule "github.com/router-for-me/CLIProxyAPI/v6/internal/api/modules/amp"
+	"github.com/router-for-me/CLIProxyAPI/v6/internal/auth/kiro"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/config"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/logging"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/managementasset"
@@ -292,6 +293,11 @@ func NewServer(cfg *config.Config, authManager *auth.Manager, accessManager *sdk
 		s.registerManagementRoutes()
 	}

+	// === CLIProxyAPIPlus extension: register Kiro OAuth Web routes ===
+	kiroOAuthHandler := kiro.NewOAuthWebHandler(cfg)
+	kiroOAuthHandler.RegisterRoutes(engine)
+	log.Info("Kiro OAuth Web routes registered at /v0/oauth/kiro/*")
+
 	if optionState.keepAliveEnabled {
 		s.enableKeepAlive(optionState.keepAliveTimeout, optionState.keepAliveOnTimeout)
 	}
--- a/internal/auth/kiro/aws.go
+++ b/internal/auth/kiro/aws.go
@@ -5,10 +5,12 @@ package kiro
 import (
 	"encoding/base64"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"os"
 	"path/filepath"
 	"strings"
+	"time"
 )

 // PKCECodes holds PKCE verification codes for OAuth2 PKCE flow
@@ -85,6 +87,87 @@ type KiroModel struct {
 // KiroIDETokenFile is the default path to Kiro IDE's token file
 const KiroIDETokenFile = ".aws/sso/cache/kiro-auth-token.json"

+// Default retry configuration for file reading
+const (
+	defaultTokenReadMaxAttempts = 10                    // Maximum retry attempts
+	defaultTokenReadBaseDelay   = 50 * time.Millisecond // Base delay between retries
+)
+
+// isTransientFileError checks if the error is a transient file access error
+// that may be resolved by retrying (e.g., file locked by another process on Windows).
+func isTransientFileError(err error) bool {
+	if err == nil {
+		return false
+	}
+
+	// Check for OS-level file access errors (Windows sharing violation, etc.)
+	var pathErr *os.PathError
+	if errors.As(err, &pathErr) {
+		// Windows sharing violation (ERROR_SHARING_VIOLATION = 32)
+		// Windows lock violation (ERROR_LOCK_VIOLATION = 33)
+		errStr := pathErr.Err.Error()
+		if strings.Contains(errStr, "being used by another process") ||
+			strings.Contains(errStr, "sharing violation") ||
+			strings.Contains(errStr, "lock violation") {
+			return true
+		}
+	}
+
+	// Check error message for common transient patterns
+	errMsg := strings.ToLower(err.Error())
+	transientPatterns := []string{
+		"being used by another process",
+		"sharing violation",
+		"lock violation",
+		"access is denied",
+		"unexpected end of json",
+		"unexpected eof",
+	}
+	for _, pattern := range transientPatterns {
+		if strings.Contains(errMsg, pattern) {
+			return true
+		}
+	}
+
+	return false
+}
+
+// LoadKiroIDETokenWithRetry loads token data from Kiro IDE's token file with retry logic.
+// This handles transient file access errors (e.g., file locked by Kiro IDE during write).
+// maxAttempts: maximum number of retry attempts (default 10 if <= 0)
+// baseDelay: base delay between retries with exponential backoff (default 50ms if <= 0)
+func LoadKiroIDETokenWithRetry(maxAttempts int, baseDelay time.Duration) (*KiroTokenData, error) {
+	if maxAttempts <= 0 {
+		maxAttempts = defaultTokenReadMaxAttempts
+	}
+	if baseDelay <= 0 {
+		baseDelay = defaultTokenReadBaseDelay
+	}
+
+	var lastErr error
+	for attempt := 0; attempt < maxAttempts; attempt++ {
+		token, err := LoadKiroIDEToken()
+		if err == nil {
+			return token, nil
+		}
+		lastErr = err
+
+		// Only retry for transient errors
+		if !isTransientFileError(err) {
+			return nil, err
+		}
+
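+		// Exponential backoff: baseDelay * 2^attempt, capped at 500ms.
+		// Doubling stepwise avoids the overflow a raw shift hits at high attempt counts.
+		delay := baseDelay
+		for i := 0; i < attempt && delay < 500*time.Millisecond; i++ {
+			delay *= 2
+		}
+		if delay > 500*time.Millisecond {
+			delay = 500 * time.Millisecond
+		}
+		if attempt < maxAttempts-1 { // no point sleeping after the final attempt
+			time.Sleep(delay)
+		}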
+	}
+
+	return nil, fmt.Errorf("failed to read token file after %d attempts: %w", maxAttempts, lastErr)
+}
+
 // LoadKiroIDEToken loads token data from Kiro IDE's token file.
 func LoadKiroIDEToken() (*KiroTokenData, error) {
 	homeDir, err := os.UserHomeDir()
--- a/internal/auth/kiro/aws_auth.go
+++ b/internal/auth/kiro/aws_auth.go
@@ -280,6 +280,11 @@ func (k *KiroAuth) CreateTokenStorage(tokenData *KiroTokenData) *KiroTokenStorag
 		AuthMethod:   tokenData.AuthMethod,
 		Provider:     tokenData.Provider,
 		LastRefresh:  time.Now().Format(time.RFC3339),
+		ClientID:     tokenData.ClientID,
+		ClientSecret: tokenData.ClientSecret,
+		Region:       tokenData.Region,
+		StartURL:     tokenData.StartURL,
+		Email:        tokenData.Email,
 	}
 }

@@ -311,4 +316,19 @@ func (k *KiroAuth) UpdateTokenStorage(storage *KiroTokenStorage, tokenData *Kiro
 	storage.AuthMethod = tokenData.AuthMethod
 	storage.Provider = tokenData.Provider
 	storage.LastRefresh = time.Now().Format(time.RFC3339)
+	if tokenData.ClientID != "" {
+		storage.ClientID = tokenData.ClientID
+	}
+	if tokenData.ClientSecret != "" {
+		storage.ClientSecret = tokenData.ClientSecret
+	}
+	if tokenData.Region != "" {
+		storage.Region = tokenData.Region
+	}
+	if tokenData.StartURL != "" {
+		storage.StartURL = tokenData.StartURL
+	}
+	if tokenData.Email != "" {
+		storage.Email = tokenData.Email
+	}
 }
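`UpdateTokenStorage` overwrites the new fields only when the incoming value is non-empty, so a refresh response that omits them does not blank out what is already stored. A minimal sketch of that pattern follows; `storage` and `setIfNonEmpty` are hypothetical names used for illustration and are not part of the change.

```go
package main

import "fmt"

// storage is a hypothetical, trimmed-down stand-in for KiroTokenStorage.
type storage struct {
	ClientID string
	Region   string
}

// setIfNonEmpty overwrites *dst only when src carries a value, so a partial
// update cannot clear a previously stored field.
func setIfNonEmpty(dst *string, src string) {
	if src != "" {
		*dst = src
	}
}

func main() {
	s := storage{ClientID: "client-a", Region: "us-east-1"}
	setIfNonEmpty(&s.ClientID, "client-b") // replaced
	setIfNonEmpty(&s.Region, "")           // preserved
	fmt.Printf("%+v\n", s)
}
```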
--- a/internal/auth/kiro/background_refresh.go
+++ b/internal/auth/kiro/background_refresh.go
@@ -0,0 +1,224 @@
+package kiro
+
+import (
+	"context"
+	"log"
+	"sync"
+	"time"
+
+	"github.com/router-for-me/CLIProxyAPI/v6/internal/config"
+	"golang.org/x/sync/semaphore"
+)
+
+type Token struct {
+	ID           string
+	AccessToken  string
+	RefreshToken string
+	ExpiresAt    time.Time
+	LastVerified time.Time
+	ClientID     string
+	ClientSecret string
+	AuthMethod   string
+	Provider     string
+	StartURL     string
+	Region       string
+}
+
+type TokenRepository interface {
+	FindOldestUnverified(limit int) []*Token
+	UpdateToken(token *Token) error
+}
+
+type RefresherOption func(*BackgroundRefresher)
+
+func WithInterval(interval time.Duration) RefresherOption {
+	return func(r *BackgroundRefresher) {
+		r.interval = interval
+	}
+}
+
+func WithBatchSize(size int) RefresherOption {
+	return func(r *BackgroundRefresher) {
+		r.batchSize = size
+	}
+}
+
+func WithConcurrency(concurrency int) RefresherOption {
+	return func(r *BackgroundRefresher) {
+		r.concurrency = concurrency
+	}
+}
+
+type BackgroundRefresher struct {
+	interval         time.Duration
+	batchSize        int
+	concurrency      int
+	tokenRepo        TokenRepository
+	stopCh           chan struct{}
+	wg               sync.WaitGroup
+	oauth            *KiroOAuth
+	ssoClient        *SSOOIDCClient
+	callbackMu       sync.RWMutex                                   // guards concurrent access to the callback
+	onTokenRefreshed func(tokenID string, tokenData *KiroTokenData) // called after a successful refresh
+}
+
+func NewBackgroundRefresher(repo TokenRepository, opts ...RefresherOption) *BackgroundRefresher {
+	r := &BackgroundRefresher{
+		interval:    time.Minute,
+		batchSize:   50,
+		concurrency: 10,
+		tokenRepo:   repo,
+		stopCh:      make(chan struct{}),
+		oauth:       nil, // Lazy init - will be set when config available
+		ssoClient:   nil, // Lazy init - will be set when config available
+	}
+	for _, opt := range opts {
+		opt(r)
+	}
+	return r
+}
+
+// WithConfig sets the configuration for OAuth and SSO clients.
+func WithConfig(cfg *config.Config) RefresherOption {
+	return func(r *BackgroundRefresher) {
+		r.oauth = NewKiroOAuth(cfg)
+		r.ssoClient = NewSSOOIDCClient(cfg)
+	}
+}
+
+// WithOnTokenRefreshed sets the callback function to be called when a token is successfully refreshed.
+// The callback receives the token ID (filename) and the new token data.
+// This allows external components (e.g., Watcher) to be notified of token updates.
+func WithOnTokenRefreshed(callback func(tokenID string, tokenData *KiroTokenData)) RefresherOption {
+	return func(r *BackgroundRefresher) {
+		r.callbackMu.Lock()
+		r.onTokenRefreshed = callback
+		r.callbackMu.Unlock()
+	}
+}
+
+func (r *BackgroundRefresher) Start(ctx context.Context) {
+	r.wg.Add(1)
+	go func() {
+		defer r.wg.Done()
+		ticker := time.NewTicker(r.interval)
+		defer ticker.Stop()
+
+		r.refreshBatch(ctx)
+
+		for {
+			select {
+			case <-ctx.Done():
+				return
+			case <-r.stopCh:
+				return
+			case <-ticker.C:
+				r.refreshBatch(ctx)
+			}
+		}
+	}()
+}
+
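+// Stop signals the refresh loop to exit and waits for it to finish.
+// It closes stopCh directly, so it must be called at most once.
+func (r *BackgroundRefresher) Stop() {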
+	close(r.stopCh)
+	r.wg.Wait()
+}
+
+func (r *BackgroundRefresher) refreshBatch(ctx context.Context) {
+	tokens := r.tokenRepo.FindOldestUnverified(r.batchSize)
+	if len(tokens) == 0 {
+		return
+	}
+
+	sem := semaphore.NewWeighted(int64(r.concurrency))
+	var wg sync.WaitGroup
+
+	for i, token := range tokens {
+		if i > 0 {
+			select {
+			case <-ctx.Done():
+				return
+			case <-r.stopCh:
+				return
+			case <-time.After(100 * time.Millisecond):
+			}
+		}
+
+		if err := sem.Acquire(ctx, 1); err != nil {
+			return
+		}
+
+		wg.Add(1)
+		go func(t *Token) {
+			defer wg.Done()
+			defer sem.Release(1)
+			r.refreshSingle(ctx, t)
+		}(token)
+	}
+
+	wg.Wait()
+}
+
+func (r *BackgroundRefresher) refreshSingle(ctx context.Context, token *Token) {
+	var newTokenData *KiroTokenData
+	var err error
+
+	switch token.AuthMethod {
+	case "idc":
+		newTokenData, err = r.ssoClient.RefreshTokenWithRegion(
+			ctx,
+			token.ClientID,
+			token.ClientSecret,
+			token.RefreshToken,
+			token.Region,
+			token.StartURL,
+		)
+	case "builder-id":
+		newTokenData, err = r.ssoClient.RefreshToken(
+			ctx,
+			token.ClientID,
+			token.ClientSecret,
+			token.RefreshToken,
+		)
+	default:
+		newTokenData, err = r.oauth.RefreshToken(ctx, token.RefreshToken)
+	}
+
+	if err != nil {
+		log.Printf("failed to refresh token %s: %v", token.ID, err)
+		return
+	}
+
+	token.AccessToken = newTokenData.AccessToken
+	token.RefreshToken = newTokenData.RefreshToken
+	token.LastVerified = time.Now()
+
+	if newTokenData.ExpiresAt != "" {
+		if expTime, parseErr := time.Parse(time.RFC3339, newTokenData.ExpiresAt); parseErr == nil {
+			token.ExpiresAt = expTime
+		}
+	}
+
+	if err := r.tokenRepo.UpdateToken(token); err != nil {
+		log.Printf("failed to update token %s: %v", token.ID, err)
+		return
+	}
+
+	// Option A: after a successful refresh, invoke the callback so the Watcher can update the in-memory Auth object
+	r.callbackMu.RLock()
+	callback := r.onTokenRefreshed
+	r.callbackMu.RUnlock()
+
+	if callback != nil {
+		// Isolate callback panics with defer/recover so they cannot crash the whole process
+		func() {
+			defer func() {
+				if rec := recover(); rec != nil {
+					log.Printf("background refresh: callback panic for token %s: %v", token.ID, rec)
+				}
+			}()
+			log.Printf("background refresh: notifying token refresh callback for %s", token.ID)
+			callback(token.ID, newTokenData)
+		}()
+	}
+}
--- a/internal/auth/kiro/cooldown.go
+++ b/internal/auth/kiro/cooldown.go
@@ -0,0 +1,112 @@
+package kiro
+
+import (
+	"sync"
+	"time"
+)
+
+const (
+	CooldownReason429            = "rate_limit_exceeded"
+	CooldownReasonSuspended      = "account_suspended"
+	CooldownReasonQuotaExhausted = "quota_exhausted"
+
+	DefaultShortCooldown = 1 * time.Minute
+	MaxShortCooldown     = 5 * time.Minute
+	LongCooldown         = 24 * time.Hour
+)
+
+type CooldownManager struct {
+	mu        sync.RWMutex
+	cooldowns map[string]time.Time
+	reasons   map[string]string
+}
+
+func NewCooldownManager() *CooldownManager {
+	return &CooldownManager{
+		cooldowns: make(map[string]time.Time),
+		reasons:   make(map[string]string),
+	}
+}
+
+func (cm *CooldownManager) SetCooldown(tokenKey string, duration time.Duration, reason string) {
+	cm.mu.Lock()
+	defer cm.mu.Unlock()
+	cm.cooldowns[tokenKey] = time.Now().Add(duration)
+	cm.reasons[tokenKey] = reason
+}
+
+func (cm *CooldownManager) IsInCooldown(tokenKey string) bool {
+	cm.mu.RLock()
+	defer cm.mu.RUnlock()
+	endTime, exists := cm.cooldowns[tokenKey]
+	if !exists {
+		return false
+	}
+	return time.Now().Before(endTime)
+}
+
+func (cm *CooldownManager) GetRemainingCooldown(tokenKey string) time.Duration {
+	cm.mu.RLock()
+	defer cm.mu.RUnlock()
+	endTime, exists := cm.cooldowns[tokenKey]
+	if !exists {
+		return 0
+	}
+	remaining := time.Until(endTime)
+	if remaining < 0 {
+		return 0
+	}
+	return remaining
+}
+
+func (cm *CooldownManager) GetCooldownReason(tokenKey string) string {
+	cm.mu.RLock()
+	defer cm.mu.RUnlock()
+	return cm.reasons[tokenKey]
+}
+
+func (cm *CooldownManager) ClearCooldown(tokenKey string) {
+	cm.mu.Lock()
+	defer cm.mu.Unlock()
+	delete(cm.cooldowns, tokenKey)
+	delete(cm.reasons, tokenKey)
+}
+
+func (cm *CooldownManager) CleanupExpired() {
+	cm.mu.Lock()
+	defer cm.mu.Unlock()
+	now := time.Now()
+	for tokenKey, endTime := range cm.cooldowns {
+		if now.After(endTime) {
+			delete(cm.cooldowns, tokenKey)
+			delete(cm.reasons, tokenKey)
+		}
+	}
+}
+
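+// StartCleanupRoutine removes expired cooldowns every interval until stopCh is closed.
+// It blocks, so callers are expected to run it in its own goroutine.
+func (cm *CooldownManager) StartCleanupRoutine(interval time.Duration, stopCh <-chan struct{}) {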
+	ticker := time.NewTicker(interval)
+	defer ticker.Stop()
+	for {
+		select {
+		case <-ticker.C:
+			cm.CleanupExpired()
+		case <-stopCh:
+			return
+		}
+	}
+}
+
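+func CalculateCooldownFor429(retryCount int) time.Duration {
+	// Double per retry and stop at the cap; a raw 1<<retryCount shift
+	// overflows for large counts and panics for negative ones.
+	duration := DefaultShortCooldown
+	for i := 0; i < retryCount && duration < MaxShortCooldown; i++ {
+		duration *= 2
+	}
+	if duration > MaxShortCooldown {
+		return MaxShortCooldown
+	}
+	return duration
+}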
+
+func CalculateCooldownUntilNextDay() time.Duration {
+	now := time.Now()
+	nextDay := time.Date(now.Year(), now.Month(), now.Day()+1, 0, 0, 0, 0, now.Location())
+	return time.Until(nextDay)
+}
--- a/internal/auth/kiro/cooldown_test.go
+++ b/internal/auth/kiro/cooldown_test.go
@@ -0,0 +1,240 @@
+package kiro
+
+import (
+	"sync"
+	"testing"
+	"time"
+)
+
+func TestNewCooldownManager(t *testing.T) {
+	cm := NewCooldownManager()
+	if cm == nil {
+		t.Fatal("expected non-nil CooldownManager")
+	}
+	if cm.cooldowns == nil {
+		t.Error("expected non-nil cooldowns map")
+	}
+	if cm.reasons == nil {
+		t.Error("expected non-nil reasons map")
+	}
+}
+
+func TestSetCooldown(t *testing.T) {
+	cm := NewCooldownManager()
+	cm.SetCooldown("token1", 1*time.Minute, CooldownReason429)
+
+	if !cm.IsInCooldown("token1") {
+		t.Error("expected token to be in cooldown")
+	}
+	if cm.GetCooldownReason("token1") != CooldownReason429 {
+		t.Errorf("expected reason %s, got %s", CooldownReason429, cm.GetCooldownReason("token1"))
+	}
+}
+
+func TestIsInCooldown_NotSet(t *testing.T) {
+	cm := NewCooldownManager()
+	if cm.IsInCooldown("nonexistent") {
+		t.Error("expected non-existent token to not be in cooldown")
+	}
+}
+
+func TestIsInCooldown_Expired(t *testing.T) {
+	cm := NewCooldownManager()
+	cm.SetCooldown("token1", 1*time.Millisecond, CooldownReason429)
+
+	time.Sleep(10 * time.Millisecond)
+
+	if cm.IsInCooldown("token1") {
+		t.Error("expected expired cooldown to return false")
+	}
+}
+
+func TestGetRemainingCooldown(t *testing.T) {
+	cm := NewCooldownManager()
+	cm.SetCooldown("token1", 1*time.Second, CooldownReason429)
+
+	remaining := cm.GetRemainingCooldown("token1")
+	if remaining <= 0 || remaining > 1*time.Second {
+		t.Errorf("expected remaining cooldown between 0 and 1s, got %v", remaining)
+	}
+}
+
+func TestGetRemainingCooldown_NotSet(t *testing.T) {
+	cm := NewCooldownManager()
+	remaining := cm.GetRemainingCooldown("nonexistent")
+	if remaining != 0 {
+		t.Errorf("expected 0 remaining for non-existent, got %v", remaining)
+	}
+}
+
+func TestGetRemainingCooldown_Expired(t *testing.T) {
+	cm := NewCooldownManager()
+	cm.SetCooldown("token1", 1*time.Millisecond, CooldownReason429)
+
+	time.Sleep(10 * time.Millisecond)
+
+	remaining := cm.GetRemainingCooldown("token1")
+	if remaining != 0 {
+		t.Errorf("expected 0 remaining for expired, got %v", remaining)
+	}
+}
+
+func TestGetCooldownReason(t *testing.T) {
+	cm := NewCooldownManager()
+	cm.SetCooldown("token1", 1*time.Minute, CooldownReasonSuspended)
+
+	reason := cm.GetCooldownReason("token1")
+	if reason != CooldownReasonSuspended {
+		t.Errorf("expected reason %s, got %s", CooldownReasonSuspended, reason)
+	}
+}
+
+func TestGetCooldownReason_NotSet(t *testing.T) {
+	cm := NewCooldownManager()
+	reason := cm.GetCooldownReason("nonexistent")
+	if reason != "" {
+		t.Errorf("expected empty reason for non-existent, got %s", reason)
+	}
+}
+
+func TestClearCooldown(t *testing.T) {
+	cm := NewCooldownManager()
+	cm.SetCooldown("token1", 1*time.Minute, CooldownReason429)
+	cm.ClearCooldown("token1")
+
+	if cm.IsInCooldown("token1") {
+		t.Error("expected cooldown to be cleared")
+	}
+	if cm.GetCooldownReason("token1") != "" {
+		t.Error("expected reason to be cleared")
+	}
+}
+
+func TestClearCooldown_NonExistent(t *testing.T) {
+	cm := NewCooldownManager()
+	cm.ClearCooldown("nonexistent")
+}
+
+func TestCleanupExpired(t *testing.T) {
+	cm := NewCooldownManager()
+	cm.SetCooldown("expired1", 1*time.Millisecond, CooldownReason429)
+	cm.SetCooldown("expired2", 1*time.Millisecond, CooldownReason429)
+	cm.SetCooldown("active", 1*time.Hour, CooldownReason429)
+
+	time.Sleep(10 * time.Millisecond)
+	cm.CleanupExpired()
+
+	if cm.GetCooldownReason("expired1") != "" {
+		t.Error("expected expired1 to be cleaned up")
+	}
+	if cm.GetCooldownReason("expired2") != "" {
+		t.Error("expected expired2 to be cleaned up")
+	}
+	if cm.GetCooldownReason("active") != CooldownReason429 {
+		t.Error("expected active to remain")
+	}
+}
+
+func TestCalculateCooldownFor429_FirstRetry(t *testing.T) {
+	duration := CalculateCooldownFor429(0)
+	if duration != DefaultShortCooldown {
+		t.Errorf("expected %v for retry 0, got %v", DefaultShortCooldown, duration)
+	}
+}
+
+func TestCalculateCooldownFor429_Exponential(t *testing.T) {
+	d1 := CalculateCooldownFor429(1)
+	d2 := CalculateCooldownFor429(2)
+
+	if d2 <= d1 {
+		t.Errorf("expected d2 > d1, got d1=%v, d2=%v", d1, d2)
+	}
+}
+
+func TestCalculateCooldownFor429_MaxCap(t *testing.T) {
+	duration := CalculateCooldownFor429(10)
+	if duration > MaxShortCooldown {
+		t.Errorf("expected max %v, got %v", MaxShortCooldown, duration)
+	}
+}
+
+func TestCalculateCooldownUntilNextDay(t *testing.T) {
+	duration := CalculateCooldownUntilNextDay()
+	if duration <= 0 || duration > 24*time.Hour {
+		t.Errorf("expected duration between 0 and 24h, got %v", duration)
+	}
+}
+
+func TestCooldownManager_ConcurrentAccess(t *testing.T) {
+	cm := NewCooldownManager()
+	const numGoroutines = 50
+	const numOperations = 100
+
+	var wg sync.WaitGroup
+	wg.Add(numGoroutines)
+
+	for i := 0; i < numGoroutines; i++ {
+		go func(id int) {
+			defer wg.Done()
+			tokenKey := "token" + string(rune('a'+id%10))
+			for j := 0; j < numOperations; j++ {
+				switch j % 6 {
+				case 0:
+					cm.SetCooldown(tokenKey, time.Duration(j)*time.Millisecond, CooldownReason429)
+				case 1:
+					cm.IsInCooldown(tokenKey)
+				case 2:
+					cm.GetRemainingCooldown(tokenKey)
+				case 3:
+					cm.GetCooldownReason(tokenKey)
+				case 4:
+					cm.ClearCooldown(tokenKey)
+				case 5:
+					cm.CleanupExpired()
+				}
+			}
+		}(i)
+	}
+
+	wg.Wait()
+}
+
+func TestCooldownReasonConstants(t *testing.T) {
+	if CooldownReason429 != "rate_limit_exceeded" {
+		t.Errorf("unexpected CooldownReason429: %s", CooldownReason429)
+	}
+	if CooldownReasonSuspended != "account_suspended" {
+		t.Errorf("unexpected CooldownReasonSuspended: %s", CooldownReasonSuspended)
+	}
+	if CooldownReasonQuotaExhausted != "quota_exhausted" {
+		t.Errorf("unexpected CooldownReasonQuotaExhausted: %s", CooldownReasonQuotaExhausted)
+	}
+}
+
+func TestDefaultConstants(t *testing.T) {
+	if DefaultShortCooldown != 1*time.Minute {
+		t.Errorf("unexpected DefaultShortCooldown: %v", DefaultShortCooldown)
+	}
+	if MaxShortCooldown != 5*time.Minute {
+		t.Errorf("unexpected MaxShortCooldown: %v", MaxShortCooldown)
+	}
+	if LongCooldown != 24*time.Hour {
+		t.Errorf("unexpected LongCooldown: %v", LongCooldown)
+	}
+}
+
+func TestSetCooldown_OverwritesPrevious(t *testing.T) {
+	cm := NewCooldownManager()
+	cm.SetCooldown("token1", 1*time.Hour, CooldownReason429)
+	cm.SetCooldown("token1", 1*time.Minute, CooldownReasonSuspended)
+
+	reason := cm.GetCooldownReason("token1")
+	if reason != CooldownReasonSuspended {
+		t.Errorf("expected reason to be overwritten to %s, got %s", CooldownReasonSuspended, reason)
+	}
+
+	remaining := cm.GetRemainingCooldown("token1")
+	if remaining > 1*time.Minute {
+		t.Errorf("expected remaining <= 1 minute, got %v", remaining)
+	}
+}
--- a/internal/auth/kiro/fingerprint.go
+++ b/internal/auth/kiro/fingerprint.go
@@ -0,0 +1,197 @@
+package kiro
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+	"math/rand"
+	"net/http"
+	"sync"
+	"time"
+)
+
+// Fingerprint holds multi-dimensional fingerprint information
+type Fingerprint struct {
+	SDKVersion          string // 1.0.20-1.0.27
+	OSType              string // darwin/windows/linux
+	OSVersion           string // 10.0.22621
+	NodeVersion         string // 18.x/20.x/22.x
+	KiroVersion         string // 0.3.x-0.8.x
+	KiroHash            string // SHA256
+	AcceptLanguage      string
+	ScreenResolution    string // 1920x1080
+	ColorDepth          int    // 24
+	HardwareConcurrency int    // number of CPU cores
+	TimezoneOffset      int
+}
+
+// FingerprintManager manages per-token fingerprints
+type FingerprintManager struct {
+	mu           sync.RWMutex
+	fingerprints map[string]*Fingerprint // tokenKey -> fingerprint
+	rng          *rand.Rand
+}
+
+var (
+	sdkVersions = []string{
+		"1.0.20", "1.0.21", "1.0.22", "1.0.23",
+		"1.0.24", "1.0.25", "1.0.26", "1.0.27",
+	}
+	osTypes = []string{"darwin", "windows", "linux"}
+	osVersions = map[string][]string{
+		"darwin":  {"14.0", "14.1", "14.2", "14.3", "14.4", "14.5", "15.0", "15.1"},
+		"windows": {"10.0.19041", "10.0.19042", "10.0.19043", "10.0.19044", "10.0.22621", "10.0.22631"},
+		"linux":   {"5.15.0", "6.1.0", "6.2.0", "6.5.0", "6.6.0", "6.8.0"},
+	}
+	nodeVersions = []string{
+		"18.17.0", "18.18.0", "18.19.0", "18.20.0",
+		"20.9.0", "20.10.0", "20.11.0", "20.12.0", "20.13.0",
+		"22.0.0", "22.1.0", "22.2.0", "22.3.0",
+	}
+	kiroVersions = []string{
+		"0.3.0", "0.3.1", "0.4.0", "0.4.1", "0.5.0", "0.5.1",
+		"0.6.0", "0.6.1", "0.7.0", "0.7.1", "0.8.0", "0.8.1",
+	}
+	acceptLanguages = []string{
+		"en-US,en;q=0.9",
+		"en-GB,en;q=0.9",
+		"zh-CN,zh;q=0.9,en;q=0.8",
+		"zh-TW,zh;q=0.9,en;q=0.8",
+		"ja-JP,ja;q=0.9,en;q=0.8",
+		"ko-KR,ko;q=0.9,en;q=0.8",
+		"de-DE,de;q=0.9,en;q=0.8",
+		"fr-FR,fr;q=0.9,en;q=0.8",
+	}
+	screenResolutions = []string{
+		"1920x1080", "2560x1440", "3840x2160",
+		"1366x768", "1440x900", "1680x1050",
+		"2560x1600", "3440x1440",
+	}
+	colorDepths           = []int{24, 32}
+	hardwareConcurrencies = []int{4, 6, 8, 10, 12, 16, 20, 24, 32}
+	timezoneOffsets       = []int{-480, -420, -360, -300, -240, 0, 60, 120, 480, 540}
+)
+
+// NewFingerprintManager creates a FingerprintManager
+func NewFingerprintManager() *FingerprintManager {
+	return &FingerprintManager{
+		fingerprints: make(map[string]*Fingerprint),
+		rng:          rand.New(rand.NewSource(time.Now().UnixNano())),
+	}
+}
+
+// GetFingerprint returns the fingerprint associated with a token, generating one if none exists
+func (fm *FingerprintManager) GetFingerprint(tokenKey string) *Fingerprint {
+	fm.mu.RLock()
+	if fp, exists := fm.fingerprints[tokenKey]; exists {
+		fm.mu.RUnlock()
+		return fp
+	}
+	fm.mu.RUnlock()
+
+	fm.mu.Lock()
+	defer fm.mu.Unlock()
+
+	if fp, exists := fm.fingerprints[tokenKey]; exists {
+		return fp
+	}
+
+	fp := fm.generateFingerprint(tokenKey)
+	fm.fingerprints[tokenKey] = fp
+	return fp
+}
+
+// generateFingerprint generates a new fingerprint
+func (fm *FingerprintManager) generateFingerprint(tokenKey string) *Fingerprint {
+	osType := fm.randomChoice(osTypes)
+	osVersion := fm.randomChoice(osVersions[osType])
+	kiroVersion := fm.randomChoice(kiroVersions)
+
+	fp := &Fingerprint{
+		SDKVersion:          fm.randomChoice(sdkVersions),
+		OSType:              osType,
+		OSVersion:           osVersion,
+		NodeVersion:         fm.randomChoice(nodeVersions),
+		KiroVersion:         kiroVersion,
+		AcceptLanguage:      fm.randomChoice(acceptLanguages),
+		ScreenResolution:    fm.randomChoice(screenResolutions),
+		ColorDepth:          fm.randomIntChoice(colorDepths),
+		HardwareConcurrency: fm.randomIntChoice(hardwareConcurrencies),
+		TimezoneOffset:      fm.randomIntChoice(timezoneOffsets),
+	}
+
+	fp.KiroHash = fm.generateKiroHash(tokenKey, kiroVersion, osType)
+	return fp
+}
+
+// generateKiroHash generates the Kiro hash
+func (fm *FingerprintManager) generateKiroHash(tokenKey, kiroVersion, osType string) string {
+	data := fmt.Sprintf("%s:%s:%s:%d", tokenKey, kiroVersion, osType, time.Now().UnixNano())
+	hash := sha256.Sum256([]byte(data))
+	return hex.EncodeToString(hash[:])
+}
+
+// randomChoice picks a random string from choices
+func (fm *FingerprintManager) randomChoice(choices []string) string {
+	return choices[fm.rng.Intn(len(choices))]
+}
+
+// randomIntChoice picks a random integer from choices
+func (fm *FingerprintManager) randomIntChoice(choices []int) int {
+	return choices[fm.rng.Intn(len(choices))]
+}
+
+// ApplyToRequest applies the fingerprint to the HTTP request headers
+func (fp *Fingerprint) ApplyToRequest(req *http.Request) {
+	req.Header.Set("X-Kiro-SDK-Version", fp.SDKVersion)
+	req.Header.Set("X-Kiro-OS-Type", fp.OSType)
+	req.Header.Set("X-Kiro-OS-Version", fp.OSVersion)
+	req.Header.Set("X-Kiro-Node-Version", fp.NodeVersion)
+	req.Header.Set("X-Kiro-Version", fp.KiroVersion)
+	req.Header.Set("X-Kiro-Hash", fp.KiroHash)
+	req.Header.Set("Accept-Language", fp.AcceptLanguage)
+	req.Header.Set("X-Screen-Resolution", fp.ScreenResolution)
+	req.Header.Set("X-Color-Depth", fmt.Sprintf("%d", fp.ColorDepth))
+	req.Header.Set("X-Hardware-Concurrency", fmt.Sprintf("%d", fp.HardwareConcurrency))
+	req.Header.Set("X-Timezone-Offset", fmt.Sprintf("%d", fp.TimezoneOffset))
+}
+
+// RemoveFingerprint removes the fingerprint associated with a token
+func (fm *FingerprintManager) RemoveFingerprint(tokenKey string) {
+	fm.mu.Lock()
+	defer fm.mu.Unlock()
+	delete(fm.fingerprints, tokenKey)
+}
+
+// Count returns the number of fingerprints currently managed
+func (fm *FingerprintManager) Count() int {
+	fm.mu.RLock()
+	defer fm.mu.RUnlock()
+	return len(fm.fingerprints)
+}
+
+// BuildUserAgent builds the User-Agent string (Kiro IDE style)
+// Format: aws-sdk-js/{SDKVersion} ua/2.1 os/{OSType}#{OSVersion} lang/js md/nodejs#{NodeVersion} api/codewhispererstreaming#{SDKVersion} m/E KiroIDE-{KiroVersion}-{KiroHash}
+func (fp *Fingerprint) BuildUserAgent() string {
+	return fmt.Sprintf(
+		"aws-sdk-js/%s ua/2.1 os/%s#%s lang/js md/nodejs#%s api/codewhispererstreaming#%s m/E KiroIDE-%s-%s",
+		fp.SDKVersion,
+		fp.OSType,
+		fp.OSVersion,
+		fp.NodeVersion,
+		fp.SDKVersion,
+		fp.KiroVersion,
+		fp.KiroHash,
+	)
+}
+
+// BuildAmzUserAgent builds the X-Amz-User-Agent string.
+// Format: aws-sdk-js/{SDKVersion} KiroIDE-{KiroVersion}-{KiroHash}
+func (fp *Fingerprint) BuildAmzUserAgent() string {
+	return fmt.Sprintf(
+		"aws-sdk-js/%s KiroIDE-%s-%s",
+		fp.SDKVersion,
+		fp.KiroVersion,
+		fp.KiroHash,
+	)
+}
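Reviewer note: the two header formats documented above can be checked in isolation. The following is a minimal sketch, not the package's code: the `fingerprint` type, its method names, and every field value are hypothetical stand-ins chosen for illustration; only the two format strings are copied from the diff.

```go
package main

import "fmt"

// fingerprint is a hypothetical stand-in that carries only the fields the
// two User-Agent formats use; the real Fingerprint type has more fields.
type fingerprint struct {
	SDKVersion, OSType, OSVersion, NodeVersion, KiroVersion, KiroHash string
}

// userAgent mirrors the format string documented for BuildUserAgent.
func (fp fingerprint) userAgent() string {
	return fmt.Sprintf(
		"aws-sdk-js/%s ua/2.1 os/%s#%s lang/js md/nodejs#%s api/codewhispererstreaming#%s m/E KiroIDE-%s-%s",
		fp.SDKVersion, fp.OSType, fp.OSVersion, fp.NodeVersion,
		fp.SDKVersion, fp.KiroVersion, fp.KiroHash,
	)
}

// amzUserAgent mirrors the format string documented for BuildAmzUserAgent.
func (fp fingerprint) amzUserAgent() string {
	return fmt.Sprintf("aws-sdk-js/%s KiroIDE-%s-%s", fp.SDKVersion, fp.KiroVersion, fp.KiroHash)
}

func main() {
	// All values below are made up for illustration.
	fp := fingerprint{
		SDKVersion:  "1.0.0",
		OSType:      "linux",
		OSVersion:   "6.1.0",
		NodeVersion: "20.0.0",
		KiroVersion: "0.1.0",
		KiroHash:    "abc123",
	}
	fmt.Println(fp.userAgent())
	fmt.Println(fp.amzUserAgent())
}
```

Note that `SDKVersion` appears twice in the longer format (after `aws-sdk-js/` and after `api/codewhispererstreaming#`), which matches the argument order in the diff.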
--- a/internal/auth/kiro/fingerprint_test.go
+++ b/internal/auth/kiro/fingerprint_test.go
@@ -0,0 +1,227 @@
+package kiro
+
+import (
+	"net/http"
+	"sync"
+	"testing"
+)
+
+func TestNewFingerprintManager(t *testing.T) {
+	fm := NewFingerprintManager()
+	if fm == nil {
+		t.Fatal("expected non-nil FingerprintManager")
+	}
+	if fm.fingerprints == nil {
+		t.Error("expected non-nil fingerprints map")
+	}
+	if fm.rng == nil {
+		t.Error("expected non-nil rng")
+	}
+}
+
+func TestGetFingerprint_NewToken(t *testing.T) {
+	fm := NewFingerprintManager()
+	fp := fm.GetFingerprint("token1")
+
+	if fp == nil {
+		t.Fatal("expected non-nil Fingerprint")
+	}
+	if fp.SDKVersion == "" {
+		t.Error("expected non-empty SDKVersion")
+	}
+	if fp.OSType == "" {
+		t.Error("expected non-empty OSType")
+	}
+	if fp.OSVersion == "" {
+		t.Error("expected non-empty OSVersion")
+	}
+	if fp.NodeVersion == "" {
+		t.Error("expected non-empty NodeVersion")
+	}
+	if fp.KiroVersion == "" {
+		t.Error("expected non-empty KiroVersion")
+	}
+	if fp.KiroHash == "" {
+		t.Error("expected non-empty KiroHash")
+	}
+	if fp.AcceptLanguage == "" {
+		t.Error("expected non-empty AcceptLanguage")
+	}
+	if fp.ScreenResolution == "" {
+		t.Error("expected non-empty ScreenResolution")
+	}
+	if fp.ColorDepth == 0 {
+		t.Error("expected non-zero ColorDepth")
+	}
+	if fp.HardwareConcurrency == 0 {
+		t.Error("expected non-zero HardwareConcurrency")
+	}
+}
+
+func TestGetFingerprint_SameTokenReturnsSameFingerprint(t *testing.T) {
+	fm := NewFingerprintManager()
+	fp1 := fm.GetFingerprint("token1")
+	fp2 := fm.GetFingerprint("token1")
+
+	if fp1 != fp2 {
+		t.Error("expected same fingerprint for same token")
+	}
+}
+
+func TestGetFingerprint_DifferentTokens(t *testing.T) {
+	fm := NewFingerprintManager()
+	fp1 := fm.GetFingerprint("token1")
+	fp2 := fm.GetFingerprint("token2")
+
+	if fp1 == fp2 {
+		t.Error("expected different fingerprints for different tokens")
+	}
+}
+
+func TestRemoveFingerprint(t *testing.T) {
+	fm := NewFingerprintManager()
+	fm.GetFingerprint("token1")
+	if fm.Count() != 1 {
+		t.Fatalf("expected count 1, got %d", fm.Count())
+	}
+
+	fm.RemoveFingerprint("token1")
+	if fm.Count() != 0 {
+		t.Errorf("expected count 0, got %d", fm.Count())
+	}
+}
+
+func TestRemoveFingerprint_NonExistent(t *testing.T) {
+	fm := NewFingerprintManager()
+	fm.RemoveFingerprint("nonexistent")
+	if fm.Count() != 0 {
+		t.Errorf("expected count 0, got %d", fm.Count())
+	}
+}
+
+func TestCount(t *testing.T) {
+	fm := NewFingerprintManager()
+	if fm.Count() != 0 {
+		t.Errorf("expected count 0, got %d", fm.Count())
+	}
+
+	fm.GetFingerprint("token1")
+	fm.GetFingerprint("token2")
+	fm.GetFingerprint("token3")
+
+	if fm.Count() != 3 {
+		t.Errorf("expected count 3, got %d", fm.Count())
+	}
+}
+
+func TestApplyToRequest(t *testing.T) {
+	fm := NewFingerprintManager()
+	fp := fm.GetFingerprint("token1")
+
+	req, _ := http.NewRequest("GET", "http://example.com", nil)
+	fp.ApplyToRequest(req)
+
+	if req.Header.Get("X-Kiro-SDK-Version") != fp.SDKVersion {
+		t.Error("X-Kiro-SDK-Version header mismatch")
+	}
+	if req.Header.Get("X-Kiro-OS-Type") != fp.OSType {
+		t.Error("X-Kiro-OS-Type header mismatch")
+	}
+	if req.Header.Get("X-Kiro-OS-Version") != fp.OSVersion {
+		t.Error("X-Kiro-OS-Version header mismatch")
+	}
+	if req.Header.Get("X-Kiro-Node-Version") != fp.NodeVersion {
+		t.Error("X-Kiro-Node-Version header mismatch")
+	}
+	if req.Header.Get("X-Kiro-Version") != fp.KiroVersion {
+		t.Error("X-Kiro-Version header mismatch")
+	}
+	if req.Header.Get("X-Kiro-Hash") != fp.KiroHash {
+		t.Error("X-Kiro-Hash header mismatch")
+	}
+	if req.Header.Get("Accept-Language") != fp.AcceptLanguage {
+		t.Error("Accept-Language header mismatch")
+	}
+	if req.Header.Get("X-Screen-Resolution") != fp.ScreenResolution {
+		t.Error("X-Screen-Resolution header mismatch")
+	}
+}
+
+func TestGetFingerprint_OSVersionMatchesOSType(t *testing.T) {
+	fm := NewFingerprintManager()
+
+	for i := 0; i < 20; i++ {
+		fp := fm.GetFingerprint("token" + string(rune('a'+i)))
+		validVersions := osVersions[fp.OSType]
+		found := false
+		for _, v := range validVersions {
+			if v == fp.OSVersion {
+				found = true
+				break
+			}
+		}
+		if !found {
+			t.Errorf("OS version %s not valid for OS type %s", fp.OSVersion, fp.OSType)
+		}
+	}
+}
+
+func TestFingerprintManager_ConcurrentAccess(t *testing.T) {
+	fm := NewFingerprintManager()
+	const numGoroutines = 100
+	const numOperations = 100
+
+	var wg sync.WaitGroup
+	wg.Add(numGoroutines)
+
+	for i := 0; i < numGoroutines; i++ {
+		go func(id int) {
+			defer wg.Done()
+			for j := 0; j < numOperations; j++ {
+				tokenKey := "token" + string(rune('a'+id%26))
+				switch j % 4 {
+				case 0:
+					fm.GetFingerprint(tokenKey)
+				case 1:
+					fm.Count()
+				case 2:
+					fp := fm.GetFingerprint(tokenKey)
+					req, _ := http.NewRequest("GET", "http://example.com", nil)
+					fp.ApplyToRequest(req)
+				case 3:
+					fm.RemoveFingerprint(tokenKey)
+				}
+			}
+		}(i)
+	}
+
+	wg.Wait()
+}
+
+func TestKiroHashUniqueness(t *testing.T) {
+	fm := NewFingerprintManager()
+	hashes := make(map[string]bool)
+
+	for i := 0; i < 100; i++ {
+		fp := fm.GetFingerprint("token" + string(rune(i)))
+		if hashes[fp.KiroHash] {
+			t.Errorf("duplicate KiroHash detected: %s", fp.KiroHash)
+		}
+		hashes[fp.KiroHash] = true
+	}
+}
+
+func TestKiroHashFormat(t *testing.T) {
+	fm := NewFingerprintManager()
+	fp := fm.GetFingerprint("token1")
+
+	if len(fp.KiroHash) != 64 {
+		t.Errorf("expected KiroHash length 64 (SHA256 hex), got %d", len(fp.KiroHash))
+	}
+
+	for _, c := range fp.KiroHash {
+		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
+			t.Errorf("invalid hex character in KiroHash: %c", c)
+		}
+	}
+}
--- a/internal/auth/kiro/jitter.go
+++ b/internal/auth/kiro/jitter.go
@@ -0,0 +1,174 @@
+package kiro
+
+import (
+	"math/rand"
+	"sync"
+	"time"
+)
+
+// Jitter configuration constants
+const (
+	// JitterPercent is the default percentage of jitter to apply (±30%)
+	JitterPercent = 0.30
+
+	// Human-like delay ranges
+	ShortDelayMin  = 50 * time.Millisecond  // Minimum for rapid consecutive operations
+	ShortDelayMax  = 200 * time.Millisecond // Maximum for rapid consecutive operations
+	NormalDelayMin = 1 * time.Second        // Minimum for normal thinking time
+	NormalDelayMax = 3 * time.Second        // Maximum for normal thinking time
+	LongDelayMin   = 5 * time.Second        // Minimum for reading/resting
+	LongDelayMax   = 10 * time.Second       // Maximum for reading/resting
+
+	// Probability thresholds for human-like behavior
+	ShortDelayProbability  = 0.20 // 20% chance of short delay (consecutive ops)
+	LongDelayProbability   = 0.05 // 5% chance of long delay (reading/resting)
+	NormalDelayProbability = 0.75 // 75% chance of normal delay (thinking)
+)
+
+var (
+	jitterRand      *rand.Rand
+	jitterRandOnce  sync.Once
+	jitterMu        sync.Mutex
+	lastRequestTime time.Time
+)
+
+// initJitterRand initializes the random number generator for jitter calculations.
+// Uses a time-based seed, so the generated sequence differs from run to run.
+func initJitterRand() {
+	jitterRandOnce.Do(func() {
+		jitterRand = rand.New(rand.NewSource(time.Now().UnixNano()))
+	})
+}
+
+// RandomDelay returns a random delay in the half-open range [min, max).
+// It is safe for concurrent use; if min >= max it returns min.
+func RandomDelay(min, max time.Duration) time.Duration {
+	initJitterRand()
+	jitterMu.Lock()
+	defer jitterMu.Unlock()
+
+	if min >= max {
+		return min
+	}
+
+	// Draw at nanosecond resolution: Int63n panics on a non-positive argument,
+	// and a range truncated to whole milliseconds is 0 for sub-millisecond spans.
+	return min + time.Duration(jitterRand.Int63n(int64(max-min)))
+}
+
+// JitterDelay adds jitter to a base delay.
+// Applies ±jitterPercent variation to the base delay; values outside (0, 1] fall back to JitterPercent.
+// For example, JitterDelay(1*time.Second, 0.30) returns a value between 700ms and 1300ms.
+func JitterDelay(baseDelay time.Duration, jitterPercent float64) time.Duration {
+	initJitterRand()
+	jitterMu.Lock()
+	defer jitterMu.Unlock()
+
+	if jitterPercent <= 0 || jitterPercent > 1 {
+		jitterPercent = JitterPercent
+	}
+
+	// Calculate jitter range: base * jitterPercent
+	jitterRange := float64(baseDelay) * jitterPercent
+
+	// Generate random value in range [-jitterRange, +jitterRange]
+	jitter := (jitterRand.Float64()*2 - 1) * jitterRange
+
+	result := time.Duration(float64(baseDelay) + jitter)
+	if result < 0 {
+		return 0
+	}
+	return result
+}
+
+// JitterDelayDefault applies the default ±30% jitter to a base delay.
+func JitterDelayDefault(baseDelay time.Duration) time.Duration {
+	return JitterDelay(baseDelay, JitterPercent)
+}
+
+// HumanLikeDelay generates a delay that mimics human behavior patterns.
+// Calls less than 500ms apart always get a short delay; otherwise the delay is drawn by probability:
+//   - 20% chance: Short delay (50-200ms) - simulates consecutive rapid operations
+//   - 75% chance: Normal delay (1-3s) - simulates thinking/reading time
+//   - 5% chance: Long delay (5-10s) - simulates breaks/reading longer content
+//
+// Returns the delay duration (caller should call time.Sleep with this value).
+func HumanLikeDelay() time.Duration {
+	initJitterRand()
+	jitterMu.Lock()
+	defer jitterMu.Unlock()
+
+	// Track time since last request for adaptive behavior
+	now := time.Now()
+	timeSinceLastRequest := now.Sub(lastRequestTime)
+	lastRequestTime = now
+
+	// If requests are very close together, use short delay
+	if timeSinceLastRequest < 500*time.Millisecond && timeSinceLastRequest > 0 {
+		rangeMs := ShortDelayMax.Milliseconds() - ShortDelayMin.Milliseconds()
+		randomMs := jitterRand.Int63n(rangeMs)
+		return ShortDelayMin + time.Duration(randomMs)*time.Millisecond
+	}
+
+	// Otherwise, use probability-based selection
+	roll := jitterRand.Float64()
+
+	var min, max time.Duration
+	switch {
+	case roll < ShortDelayProbability:
+		// Short delay - consecutive operations
+		min, max = ShortDelayMin, ShortDelayMax
+	case roll < ShortDelayProbability+LongDelayProbability:
+		// Long delay - reading/resting
+		min, max = LongDelayMin, LongDelayMax
+	default:
+		// Normal delay - thinking time
+		min, max = NormalDelayMin, NormalDelayMax
+	}
+
+	rangeMs := max.Milliseconds() - min.Milliseconds()
+	randomMs := jitterRand.Int63n(rangeMs)
+	return min + time.Duration(randomMs)*time.Millisecond
+}
+
+// ApplyHumanLikeDelay applies human-like delay by sleeping.
+// This is a convenience function that combines HumanLikeDelay with time.Sleep.
+func ApplyHumanLikeDelay() {
+	delay := HumanLikeDelay()
+	if delay > 0 {
+		time.Sleep(delay)
+	}
+}
+
+// ExponentialBackoffWithJitter calculates retry delay using exponential backoff with jitter.
+// Formula: min(baseDelay * 2^attempt, maxDelay) ± 30% jitter, so the result can exceed maxDelay by up to 30%.
+// This helps prevent thundering herd problem when multiple clients retry simultaneously.
+func ExponentialBackoffWithJitter(attempt int, baseDelay, maxDelay time.Duration) time.Duration {
+	if attempt < 0 {
+		attempt = 0
+	}
+
+	// Calculate exponential backoff: baseDelay * 2^attempt
+	backoff := baseDelay * time.Duration(1<<uint(attempt))
+	if backoff > maxDelay {
+		backoff = maxDelay
+	}
+
+	// Add ±30% jitter
+	return JitterDelay(backoff, JitterPercent)
+}
+
+// ShouldSkipDelay determines if delay should be skipped based on context.
+// It currently returns true only when the response is streaming.
+// This function can be extended to check additional skip conditions (for example, WebSocket connections).
+func ShouldSkipDelay(isStreaming bool) bool {
+	return isStreaming
+}
+
+// ResetLastRequestTime resets the last request time tracker.
+// Useful for testing or when starting a new session.
+func ResetLastRequestTime() {
+	jitterMu.Lock()
+	defer jitterMu.Unlock()
+	lastRequestTime = time.Time{}
+}
--- a/internal/auth/kiro/metrics.go
+++ b/internal/auth/kiro/metrics.go
@@ -0,0 +1,187 @@
+package kiro
+
+import (
+	"math"
+	"sync"
+	"time"
+)
+
+// TokenMetrics holds performance metrics for a single token.
+type TokenMetrics struct {
+	SuccessRate    float64   // Success rate (0.0 - 1.0)
+	AvgLatency     float64   // Average latency in milliseconds
+	QuotaRemaining float64   // Remaining quota (0.0 - 1.0)
+	LastUsed       time.Time // Last usage timestamp
+	FailCount      int       // Consecutive failure count
+	TotalRequests  int       // Total request count
+	successCount   int       // Internal: successful request count
+	totalLatency   float64   // Internal: cumulative latency
+}
+
+// TokenScorer manages token metrics and scoring.
+type TokenScorer struct {
+	mu      sync.RWMutex
+	metrics map[string]*TokenMetrics
+
+	// Scoring weights
+	successRateWeight     float64
+	quotaWeight           float64
+	latencyWeight         float64
+	lastUsedWeight        float64
+	failPenaltyMultiplier float64
+}
+
+// NewTokenScorer creates a new TokenScorer with default weights.
+func NewTokenScorer() *TokenScorer {
+	return &TokenScorer{
+		metrics:               make(map[string]*TokenMetrics),
+		successRateWeight:     0.4,
+		quotaWeight:           0.25,
+		latencyWeight:         0.2,
+		lastUsedWeight:        0.15,
+		failPenaltyMultiplier: 0.1,
+	}
+}
+
+// getOrCreateMetrics returns existing metrics or creates new ones. Callers must hold s.mu for writing.
+func (s *TokenScorer) getOrCreateMetrics(tokenKey string) *TokenMetrics {
+	if m, ok := s.metrics[tokenKey]; ok {
+		return m
+	}
+	m := &TokenMetrics{
+		SuccessRate:    1.0,
+		QuotaRemaining: 1.0,
+	}
+	s.metrics[tokenKey] = m
+	return m
+}
+
+// RecordRequest records the result of a request for a token.
+func (s *TokenScorer) RecordRequest(tokenKey string, success bool, latency time.Duration) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+
+	m := s.getOrCreateMetrics(tokenKey)
+	m.TotalRequests++
+	m.LastUsed = time.Now()
+	m.totalLatency += float64(latency.Milliseconds())
+
+	if success {
+		m.successCount++
+		m.FailCount = 0
+	} else {
+		m.FailCount++
+	}
+
+	// Update derived metrics
+	if m.TotalRequests > 0 {
+		m.SuccessRate = float64(m.successCount) / float64(m.TotalRequests)
+		m.AvgLatency = m.totalLatency / float64(m.TotalRequests)
+	}
+}
+
+// SetQuotaRemaining updates the remaining quota for a token.
+func (s *TokenScorer) SetQuotaRemaining(tokenKey string, quota float64) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+
+	m := s.getOrCreateMetrics(tokenKey)
+	m.QuotaRemaining = quota
+}
+
+// GetMetrics returns a copy of the metrics for a token.
+func (s *TokenScorer) GetMetrics(tokenKey string) *TokenMetrics {
+	s.mu.RLock()
+	defer s.mu.RUnlock()
+
+	if m, ok := s.metrics[tokenKey]; ok {
+		snapshot := *m
+		return &snapshot
+	}
+	return nil
+}
+
+// CalculateScore computes the score for a token (higher is better).
+func (s *TokenScorer) CalculateScore(tokenKey string) float64 {
+	s.mu.RLock()
+	defer s.mu.RUnlock()
+
+	m, ok := s.metrics[tokenKey]
+	if !ok {
+		return 1.0 // New tokens get a high initial score
+	}
+
+	// Success rate component (0-1)
+	successScore := m.SuccessRate
+
+	// Quota component (0-1)
+	quotaScore := m.QuotaRemaining
+
+	// Latency component (normalized, lower is better)
+	// Using exponential decay: score = e^(-latency/1000)
+	// 1000ms latency -> ~0.37 score, 100ms -> ~0.90 score
+	latencyScore := math.Exp(-m.AvgLatency / 1000.0)
+	if m.TotalRequests == 0 {
+		latencyScore = 1.0
+	}
+
+	// Last used component (prefer tokens not recently used)
+	// Score increases as time since last use increases
+	timeSinceUse := time.Since(m.LastUsed).Seconds()
+	// Normalize: 60 seconds -> ~0.63 score, 0 seconds -> 0 score
+	lastUsedScore := 1.0 - math.Exp(-timeSinceUse/60.0)
+	if m.LastUsed.IsZero() {
+		lastUsedScore = 1.0
+	}
+
+	// Calculate weighted score
+	score := s.successRateWeight*successScore +
+		s.quotaWeight*quotaScore +
+		s.latencyWeight*latencyScore +
+		s.lastUsedWeight*lastUsedScore
+
+	// Apply consecutive failure penalty
+	if m.FailCount > 0 {
+		penalty := s.failPenaltyMultiplier * float64(m.FailCount)
+		score = score * math.Max(0, 1.0-penalty)
+	}
+
+	return score
+}
+
+// SelectBestToken selects the token with the highest score.
+func (s *TokenScorer) SelectBestToken(tokens []string) string {
+	if len(tokens) == 0 {
+		return ""
+	}
+	if len(tokens) == 1 {
+		return tokens[0]
+	}
+
+	bestToken := tokens[0]
+	bestScore := s.CalculateScore(tokens[0])
+
+	for _, token := range tokens[1:] {
+		score := s.CalculateScore(token)
+		if score > bestScore {
+			bestScore = score
+			bestToken = token
+		}
+	}
+
+	return bestToken
+}
+
+// ResetMetrics clears all metrics for a token.
+func (s *TokenScorer) ResetMetrics(tokenKey string) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	delete(s.metrics, tokenKey)
+}
+
+// ResetAllMetrics clears all stored metrics.
+func (s *TokenScorer) ResetAllMetrics() {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.metrics = make(map[string]*TokenMetrics)
+}
--- a/internal/auth/kiro/metrics_test.go
+++ b/internal/auth/kiro/metrics_test.go
@@ -0,0 +1,301 @@
+package kiro
+
+import (
+	"sync"
+	"testing"
+	"time"
+)
+
+func TestNewTokenScorer(t *testing.T) {
+	s := NewTokenScorer()
+	if s == nil {
+		t.Fatal("expected non-nil TokenScorer")
+	}
+	if s.metrics == nil {
+		t.Error("expected non-nil metrics map")
+	}
+	if s.successRateWeight != 0.4 {
+		t.Errorf("expected successRateWeight 0.4, got %f", s.successRateWeight)
+	}
+	if s.quotaWeight != 0.25 {
+		t.Errorf("expected quotaWeight 0.25, got %f", s.quotaWeight)
+	}
+}
+
+func TestRecordRequest_Success(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+
+	m := s.GetMetrics("token1")
+	if m == nil {
+		t.Fatal("expected non-nil metrics")
+	}
+	if m.TotalRequests != 1 {
+		t.Errorf("expected TotalRequests 1, got %d", m.TotalRequests)
+	}
+	if m.SuccessRate != 1.0 {
+		t.Errorf("expected SuccessRate 1.0, got %f", m.SuccessRate)
+	}
+	if m.FailCount != 0 {
+		t.Errorf("expected FailCount 0, got %d", m.FailCount)
+	}
+	if m.AvgLatency != 100 {
+		t.Errorf("expected AvgLatency 100, got %f", m.AvgLatency)
+	}
+}
+
+func TestRecordRequest_Failure(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", false, 200*time.Millisecond)
+
+	m := s.GetMetrics("token1")
+	if m.SuccessRate != 0.0 {
+		t.Errorf("expected SuccessRate 0.0, got %f", m.SuccessRate)
+	}
+	if m.FailCount != 1 {
+		t.Errorf("expected FailCount 1, got %d", m.FailCount)
+	}
+}
+
+func TestRecordRequest_MixedResults(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+	s.RecordRequest("token1", false, 100*time.Millisecond)
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+
+	m := s.GetMetrics("token1")
+	if m.TotalRequests != 4 {
+		t.Errorf("expected TotalRequests 4, got %d", m.TotalRequests)
+	}
+	if m.SuccessRate != 0.75 {
+		t.Errorf("expected SuccessRate 0.75, got %f", m.SuccessRate)
+	}
+	if m.FailCount != 0 {
+		t.Errorf("expected FailCount 0 (reset on success), got %d", m.FailCount)
+	}
+}
+
+func TestRecordRequest_ConsecutiveFailures(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+	s.RecordRequest("token1", false, 100*time.Millisecond)
+	s.RecordRequest("token1", false, 100*time.Millisecond)
+	s.RecordRequest("token1", false, 100*time.Millisecond)
+
+	m := s.GetMetrics("token1")
+	if m.FailCount != 3 {
+		t.Errorf("expected FailCount 3, got %d", m.FailCount)
+	}
+}
+
+func TestSetQuotaRemaining(t *testing.T) {
+	s := NewTokenScorer()
+	s.SetQuotaRemaining("token1", 0.5)
+
+	m := s.GetMetrics("token1")
+	if m.QuotaRemaining != 0.5 {
+		t.Errorf("expected QuotaRemaining 0.5, got %f", m.QuotaRemaining)
+	}
+}
+
+func TestGetMetrics_NonExistent(t *testing.T) {
+	s := NewTokenScorer()
+	m := s.GetMetrics("nonexistent")
+	if m != nil {
+		t.Error("expected nil metrics for non-existent token")
+	}
+}
+
+func TestGetMetrics_ReturnsCopy(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+
+	m1 := s.GetMetrics("token1")
+	m1.TotalRequests = 999
+
+	m2 := s.GetMetrics("token1")
+	if m2.TotalRequests == 999 {
+		t.Error("GetMetrics should return a copy")
+	}
+}
+
+func TestCalculateScore_NewToken(t *testing.T) {
+	s := NewTokenScorer()
+	score := s.CalculateScore("newtoken")
+	if score != 1.0 {
+		t.Errorf("expected score 1.0 for new token, got %f", score)
+	}
+}
+
+func TestCalculateScore_PerfectToken(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", true, 50*time.Millisecond)
+	s.SetQuotaRemaining("token1", 1.0)
+
+	time.Sleep(100 * time.Millisecond)
+	score := s.CalculateScore("token1")
+	if score < 0.5 || score > 1.0 {
+		t.Errorf("expected high score for perfect token, got %f", score)
+	}
+}
+
+func TestCalculateScore_FailedToken(t *testing.T) {
+	s := NewTokenScorer()
+	for i := 0; i < 5; i++ {
+		s.RecordRequest("token1", false, 1000*time.Millisecond)
+	}
+	s.SetQuotaRemaining("token1", 0.1)
+
+	score := s.CalculateScore("token1")
+	if score > 0.5 {
+		t.Errorf("expected low score for failed token, got %f", score)
+	}
+}
+
+func TestCalculateScore_FailPenalty(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+	scoreNoFail := s.CalculateScore("token1")
+
+	s.RecordRequest("token1", false, 100*time.Millisecond)
+	s.RecordRequest("token1", false, 100*time.Millisecond)
+	scoreWithFail := s.CalculateScore("token1")
+
+	if scoreWithFail >= scoreNoFail {
+		t.Errorf("expected lower score with consecutive failures: noFail=%f, withFail=%f", scoreNoFail, scoreWithFail)
+	}
+}
+
+func TestSelectBestToken_Empty(t *testing.T) {
+	s := NewTokenScorer()
+	best := s.SelectBestToken([]string{})
+	if best != "" {
+		t.Errorf("expected empty string for empty tokens, got %s", best)
+	}
+}
+
+func TestSelectBestToken_SingleToken(t *testing.T) {
+	s := NewTokenScorer()
+	best := s.SelectBestToken([]string{"token1"})
+	if best != "token1" {
+		t.Errorf("expected token1, got %s", best)
+	}
+}
+
+func TestSelectBestToken_MultipleTokens(t *testing.T) {
+	s := NewTokenScorer()
+
+	s.RecordRequest("bad", false, 1000*time.Millisecond)
+	s.RecordRequest("bad", false, 1000*time.Millisecond)
+	s.SetQuotaRemaining("bad", 0.1)
+
+	s.RecordRequest("good", true, 50*time.Millisecond)
+	s.SetQuotaRemaining("good", 0.9)
+
+	time.Sleep(50 * time.Millisecond)
+
+	best := s.SelectBestToken([]string{"bad", "good"})
+	if best != "good" {
+		t.Errorf("expected good token to be selected, got %s", best)
+	}
+}
+
+func TestResetMetrics(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+	s.ResetMetrics("token1")
+
+	m := s.GetMetrics("token1")
+	if m != nil {
+		t.Error("expected nil metrics after reset")
+	}
+}
+
+func TestResetAllMetrics(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+	s.RecordRequest("token2", true, 100*time.Millisecond)
+	s.RecordRequest("token3", true, 100*time.Millisecond)
+
+	s.ResetAllMetrics()
+
+	if s.GetMetrics("token1") != nil {
+		t.Error("expected nil metrics for token1 after reset all")
+	}
+	if s.GetMetrics("token2") != nil {
+		t.Error("expected nil metrics for token2 after reset all")
+	}
+}
+
+func TestTokenScorer_ConcurrentAccess(t *testing.T) {
+	s := NewTokenScorer()
+	const numGoroutines = 50
+	const numOperations = 100
+
+	var wg sync.WaitGroup
+	wg.Add(numGoroutines)
+
+	for i := 0; i < numGoroutines; i++ {
+		go func(id int) {
+			defer wg.Done()
+			tokenKey := "token" + string(rune('a'+id%10))
+			for j := 0; j < numOperations; j++ {
+				switch j % 6 {
+				case 0:
+					s.RecordRequest(tokenKey, j%2 == 0, time.Duration(j)*time.Millisecond)
+				case 1:
+					s.SetQuotaRemaining(tokenKey, float64(j%100)/100)
+				case 2:
+					s.GetMetrics(tokenKey)
+				case 3:
+					s.CalculateScore(tokenKey)
+				case 4:
+					s.SelectBestToken([]string{tokenKey, "token_x", "token_y"})
+				case 5:
+					if j%20 == 0 {
+						s.ResetMetrics(tokenKey)
+					}
+				}
+			}
+		}(i)
+	}
+
+	wg.Wait()
+}
+
+func TestAvgLatencyCalculation(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+	s.RecordRequest("token1", true, 200*time.Millisecond)
+	s.RecordRequest("token1", true, 300*time.Millisecond)
+
+	m := s.GetMetrics("token1")
+	if m.AvgLatency != 200 {
+		t.Errorf("expected AvgLatency 200, got %f", m.AvgLatency)
+	}
+}
+
+func TestLastUsedUpdated(t *testing.T) {
+	s := NewTokenScorer()
+	before := time.Now()
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+
+	m := s.GetMetrics("token1")
+	if m.LastUsed.Before(before) {
+		t.Error("expected LastUsed to be after test start time")
+	}
+	if m.LastUsed.After(time.Now()) {
+		t.Error("expected LastUsed to be before or equal to now")
+	}
+}
+
+func TestDefaultQuotaForNewToken(t *testing.T) {
+	s := NewTokenScorer()
+	s.RecordRequest("token1", true, 100*time.Millisecond)
+
+	m := s.GetMetrics("token1")
+	if m.QuotaRemaining != 1.0 {
+		t.Errorf("expected default QuotaRemaining 1.0, got %f", m.QuotaRemaining)
+	}
+}
--- a/internal/auth/kiro/oauth.go
+++ b/internal/auth/kiro/oauth.go
@@ -227,6 +227,7 @@ func (o *KiroOAuth) exchangeCodeForToken(ctx context.Context, code, codeVerifier
 		ExpiresAt:    expiresAt.Format(time.RFC3339),
 		AuthMethod:   "social",
 		Provider:     "", // Caller should preserve original provider
+		Region:       "us-east-1",
 	}, nil
 }

@@ -285,6 +286,7 @@ func (o *KiroOAuth) RefreshToken(ctx context.Context, refreshToken string) (*Kir
 		ExpiresAt:    expiresAt.Format(time.RFC3339),
 		AuthMethod:   "social",
 		Provider:     "", // Caller should preserve original provider
+		Region:       "us-east-1",
 	}, nil
 }

--- a/internal/auth/kiro/oauth_web.go
+++ b/internal/auth/kiro/oauth_web.go
@@ -0,0 +1,982 @@
+// Package kiro provides OAuth Web authentication for Kiro.
+package kiro
+
+import (
+	"context"
+	"crypto/rand"
+	"encoding/base64"
+	"encoding/json"
+	"fmt"
+	"html/template"
+	"net/http"
+	"os"
+	"path/filepath"
+	"strings"
+	"sync"
+	"time"
+
+	"github.com/gin-gonic/gin"
+	"github.com/router-for-me/CLIProxyAPI/v6/internal/config"
+	"github.com/router-for-me/CLIProxyAPI/v6/internal/util"
+	log "github.com/sirupsen/logrus"
+)
+
+const (
+	defaultSessionExpiry = 10 * time.Minute
+	pollIntervalSeconds  = 5
+)
+
+type authSessionStatus string
+
+const (
+	statusPending authSessionStatus = "pending"
+	statusSuccess authSessionStatus = "success"
+	statusFailed  authSessionStatus = "failed"
+)
+
+type webAuthSession struct {
+	stateID         string
+	deviceCode      string
+	userCode        string
+	authURL         string
+	verificationURI string
+	expiresIn       int
+	interval        int
+	status          authSessionStatus
+	startedAt       time.Time
+	completedAt     time.Time
+	expiresAt       time.Time
+	error           string
+	tokenData       *KiroTokenData
+	ssoClient       *SSOOIDCClient
+	clientID        string
+	clientSecret    string
+	region          string
+	cancelFunc      context.CancelFunc
+	authMethod      string // "google", "github", "builder-id", "idc"
+	startURL        string // Used for IDC
+	codeVerifier    string // Used for social auth PKCE
+	codeChallenge   string // Used for social auth PKCE
+}
+
+type OAuthWebHandler struct {
+	cfg             *config.Config
+	sessions        map[string]*webAuthSession
+	mu              sync.RWMutex
+	onTokenObtained func(*KiroTokenData)
+}
+
+func NewOAuthWebHandler(cfg *config.Config) *OAuthWebHandler {
+	return &OAuthWebHandler{
+		cfg:      cfg,
+		sessions: make(map[string]*webAuthSession),
+	}
+}
+
+func (h *OAuthWebHandler) SetTokenCallback(callback func(*KiroTokenData)) {
+	h.onTokenObtained = callback
+}
+
+func (h *OAuthWebHandler) RegisterRoutes(router gin.IRouter) {
+	oauth := router.Group("/v0/oauth/kiro")
+	{
+		oauth.GET("", h.handleSelect)
+		oauth.GET("/start", h.handleStart)
+		oauth.GET("/callback", h.handleCallback)
+		oauth.GET("/social/callback", h.handleSocialCallback)
+		oauth.GET("/status", h.handleStatus)
+		oauth.POST("/import", h.handleImportToken)
+		oauth.POST("/refresh", h.handleManualRefresh)
+	}
+}
+
+func generateStateID() (string, error) {
+	b := make([]byte, 16)
+	if _, err := rand.Read(b); err != nil {
+		return "", err
+	}
+	return base64.RawURLEncoding.EncodeToString(b), nil
+}
+
+func (h *OAuthWebHandler) handleSelect(c *gin.Context) {
+	h.renderSelectPage(c)
+}
+
+func (h *OAuthWebHandler) handleStart(c *gin.Context) {
+	method := c.Query("method")
+
+	if method == "" {
+		c.Redirect(http.StatusFound, "/v0/oauth/kiro")
+		return
+	}
+
+	switch method {
+	case "google", "github":
+		// Google/GitHub social login is not supported for third-party apps
+		// due to AWS Cognito redirect_uri restrictions
+		h.renderError(c, "Google/GitHub login is not available for third-party applications. Please use AWS Builder ID or import your token from Kiro IDE.")
+	case "builder-id":
+		h.startBuilderIDAuth(c)
+	case "idc":
+		h.startIDCAuth(c)
+	default:
+		h.renderError(c, fmt.Sprintf("Unknown authentication method: %s", method))
+	}
+}
+
+func (h *OAuthWebHandler) startSocialAuth(c *gin.Context, method string) {
+	stateID, err := generateStateID()
+	if err != nil {
+		h.renderError(c, "Failed to generate state parameter")
+		return
+	}
+
+	codeVerifier, codeChallenge, err := generatePKCE()
+	if err != nil {
+		h.renderError(c, "Failed to generate PKCE parameters")
+		return
+	}
+
+	socialClient := NewSocialAuthClient(h.cfg)
+
+	var provider string
+	if method == "google" {
+		provider = string(ProviderGoogle)
+	} else {
+		provider = string(ProviderGitHub)
+	}
+
+	redirectURI := h.getSocialCallbackURL(c)
+	authURL := socialClient.buildLoginURL(provider, redirectURI, codeChallenge, stateID)
+
+	ctx, cancel := context.WithTimeout(context.Background(), defaultSessionExpiry)
+
+	session := &webAuthSession{
+		stateID:       stateID,
+		authMethod:    method,
+		authURL:       authURL,
+		status:        statusPending,
+		startedAt:     time.Now(),
+		expiresIn:     600,
+		codeVerifier:  codeVerifier,
+		codeChallenge: codeChallenge,
+		region:        "us-east-1",
+		cancelFunc:    cancel,
+	}
+
+	h.mu.Lock()
+	h.sessions[stateID] = session
+	h.mu.Unlock()
+
+	go func() {
+		<-ctx.Done()
+		h.mu.Lock()
+		if session.status == statusPending {
+			session.status = statusFailed
+			session.error = "Authentication timed out"
+		}
+		h.mu.Unlock()
+	}()
+
+	c.Redirect(http.StatusFound, authURL)
+}
+
+func (h *OAuthWebHandler) getSocialCallbackURL(c *gin.Context) string {
+	scheme := "http"
+	if c.Request.TLS != nil || c.GetHeader("X-Forwarded-Proto") == "https" {
+		scheme = "https"
+	}
+	return fmt.Sprintf("%s://%s/v0/oauth/kiro/social/callback", scheme, c.Request.Host)
+}
+
+func (h *OAuthWebHandler) startBuilderIDAuth(c *gin.Context) {
+	stateID, err := generateStateID()
+	if err != nil {
+		h.renderError(c, "Failed to generate state parameter")
+		return
+	}
+
+	region := defaultIDCRegion
+	startURL := builderIDStartURL
+
+	ssoClient := NewSSOOIDCClient(h.cfg)
+
+	regResp, err := ssoClient.RegisterClientWithRegion(c.Request.Context(), region)
+	if err != nil {
+		log.Errorf("OAuth Web: failed to register client: %v", err)
+		h.renderError(c, fmt.Sprintf("Failed to register client: %v", err))
+		return
+	}
+
+	authResp, err := ssoClient.StartDeviceAuthorizationWithIDC(
+		c.Request.Context(),
+		regResp.ClientID,
+		regResp.ClientSecret,
+		startURL,
+		region,
+	)
+	if err != nil {
+		log.Errorf("OAuth Web: failed to start device authorization: %v", err)
+		h.renderError(c, fmt.Sprintf("Failed to start device authorization: %v", err))
+		return
+	}
+
+	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(authResp.ExpiresIn)*time.Second)
+
+	session := &webAuthSession{
+		stateID:         stateID,
+		deviceCode:      authResp.DeviceCode,
+		userCode:        authResp.UserCode,
+		authURL:         authResp.VerificationURIComplete,
+		verificationURI: authResp.VerificationURI,
+		expiresIn:       authResp.ExpiresIn,
+		interval:        authResp.Interval,
+		status:          statusPending,
+		startedAt:       time.Now(),
+		ssoClient:       ssoClient,
+		clientID:        regResp.ClientID,
+		clientSecret:    regResp.ClientSecret,
+		region:          region,
+		authMethod:      "builder-id",
+		startURL:        startURL,
+		cancelFunc:      cancel,
+	}
+
+	h.mu.Lock()
+	h.sessions[stateID] = session
+	h.mu.Unlock()
+
+	go h.pollForToken(ctx, session)
+
+	h.renderStartPage(c, session)
+}
+
+func (h *OAuthWebHandler) startIDCAuth(c *gin.Context) {
+	startURL := c.Query("startUrl")
+	region := c.Query("region")
+
+	if startURL == "" {
+		h.renderError(c, "Missing startUrl parameter for IDC authentication")
+		return
+	}
+	if region == "" {
+		region = defaultIDCRegion
+	}
+
+	stateID, err := generateStateID()
+	if err != nil {
+		h.renderError(c, "Failed to generate state parameter")
+		return
+	}
+
+	ssoClient := NewSSOOIDCClient(h.cfg)
+
+	regResp, err := ssoClient.RegisterClientWithRegion(c.Request.Context(), region)
+	if err != nil {
+		log.Errorf("OAuth Web: failed to register client: %v", err)
+		h.renderError(c, fmt.Sprintf("Failed to register client: %v", err))
+		return
+	}
+
+	authResp, err := ssoClient.StartDeviceAuthorizationWithIDC(
+		c.Request.Context(),
+		regResp.ClientID,
+		regResp.ClientSecret,
+		startURL,
+		region,
+	)
+	if err != nil {
+		log.Errorf("OAuth Web: failed to start device authorization: %v", err)
+		h.renderError(c, fmt.Sprintf("Failed to start device authorization: %v", err))
+		return
+	}
+
+	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(authResp.ExpiresIn)*time.Second)
+
+	session := &webAuthSession{
+		stateID:         stateID,
+		deviceCode:      authResp.DeviceCode,
+		userCode:        authResp.UserCode,
+		authURL:         authResp.VerificationURIComplete,
+		verificationURI: authResp.VerificationURI,
+		expiresIn:       authResp.ExpiresIn,
+		interval:        authResp.Interval,
+		status:          statusPending,
+		startedAt:       time.Now(),
+		ssoClient:       ssoClient,
+		clientID:        regResp.ClientID,
+		clientSecret:    regResp.ClientSecret,
+		region:          region,
+		authMethod:      "idc",
+		startURL:        startURL,
+		cancelFunc:      cancel,
+	}
+
+	h.mu.Lock()
+	h.sessions[stateID] = session
+	h.mu.Unlock()
+
+	go h.pollForToken(ctx, session)
+
+	h.renderStartPage(c, session)
+}
+
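+// pollForToken polls the SSO OIDC token endpoint until the device authorization
+// is approved, fails, or ctx expires. It starts at the server-provided interval
+// (never below pollIntervalSeconds) and adds 5 seconds after each ErrSlowDown.
+func (h *OAuthWebHandler) pollForToken(ctx context.Context, session *webAuthSession) {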
+	defer session.cancelFunc()
+
+	interval := time.Duration(session.interval) * time.Second
+	if interval < time.Duration(pollIntervalSeconds)*time.Second {
+		interval = time.Duration(pollIntervalSeconds) * time.Second
+	}
+
+	ticker := time.NewTicker(interval)
+	defer ticker.Stop()
+
+	for {
+		select {
+		case <-ctx.Done():
+			h.mu.Lock()
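+			if session.status == statusPending {
+				session.status = statusFailed
+				session.error = "Authentication timed out"
+				// Record when the session failed so handleStatus and CleanupExpiredSessions see a real timestamp.
+				session.completedAt = time.Now()
+			}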
+			h.mu.Unlock()
+			return
+		case <-ticker.C:
+			tokenResp, err := h.ssoClient(session).CreateTokenWithRegion(
+				ctx,
+				session.clientID,
+				session.clientSecret,
+				session.deviceCode,
+				session.region,
+			)
+
+			if err != nil {
+				errStr := err.Error()
+				if errStr == ErrAuthorizationPending.Error() {
+					continue
+				}
+				if errStr == ErrSlowDown.Error() {
+					interval += 5 * time.Second
+					ticker.Reset(interval)
+					continue
+				}
+
+				h.mu.Lock()
+				session.status = statusFailed
+				session.error = errStr
+				session.completedAt = time.Now()
+				h.mu.Unlock()
+
+				log.Errorf("OAuth Web: token polling failed: %v", err)
+				return
+			}
+
+			expiresAt := time.Now().Add(time.Duration(tokenResp.ExpiresIn) * time.Second)
+			profileArn := session.ssoClient.fetchProfileArn(ctx, tokenResp.AccessToken)
+			email := FetchUserEmailWithFallback(ctx, h.cfg, tokenResp.AccessToken)
+
+			tokenData := &KiroTokenData{
+				AccessToken:  tokenResp.AccessToken,
+				RefreshToken: tokenResp.RefreshToken,
+				ProfileArn:   profileArn,
+				ExpiresAt:    expiresAt.Format(time.RFC3339),
+				AuthMethod:   session.authMethod,
+				Provider:     "AWS",
+				ClientID:     session.clientID,
+				ClientSecret: session.clientSecret,
+				Email:        email,
+				Region:       session.region,
+				StartURL:     session.startURL,
+			}
+
+			h.mu.Lock()
+			session.status = statusSuccess
+			session.completedAt = time.Now()
+			session.expiresAt = expiresAt
+			session.tokenData = tokenData
+			h.mu.Unlock()
+
+			if h.onTokenObtained != nil {
+				h.onTokenObtained(tokenData)
+			}
+
+			// Save token to file
+			h.saveTokenToFile(tokenData)
+
+			log.Infof("OAuth Web: authentication successful for %s", email)
+			return
+		}
+	}
+}
+
+// saveTokenToFile saves the token data to the auth directory
+func (h *OAuthWebHandler) saveTokenToFile(tokenData *KiroTokenData) {
+	// Get auth directory from config or use default
+	authDir := ""
+	if h.cfg != nil && h.cfg.AuthDir != "" {
+		var err error
+		authDir, err = util.ResolveAuthDir(h.cfg.AuthDir)
+		if err != nil {
+			log.Errorf("OAuth Web: failed to resolve auth directory: %v", err)
+		}
+	}
+
+	// Fall back to default location
+	if authDir == "" {
+		home, err := os.UserHomeDir()
+		if err != nil {
+			log.Errorf("OAuth Web: failed to get home directory: %v", err)
+			return
+		}
+		authDir = filepath.Join(home, ".cli-proxy-api")
+	}
+
+	// Create directory if not exists
+	if err := os.MkdirAll(authDir, 0700); err != nil {
+		log.Errorf("OAuth Web: failed to create auth directory: %v", err)
+		return
+	}
+
+	// Generate filename based on auth method
+	// Format: kiro-{authMethod}.json or kiro-{authMethod}-{email}.json
+	fileName := fmt.Sprintf("kiro-%s.json", tokenData.AuthMethod)
+	if tokenData.Email != "" {
+		// Sanitize email for filename (replace @ and . with -)
+		sanitizedEmail := tokenData.Email
+		sanitizedEmail = strings.ReplaceAll(sanitizedEmail, "@", "-")
+		sanitizedEmail = strings.ReplaceAll(sanitizedEmail, ".", "-")
+		fileName = fmt.Sprintf("kiro-%s-%s.json", tokenData.AuthMethod, sanitizedEmail)
+	}
+
+	authFilePath := filepath.Join(authDir, fileName)
+
+	// Convert to storage format and save
+	storage := &KiroTokenStorage{
+		Type:         "kiro",
+		AccessToken:  tokenData.AccessToken,
+		RefreshToken: tokenData.RefreshToken,
+		ProfileArn:   tokenData.ProfileArn,
+		ExpiresAt:    tokenData.ExpiresAt,
+		AuthMethod:   tokenData.AuthMethod,
+		Provider:     tokenData.Provider,
+		LastRefresh:  time.Now().Format(time.RFC3339),
+		ClientID:     tokenData.ClientID,
+		ClientSecret: tokenData.ClientSecret,
+		Region:       tokenData.Region,
+		StartURL:     tokenData.StartURL,
+		Email:        tokenData.Email,
+	}
+
+	if err := storage.SaveTokenToFile(authFilePath); err != nil {
+		log.Errorf("OAuth Web: failed to save token to file: %v", err)
+		return
+	}
+
+	log.Infof("OAuth Web: token saved to %s", authFilePath)
+}
+
+func (h *OAuthWebHandler) ssoClient(session *webAuthSession) *SSOOIDCClient {
+	return session.ssoClient
+}
+
+func (h *OAuthWebHandler) handleCallback(c *gin.Context) {
+	stateID := c.Query("state")
+	errParam := c.Query("error")
+
+	if errParam != "" {
+		h.renderError(c, errParam)
+		return
+	}
+
+	if stateID == "" {
+		h.renderError(c, "Missing state parameter")
+		return
+	}
+
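+	// Hold the read lock for the rest of the handler: pollForToken mutates
+	// session fields under h.mu while this callback may be reading them.
+	h.mu.RLock()
+	defer h.mu.RUnlock()
+	session, exists := h.sessions[stateID]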
+
+	if !exists {
+		h.renderError(c, "Invalid or expired session")
+		return
+	}
+
+	if session.status == statusSuccess {
+		h.renderSuccess(c, session)
+	} else if session.status == statusFailed {
+		h.renderError(c, session.error)
+	} else {
+		c.Redirect(http.StatusFound, "/v0/oauth/kiro/start")
+	}
+}
+
+func (h *OAuthWebHandler) handleSocialCallback(c *gin.Context) {
+	stateID := c.Query("state")
+	code := c.Query("code")
+	errParam := c.Query("error")
+
+	if errParam != "" {
+		h.renderError(c, errParam)
+		return
+	}
+
+	if stateID == "" {
+		h.renderError(c, "Missing state parameter")
+		return
+	}
+
+	if code == "" {
+		h.renderError(c, "Missing authorization code")
+		return
+	}
+
+	h.mu.RLock()
+	session, exists := h.sessions[stateID]
+	h.mu.RUnlock()
+
+	if !exists {
+		h.renderError(c, "Invalid or expired session")
+		return
+	}
+
+	if session.authMethod != "google" && session.authMethod != "github" {
+		h.renderError(c, "Invalid session type for social callback")
+		return
+	}
+
+	socialClient := NewSocialAuthClient(h.cfg)
+	redirectURI := h.getSocialCallbackURL(c)
+
+	tokenReq := &CreateTokenRequest{
+		Code:         code,
+		CodeVerifier: session.codeVerifier,
+		RedirectURI:  redirectURI,
+	}
+
+	tokenResp, err := socialClient.CreateToken(c.Request.Context(), tokenReq)
+	if err != nil {
+		log.Errorf("OAuth Web: social token exchange failed: %v", err)
+		h.mu.Lock()
+		session.status = statusFailed
+		session.error = fmt.Sprintf("Token exchange failed: %v", err)
+		session.completedAt = time.Now()
+		h.mu.Unlock()
+		h.renderError(c, session.error)
+		return
+	}
+
+	expiresIn := tokenResp.ExpiresIn
+	if expiresIn <= 0 {
+		expiresIn = 3600
+	}
+	expiresAt := time.Now().Add(time.Duration(expiresIn) * time.Second)
+
+	email := ExtractEmailFromJWT(tokenResp.AccessToken)
+
+	var provider string
+	if session.authMethod == "google" {
+		provider = string(ProviderGoogle)
+	} else {
+		provider = string(ProviderGitHub)
+	}
+
+	tokenData := &KiroTokenData{
+		AccessToken:  tokenResp.AccessToken,
+		RefreshToken: tokenResp.RefreshToken,
+		ProfileArn:   tokenResp.ProfileArn,
+		ExpiresAt:    expiresAt.Format(time.RFC3339),
+		AuthMethod:   session.authMethod,
+		Provider:     provider,
+		Email:        email,
+		Region:       "us-east-1",
+	}
+
+	h.mu.Lock()
+	session.status = statusSuccess
+	session.completedAt = time.Now()
+	session.expiresAt = expiresAt
+	session.tokenData = tokenData
+	h.mu.Unlock()
+
+	if session.cancelFunc != nil {
+		session.cancelFunc()
+	}
+
+	if h.onTokenObtained != nil {
+		h.onTokenObtained(tokenData)
+	}
+
+	// Save token to file
+	h.saveTokenToFile(tokenData)
+
+	log.Infof("OAuth Web: social authentication successful for %s via %s", email, provider)
+	h.renderSuccess(c, session)
+}
+
+func (h *OAuthWebHandler) handleStatus(c *gin.Context) {
+	stateID := c.Query("state")
+	if stateID == "" {
+		c.JSON(http.StatusBadRequest, gin.H{"error": "missing state parameter"})
+		return
+	}
+
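+	// Hold the read lock for the rest of the handler: pollForToken and the social
+	// callback mutate session fields under h.mu while this endpoint is being polled.
+	h.mu.RLock()
+	defer h.mu.RUnlock()
+	session, exists := h.sessions[stateID]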
+
+	if !exists {
+		c.JSON(http.StatusNotFound, gin.H{"error": "session not found"})
+		return
+	}
+
+	response := gin.H{
+		"status": string(session.status),
+	}
+
+	switch session.status {
+	case statusPending:
+		elapsed := time.Since(session.startedAt).Seconds()
+		remaining := float64(session.expiresIn) - elapsed
+		if remaining < 0 {
+			remaining = 0
+		}
+		response["remaining_seconds"] = int(remaining)
+	case statusSuccess:
+		response["completed_at"] = session.completedAt.Format(time.RFC3339)
+		response["expires_at"] = session.expiresAt.Format(time.RFC3339)
+	case statusFailed:
+		response["error"] = session.error
+		response["failed_at"] = session.completedAt.Format(time.RFC3339)
+	}
+
+	c.JSON(http.StatusOK, response)
+}
+
+func (h *OAuthWebHandler) renderStartPage(c *gin.Context, session *webAuthSession) {
+	tmpl, err := template.New("start").Parse(oauthWebStartPageHTML)
+	if err != nil {
+		log.Errorf("OAuth Web: failed to parse template: %v", err)
+		c.String(http.StatusInternalServerError, "Template error")
+		return
+	}
+
+	data := map[string]interface{}{
+		"AuthURL":   session.authURL,
+		"UserCode":  session.userCode,
+		"ExpiresIn": session.expiresIn,
+		"StateID":   session.stateID,
+	}
+
+	c.Header("Content-Type", "text/html; charset=utf-8")
+	if err := tmpl.Execute(c.Writer, data); err != nil {
+		log.Errorf("OAuth Web: failed to render template: %v", err)
+	}
+}
+
+func (h *OAuthWebHandler) renderSelectPage(c *gin.Context) {
+	tmpl, err := template.New("select").Parse(oauthWebSelectPageHTML)
+	if err != nil {
+		log.Errorf("OAuth Web: failed to parse select template: %v", err)
+		c.String(http.StatusInternalServerError, "Template error")
+		return
+	}
+
+	c.Header("Content-Type", "text/html; charset=utf-8")
+	if err := tmpl.Execute(c.Writer, nil); err != nil {
+		log.Errorf("OAuth Web: failed to render select template: %v", err)
+	}
+}
+
+func (h *OAuthWebHandler) renderError(c *gin.Context, errMsg string) {
+	tmpl, err := template.New("error").Parse(oauthWebErrorPageHTML)
+	if err != nil {
+		log.Errorf("OAuth Web: failed to parse error template: %v", err)
+		c.String(http.StatusInternalServerError, "Template error")
+		return
+	}
+
+	data := map[string]interface{}{
+		"Error": errMsg,
+	}
+
+	c.Header("Content-Type", "text/html; charset=utf-8")
+	c.Status(http.StatusBadRequest)
+	if err := tmpl.Execute(c.Writer, data); err != nil {
+		log.Errorf("OAuth Web: failed to render error template: %v", err)
+	}
+}
+
+func (h *OAuthWebHandler) renderSuccess(c *gin.Context, session *webAuthSession) {
+	tmpl, err := template.New("success").Parse(oauthWebSuccessPageHTML)
+	if err != nil {
+		log.Errorf("OAuth Web: failed to parse success template: %v", err)
+		c.String(http.StatusInternalServerError, "Template error")
+		return
+	}
+
+	data := map[string]interface{}{
+		"ExpiresAt": session.expiresAt.Format(time.RFC3339),
+	}
+
+	c.Header("Content-Type", "text/html; charset=utf-8")
+	if err := tmpl.Execute(c.Writer, data); err != nil {
+		log.Errorf("OAuth Web: failed to render success template: %v", err)
+	}
+}
+
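+// CleanupExpiredSessions removes finished sessions 30 minutes after completion,
+// and cancels and removes pending sessions older than defaultSessionExpiry.
+func (h *OAuthWebHandler) CleanupExpiredSessions() {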
+	h.mu.Lock()
+	defer h.mu.Unlock()
+
+	now := time.Now()
+	for id, session := range h.sessions {
+		if session.status != statusPending && now.Sub(session.completedAt) > 30*time.Minute {
+			delete(h.sessions, id)
+		} else if session.status == statusPending && now.Sub(session.startedAt) > defaultSessionExpiry {
+			session.cancelFunc()
+			delete(h.sessions, id)
+		}
+	}
+}
+
+func (h *OAuthWebHandler) GetSession(stateID string) (*webAuthSession, bool) {
+	h.mu.RLock()
+	defer h.mu.RUnlock()
+	session, exists := h.sessions[stateID]
+	return session, exists
+}
+
+// ImportTokenRequest represents the request body for token import
+type ImportTokenRequest struct {
+	RefreshToken string `json:"refreshToken"`
+}
+
+// handleImportToken handles manual refresh token import from Kiro IDE
+func (h *OAuthWebHandler) handleImportToken(c *gin.Context) {
+	var req ImportTokenRequest
+	if err := c.ShouldBindJSON(&req); err != nil {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"error":   "Invalid request body",
+		})
+		return
+	}
+
+	refreshToken := strings.TrimSpace(req.RefreshToken)
+	if refreshToken == "" {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"error":   "Refresh token is required",
+		})
+		return
+	}
+
+	// Validate token format
+	if !strings.HasPrefix(refreshToken, "aorAAAAAG") {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"error":   "Invalid token format. Token should start with aorAAAAAG...",
+		})
+		return
+	}
+
+	// Create social auth client to refresh and validate the token
+	socialClient := NewSocialAuthClient(h.cfg)
+
+	// Refresh the token to validate it and get access token
+	tokenData, err := socialClient.RefreshSocialToken(c.Request.Context(), refreshToken)
+	if err != nil {
+		log.Errorf("OAuth Web: token refresh failed during import: %v", err)
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"error":   fmt.Sprintf("Token validation failed: %v", err),
+		})
+		return
+	}
+
+	// Set the original refresh token (the refreshed one might be empty)
+	if tokenData.RefreshToken == "" {
+		tokenData.RefreshToken = refreshToken
+	}
+	tokenData.AuthMethod = "social"
+	tokenData.Provider = "imported"
+
+	// Notify callback if set
+	if h.onTokenObtained != nil {
+		h.onTokenObtained(tokenData)
+	}
+
+	// Save token to file
+	h.saveTokenToFile(tokenData)
+
+	// Generate filename for response
+	fileName := fmt.Sprintf("kiro-%s.json", tokenData.AuthMethod)
+	if tokenData.Email != "" {
+		sanitizedEmail := strings.ReplaceAll(tokenData.Email, "@", "-")
+		sanitizedEmail = strings.ReplaceAll(sanitizedEmail, ".", "-")
+		fileName = fmt.Sprintf("kiro-%s-%s.json", tokenData.AuthMethod, sanitizedEmail)
+	}
+
+	log.Infof("OAuth Web: token imported successfully")
+	c.JSON(http.StatusOK, gin.H{
+		"success":  true,
+		"message":  "Token imported successfully",
+		"fileName": fileName,
+	})
+}
+
+// handleManualRefresh handles manual token refresh requests from the web UI.
+// This allows users to trigger a token refresh when needed, without waiting
+// for the automatic 30-second check and 20-minute-before-expiry refresh cycle.
+// Uses the same refresh logic as kiro_executor.Refresh for consistency.
+func (h *OAuthWebHandler) handleManualRefresh(c *gin.Context) {
+	authDir := ""
+	if h.cfg != nil && h.cfg.AuthDir != "" {
+		var err error
+		authDir, err = util.ResolveAuthDir(h.cfg.AuthDir)
+		if err != nil {
+			log.Errorf("OAuth Web: failed to resolve auth directory: %v", err)
+		}
+	}
+
+	if authDir == "" {
+		home, err := os.UserHomeDir()
+		if err != nil {
+			c.JSON(http.StatusInternalServerError, gin.H{
+				"success": false,
+				"error":   "Failed to get home directory",
+			})
+			return
+		}
+		authDir = filepath.Join(home, ".cli-proxy-api")
+	}
+
+	// Find all kiro token files in the auth directory
+	files, err := os.ReadDir(authDir)
+	if err != nil {
+		c.JSON(http.StatusInternalServerError, gin.H{
+			"success": false,
+			"error":   fmt.Sprintf("Failed to read auth directory: %v", err),
+		})
+		return
+	}
+
+	var refreshedCount int
+	var errors []string
+
+	for _, file := range files {
+		if file.IsDir() {
+			continue
+		}
+		name := file.Name()
+		if !strings.HasPrefix(name, "kiro-") || !strings.HasSuffix(name, ".json") {
+			continue
+		}
+
+		filePath := filepath.Join(authDir, name)
+		data, err := os.ReadFile(filePath)
+		if err != nil {
+			errors = append(errors, fmt.Sprintf("%s: read error - %v", name, err))
+			continue
+		}
+
+		var storage KiroTokenStorage
+		if err := json.Unmarshal(data, &storage); err != nil {
+			errors = append(errors, fmt.Sprintf("%s: parse error - %v", name, err))
+			continue
+		}
+
+		if storage.RefreshToken == "" {
+			errors = append(errors, fmt.Sprintf("%s: no refresh token", name))
+			continue
+		}
+
+		// Refresh token using the same logic as kiro_executor.Refresh
+		tokenData, err := h.refreshTokenData(c.Request.Context(), &storage)
+		if err != nil {
+			errors = append(errors, fmt.Sprintf("%s: refresh failed - %v", name, err))
+			continue
+		}
+
+		// Update storage with new token data
+		storage.AccessToken = tokenData.AccessToken
+		if tokenData.RefreshToken != "" {
+			storage.RefreshToken = tokenData.RefreshToken
+		}
+		storage.ExpiresAt = tokenData.ExpiresAt
+		storage.LastRefresh = time.Now().Format(time.RFC3339)
+		if tokenData.ProfileArn != "" {
+			storage.ProfileArn = tokenData.ProfileArn
+		}
+
+		// Write updated token back to file
+		updatedData, err := json.MarshalIndent(storage, "", "  ")
+		if err != nil {
+			errors = append(errors, fmt.Sprintf("%s: marshal error - %v", name, err))
+			continue
+		}
+
+		tmpFile := filePath + ".tmp"
+		if err := os.WriteFile(tmpFile, updatedData, 0600); err != nil {
+			errors = append(errors, fmt.Sprintf("%s: write error - %v", name, err))
+			continue
+		}
+		if err := os.Rename(tmpFile, filePath); err != nil {
+			errors = append(errors, fmt.Sprintf("%s: rename error - %v", name, err))
+			continue
+		}
+
+		log.Infof("OAuth Web: manually refreshed token in %s, expires at %s", name, tokenData.ExpiresAt)
+		refreshedCount++
+
+		// Notify callback if set
+		if h.onTokenObtained != nil {
+			h.onTokenObtained(tokenData)
+		}
+	}
+
+	if refreshedCount == 0 && len(errors) > 0 {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"error":   fmt.Sprintf("All refresh attempts failed: %v", errors),
+		})
+		return
+	}
+
+	response := gin.H{
+		"success":        true,
+		"message":        fmt.Sprintf("Refreshed %d token(s)", refreshedCount),
+		"refreshedCount": refreshedCount,
+	}
+	if len(errors) > 0 {
+		response["warnings"] = errors
+	}
+
+	c.JSON(http.StatusOK, response)
+}
+
+// refreshTokenData refreshes a token using the appropriate method based on auth type.
+// This mirrors the logic in kiro_executor.Refresh for consistency.
+func (h *OAuthWebHandler) refreshTokenData(ctx context.Context, storage *KiroTokenStorage) (*KiroTokenData, error) {
+	ssoClient := NewSSOOIDCClient(h.cfg)
+
+	switch {
+	case storage.ClientID != "" && storage.ClientSecret != "" && storage.AuthMethod == "idc" && storage.Region != "":
+		// IDC refresh with region-specific endpoint
+		log.Debugf("OAuth Web: using SSO OIDC refresh for IDC (region=%s)", storage.Region)
+		return ssoClient.RefreshTokenWithRegion(ctx, storage.ClientID, storage.ClientSecret, storage.RefreshToken, storage.Region, storage.StartURL)
+
+	case storage.ClientID != "" && storage.ClientSecret != "" && storage.AuthMethod == "builder-id":
+		// Builder ID refresh with default endpoint
+		log.Debugf("OAuth Web: using SSO OIDC refresh for AWS Builder ID")
+		return ssoClient.RefreshToken(ctx, storage.ClientID, storage.ClientSecret, storage.RefreshToken)
+
+	default:
+		// Fallback to Kiro's OAuth refresh endpoint (for social auth: Google/GitHub)
+		log.Debugf("OAuth Web: using Kiro OAuth refresh endpoint")
+		oauth := NewKiroOAuth(h.cfg)
+		return oauth.RefreshToken(ctx, storage.RefreshToken)
+	}
+}
--- a/internal/auth/kiro/oauth_web_templates.go
+++ b/internal/auth/kiro/oauth_web_templates.go
@@ -0,0 +1,779 @@
+// Package kiro provides OAuth Web authentication templates.
+package kiro
+
+const (
+	oauthWebStartPageHTML = `<!DOCTYPE html>
+<html>
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>AWS SSO Authentication</title>
+    <style>
+        * { box-sizing: border-box; }
+        body {
+            font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
+            margin: 0;
+            padding: 20px;
+            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+            min-height: 100vh;
+            display: flex;
+            justify-content: center;
+            align-items: center;
+        }
+        .container {
+            max-width: 500px;
+            width: 100%;
+            background: #fff;
+            padding: 40px;
+            border-radius: 12px;
+            box-shadow: 0 10px 40px rgba(0,0,0,0.2);
+        }
+        h1 {
+            margin: 0 0 10px;
+            color: #333;
+            font-size: 24px;
+            text-align: center;
+        }
+        .subtitle {
+            text-align: center;
+            color: #666;
+            margin-bottom: 30px;
+        }
+        .step {
+            background: #f8f9fa;
+            padding: 20px;
+            border-radius: 8px;
+            margin-bottom: 15px;
+        }
+        .step-title {
+            display: flex;
+            align-items: center;
+            font-weight: 600;
+            color: #333;
+            margin-bottom: 10px;
+        }
+        .step-number {
+            width: 28px;
+            height: 28px;
+            background: #667eea;
+            color: white;
+            border-radius: 50%;
+            display: flex;
+            align-items: center;
+            justify-content: center;
+            font-size: 14px;
+            margin-right: 12px;
+        }
+        .user-code {
+            background: #e7f3ff;
+            border: 2px dashed #2196F3;
+            border-radius: 8px;
+            padding: 20px;
+            text-align: center;
+            margin-top: 10px;
+        }
+        .user-code-label {
+            font-size: 12px;
+            color: #666;
+            text-transform: uppercase;
+            letter-spacing: 1px;
+            margin-bottom: 8px;
+        }
+        .user-code-value {
+            font-size: 32px;
+            font-weight: bold;
+            font-family: monospace;
+            color: #2196F3;
+            letter-spacing: 4px;
+        }
+        .auth-btn {
+            display: block;
+            width: 100%;
+            padding: 15px;
+            background: #667eea;
+            color: white;
+            text-align: center;
+            text-decoration: none;
+            border-radius: 8px;
+            font-weight: 600;
+            font-size: 16px;
+            transition: all 0.3s;
+            border: none;
+            cursor: pointer;
+            margin-top: 20px;
+        }
+        .auth-btn:hover {
+            background: #5568d3;
+            transform: translateY(-2px);
+            box-shadow: 0 4px 12px rgba(102, 126, 234, 0.4);
+        }
+        .status {
+            margin-top: 30px;
+            padding: 20px;
+            background: #f8f9fa;
+            border-radius: 8px;
+            text-align: center;
+        }
+        .status-pending { border-left: 4px solid #ffc107; }
+        .status-success { border-left: 4px solid #28a745; }
+        .status-failed { border-left: 4px solid #dc3545; }
+        .spinner {
+            border: 3px solid #f3f3f3;
+            border-top: 3px solid #667eea;
+            border-radius: 50%;
+            width: 40px;
+            height: 40px;
+            animation: spin 1s linear infinite;
+            margin: 0 auto 15px;
+        }
+        @keyframes spin {
+            0% { transform: rotate(0deg); }
+            100% { transform: rotate(360deg); }
+        }
+        .timer {
+            font-size: 24px;
+            font-weight: bold;
+            color: #667eea;
+            margin: 10px 0;
+        }
+        .timer.warning { color: #ffc107; }
+        .timer.danger { color: #dc3545; }
+        .status-message { color: #666; line-height: 1.6; }
+        .success-icon, .error-icon { font-size: 48px; margin-bottom: 15px; }
+        .info-box {
+            background: #e7f3ff;
+            border-left: 4px solid #2196F3;
+            padding: 15px;
+            margin-top: 20px;
+            border-radius: 4px;
+            font-size: 14px;
+            color: #666;
+        }
+    </style>
+</head>
+<body>
+    <div class="container">
+        <h1>🔐 AWS SSO Authentication</h1>
+        <p class="subtitle">Follow the steps below to complete authentication</p>
+        
+        <div class="step">
+            <div class="step-title">
+                <span class="step-number">1</span>
+                Click the button below to open the authorization page
+            </div>
+            <a href="{{.AuthURL}}" target="_blank" class="auth-btn" id="authBtn">
+                🚀 Open Authorization Page
+            </a>
+        </div>
+        
+        <div class="step">
+            <div class="step-title">
+                <span class="step-number">2</span>
+                Enter the verification code below
+            </div>
+            <div class="user-code">
+                <div class="user-code-label">Verification Code</div>
+                <div class="user-code-value">{{.UserCode}}</div>
+            </div>
+        </div>
+        
+        <div class="step">
+            <div class="step-title">
+                <span class="step-number">3</span>
+                Complete AWS SSO login
+            </div>
+            <p style="color: #666; font-size: 14px; margin-top: 10px;">
+                Use your AWS SSO account to login and authorize
+            </p>
+        </div>
+        
+        <div class="status status-pending" id="statusBox">
+            <div class="spinner" id="spinner"></div>
+            <div class="timer" id="timer">{{.ExpiresIn}}s</div>
+            <div class="status-message" id="statusMessage">
+                Waiting for authorization...
+            </div>
+        </div>
+        
+        <div class="info-box">
+            💡 <strong>Tip:</strong> The authorization page will open in a new tab. This page will automatically update once authorization is complete.
+        </div>
+    </div>
+    
+    <script>
+        let pollInterval;
+        let timerInterval;
+        let remainingSeconds = {{.ExpiresIn}};
+        const stateID = "{{.StateID}}";
+        
+        setTimeout(() => {
+            document.getElementById('authBtn').click();
+        }, 500);
+        
+        function pollStatus() {
+            fetch('/v0/oauth/kiro/status?state=' + stateID)
+                .then(response => response.json())
+                .then(data => {
+                    console.log('Status:', data);
+                    if (data.status === 'success') {
+                        clearInterval(pollInterval);
+                        clearInterval(timerInterval);
+                        showSuccess(data);
+                    } else if (data.status === 'failed') {
+                        clearInterval(pollInterval);
+                        clearInterval(timerInterval);
+                        showError(data);
+                    } else {
+                        remainingSeconds = data.remaining_seconds || 0;
+                    }
+                })
+                .catch(error => {
+                    console.error('Poll error:', error);
+                });
+        }
+        
+        function updateTimer() {
+            const timerEl = document.getElementById('timer');
+            const minutes = Math.floor(remainingSeconds / 60);
+            const seconds = remainingSeconds % 60;
+            timerEl.textContent = minutes + ':' + seconds.toString().padStart(2, '0');
+            
+            if (remainingSeconds < 60) {
+                timerEl.className = 'timer danger';
+            } else if (remainingSeconds < 180) {
+                timerEl.className = 'timer warning';
+            } else {
+                timerEl.className = 'timer';
+            }
+            
+            remainingSeconds--;
+            
+            if (remainingSeconds < 0) {
+                clearInterval(timerInterval);
+                clearInterval(pollInterval);
+                showError({ error: 'Authentication timed out. Please refresh and try again.' });
+            }
+        }
+        
+        function showSuccess(data) {
+            const statusBox = document.getElementById('statusBox');
+            statusBox.className = 'status status-success';
+            statusBox.innerHTML = '<div class="success-icon">✅</div>' +
+                '<div class="status-message">' +
+                '<strong>Authentication Successful!</strong><br>' +
+                'Token expires: ' + new Date(data.expires_at).toLocaleString() +
+                '</div>';
+        }
+        
+        function showError(data) {
+            const statusBox = document.getElementById('statusBox');
+            statusBox.className = 'status status-failed';
+            statusBox.innerHTML = '<div class="error-icon">❌</div>' +
+                '<div class="status-message">' +
+                '<strong>Authentication Failed</strong><br>' +
+                '<span id="errorText"></span>' +
+                '</div>' +
+                '<button class="auth-btn" onclick="location.reload()" style="margin-top: 15px;">' +
+                '🔄 Retry' +
+                '</button>';
+            // Use textContent so server-supplied error text is never parsed as HTML.
+            document.getElementById('errorText').textContent = data.error || 'Unknown error';
+        }
+        
+        pollInterval = setInterval(pollStatus, 3000);
+        timerInterval = setInterval(updateTimer, 1000);
+        pollStatus();
+    </script>
+</body>
+</html>`
+
+	oauthWebErrorPageHTML = `<!DOCTYPE html>
+<html>
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Authentication Failed</title>
+    <style>
+        body {
+            font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
+            max-width: 600px;
+            margin: 50px auto;
+            padding: 20px;
+            background: #f5f5f5;
+        }
+        .error {
+            background: #fff;
+            padding: 30px;
+            border-radius: 8px;
+            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+            border-left: 4px solid #dc3545;
+        }
+        h1 { color: #dc3545; margin-top: 0; }
+        .error-message { color: #666; line-height: 1.6; }
+        .retry-btn {
+            display: inline-block;
+            margin-top: 20px;
+            padding: 10px 20px;
+            background: #007bff;
+            color: white;
+            text-decoration: none;
+            border-radius: 4px;
+        }
+        .retry-btn:hover { background: #0056b3; }
+    </style>
+</head>
+<body>
+    <div class="error">
+        <h1>❌ Authentication Failed</h1>
+        <div class="error-message">
+            <p><strong>Error:</strong></p>
+            <p>{{.Error}}</p>
+        </div>
+        <a href="/v0/oauth/kiro/start" class="retry-btn">🔄 Retry</a>
+    </div>
+</body>
+</html>`
+
+	oauthWebSuccessPageHTML = `<!DOCTYPE html>
+<html>
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Authentication Successful</title>
+    <style>
+        body {
+            font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
+            max-width: 600px;
+            margin: 50px auto;
+            padding: 20px;
+            background: #f5f5f5;
+        }
+        .success {
+            background: #fff;
+            padding: 30px;
+            border-radius: 8px;
+            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+            border-left: 4px solid #28a745;
+            text-align: center;
+        }
+        h1 { color: #28a745; margin-top: 0; }
+        .success-message { color: #666; line-height: 1.6; }
+        .icon { font-size: 48px; margin-bottom: 15px; }
+        .expires { font-size: 14px; color: #999; margin-top: 15px; }
+    </style>
+</head>
+<body>
+    <div class="success">
+        <div class="icon">✅</div>
+        <h1>Authentication Successful!</h1>
+        <div class="success-message">
+            <p>You can close this window.</p>
+        </div>
+        <div class="expires">Token expires: {{.ExpiresAt}}</div>
+    </div>
+</body>
+</html>`
+
+	oauthWebSelectPageHTML = `<!DOCTYPE html>
+<html>
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Select Authentication Method</title>
+    <style>
+        * { box-sizing: border-box; }
+        body {
+            font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
+            margin: 0;
+            padding: 20px;
+            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+            min-height: 100vh;
+            display: flex;
+            justify-content: center;
+            align-items: center;
+        }
+        .container {
+            max-width: 500px;
+            width: 100%;
+            background: #fff;
+            padding: 40px;
+            border-radius: 12px;
+            box-shadow: 0 10px 40px rgba(0,0,0,0.2);
+        }
+        h1 {
+            margin: 0 0 10px;
+            color: #333;
+            font-size: 24px;
+            text-align: center;
+        }
+        .subtitle {
+            text-align: center;
+            color: #666;
+            margin-bottom: 30px;
+        }
+        .auth-methods {
+            display: flex;
+            flex-direction: column;
+            gap: 15px;
+        }
+        .auth-btn {
+            display: flex;
+            align-items: center;
+            width: 100%;
+            padding: 15px 20px;
+            background: #667eea;
+            color: white;
+            text-decoration: none;
+            border-radius: 8px;
+            font-weight: 600;
+            font-size: 16px;
+            transition: all 0.3s;
+            border: none;
+            cursor: pointer;
+        }
+        .auth-btn:hover {
+            background: #5568d3;
+            transform: translateY(-2px);
+            box-shadow: 0 4px 12px rgba(102, 126, 234, 0.4);
+        }
+        .auth-btn .icon {
+            font-size: 24px;
+            margin-right: 15px;
+            width: 32px;
+            text-align: center;
+        }
+        .auth-btn.google { background: #4285F4; }
+        .auth-btn.google:hover { background: #3367D6; }
+        .auth-btn.github { background: #24292e; }
+        .auth-btn.github:hover { background: #1a1e22; }
+        .auth-btn.aws { background: #FF9900; }
+        .auth-btn.aws:hover { background: #E68A00; }
+        .auth-btn.idc { background: #232F3E; }
+        .auth-btn.idc:hover { background: #1a242f; }
+        .idc-form {
+            background: #f8f9fa;
+            padding: 20px;
+            border-radius: 8px;
+            margin-top: 15px;
+            display: none;
+        }
+        .idc-form.show {
+            display: block;
+        }
+        .form-group {
+            margin-bottom: 15px;
+        }
+        .form-group label {
+            display: block;
+            font-weight: 600;
+            color: #333;
+            margin-bottom: 8px;
+            font-size: 14px;
+        }
+        .form-group input {
+            width: 100%;
+            padding: 12px;
+            border: 2px solid #e0e0e0;
+            border-radius: 6px;
+            font-size: 14px;
+            transition: border-color 0.3s;
+        }
+        .form-group input:focus {
+            outline: none;
+            border-color: #667eea;
+        }
+        .form-group .hint {
+            font-size: 12px;
+            color: #999;
+            margin-top: 5px;
+        }
+        .submit-btn {
+            display: block;
+            width: 100%;
+            padding: 15px;
+            background: #232F3E;
+            color: white;
+            text-align: center;
+            text-decoration: none;
+            border-radius: 8px;
+            font-weight: 600;
+            font-size: 16px;
+            transition: all 0.3s;
+            border: none;
+            cursor: pointer;
+        }
+        .submit-btn:hover {
+            background: #1a242f;
+            transform: translateY(-2px);
+            box-shadow: 0 4px 12px rgba(35, 47, 62, 0.4);
+        }
+        .divider {
+            display: flex;
+            align-items: center;
+            margin: 20px 0;
+        }
+        .divider::before,
+        .divider::after {
+            content: "";
+            flex: 1;
+            border-bottom: 1px solid #e0e0e0;
+        }
+        .divider span {
+            padding: 0 15px;
+            color: #999;
+            font-size: 14px;
+        }
+        .info-box {
+            background: #e7f3ff;
+            border-left: 4px solid #2196F3;
+            padding: 15px;
+            margin-top: 20px;
+            border-radius: 4px;
+            font-size: 14px;
+            color: #666;
+        }
+        .warning-box {
+            background: #fff3cd;
+            border-left: 4px solid #ffc107;
+            padding: 15px;
+            margin-top: 20px;
+            border-radius: 4px;
+            font-size: 14px;
+            color: #856404;
+        }
+        .auth-btn.manual { background: #6c757d; }
+        .auth-btn.manual:hover { background: #5a6268; }
+        .auth-btn.refresh { background: #17a2b8; }
+        .auth-btn.refresh:hover { background: #138496; }
+        .auth-btn.refresh:disabled { background: #7fb3bd; cursor: not-allowed; }
+        .manual-form {
+            background: #f8f9fa;
+            padding: 20px;
+            border-radius: 8px;
+            margin-top: 15px;
+            display: none;
+        }
+        .manual-form.show {
+            display: block;
+        }
+        .form-group textarea {
+            width: 100%;
+            padding: 12px;
+            border: 2px solid #e0e0e0;
+            border-radius: 6px;
+            font-size: 14px;
+            font-family: monospace;
+            transition: border-color 0.3s;
+            resize: vertical;
+            min-height: 80px;
+        }
+        .form-group textarea:focus {
+            outline: none;
+            border-color: #667eea;
+        }
+        .status-message {
+            padding: 15px;
+            border-radius: 6px;
+            margin-top: 15px;
+            display: none;
+        }
+        .status-message.success {
+            background: #d4edda;
+            color: #155724;
+            display: block;
+        }
+        .status-message.error {
+            background: #f8d7da;
+            color: #721c24;
+            display: block;
+        }
+    </style>
+</head>
+<body>
+    <div class="container">
+        <h1>🔐 Select Authentication Method</h1>
+        <p class="subtitle">Choose how you want to authenticate with Kiro</p>
+        
+        <div class="auth-methods">
+            <a href="/v0/oauth/kiro/start?method=builder-id" class="auth-btn aws">
+                <span class="icon">🔶</span>
+                AWS Builder ID (Recommended)
+            </a>
+            
+            <button type="button" class="auth-btn idc" onclick="toggleIdcForm()">
+                <span class="icon">🏢</span>
+                AWS Identity Center (IDC)
+            </button>
+            
+            <div class="divider"><span>or</span></div>
+            
+            <button type="button" class="auth-btn manual" onclick="toggleManualForm()">
+                <span class="icon">📋</span>
+                Import RefreshToken from Kiro IDE
+            </button>
+            
+            <button type="button" class="auth-btn refresh" onclick="manualRefresh()" id="refreshBtn">
+                <span class="icon">🔄</span>
+                Manual Refresh All Tokens
+            </button>
+            
+            <div class="status-message" id="refreshStatus"></div>
+        </div>
+        
+        <div class="idc-form" id="idcForm">
+            <form action="/v0/oauth/kiro/start" method="get">
+                <input type="hidden" name="method" value="idc">
+                
+                <div class="form-group">
+                    <label for="startUrl">Start URL</label>
+                    <input type="url" id="startUrl" name="startUrl" placeholder="https://your-org.awsapps.com/start" required>
+                    <div class="hint">Your AWS Identity Center Start URL</div>
+                </div>
+                
+                <div class="form-group">
+                    <label for="region">Region</label>
+                    <input type="text" id="region" name="region" value="us-east-1" placeholder="us-east-1">
+                    <div class="hint">AWS Region for your Identity Center</div>
+                </div>
+                
+                <button type="submit" class="submit-btn">
+                    🚀 Continue with IDC
+                </button>
+            </form>
+        </div>
+        
+        <div class="manual-form" id="manualForm">
+            <form id="importForm" onsubmit="submitImport(event)">
+                <div class="form-group">
+                    <label for="refreshToken">Refresh Token</label>
+                    <textarea id="refreshToken" name="refreshToken" placeholder="Paste your refreshToken here (starts with aorAAAAAG...)" required></textarea>
+                    <div class="hint">Copy from Kiro IDE: ~/.kiro/kiro-auth-token.json → refreshToken field</div>
+                </div>
+                
+                <button type="submit" class="submit-btn" id="importBtn">
+                    📥 Import Token
+                </button>
+                
+                <div class="status-message" id="importStatus"></div>
+            </form>
+        </div>
+        
+        <div class="warning-box">
+            ⚠️ <strong>Note:</strong> Google and GitHub logins are not available for third-party applications due to AWS Cognito restrictions. Please use AWS Builder ID or import your token from Kiro IDE.
+        </div>
+        
+        <div class="info-box">
+            💡 <strong>How to get RefreshToken:</strong><br>
+            1. Open Kiro IDE and log in with Google/GitHub<br>
+            2. Find the token file: <code>~/.kiro/kiro-auth-token.json</code><br>
+            3. Copy the <code>refreshToken</code> value and paste it above
+        </div>
+    </div>
+    
+    <script>
+        function toggleIdcForm() {
+            const idcForm = document.getElementById('idcForm');
+            const manualForm = document.getElementById('manualForm');
+            manualForm.classList.remove('show');
+            idcForm.classList.toggle('show');
+            if (idcForm.classList.contains('show')) {
+                document.getElementById('startUrl').focus();
+            }
+        }
+        
+        function toggleManualForm() {
+            const idcForm = document.getElementById('idcForm');
+            const manualForm = document.getElementById('manualForm');
+            idcForm.classList.remove('show');
+            manualForm.classList.toggle('show');
+            if (manualForm.classList.contains('show')) {
+                document.getElementById('refreshToken').focus();
+            }
+        }
+        
+        async function submitImport(event) {
+            event.preventDefault();
+            const refreshToken = document.getElementById('refreshToken').value.trim();
+            const statusEl = document.getElementById('importStatus');
+            const btn = document.getElementById('importBtn');
+            
+            if (!refreshToken) {
+                statusEl.className = 'status-message error';
+                statusEl.textContent = 'Please enter a refresh token';
+                return;
+            }
+            
+            if (!refreshToken.startsWith('aorAAAAAG')) {
+                statusEl.className = 'status-message error';
+                statusEl.textContent = 'Invalid token format. Token should start with aorAAAAAG...';
+                return;
+            }
+            
+            btn.disabled = true;
+            btn.textContent = '⏳ Importing...';
+            statusEl.className = 'status-message';
+            statusEl.style.display = 'none';
+            
+            try {
+                const response = await fetch('/v0/oauth/kiro/import', {
+                    method: 'POST',
+                    headers: { 'Content-Type': 'application/json' },
+                    body: JSON.stringify({ refreshToken: refreshToken })
+                });
+                
+                const data = await response.json();
+                
+                if (response.ok && data.success) {
+                    statusEl.className = 'status-message success';
+                    statusEl.textContent = '✅ Token imported successfully! File: ' + (data.fileName || 'kiro-token.json');
+                } else {
+                    statusEl.className = 'status-message error';
+                    statusEl.textContent = '❌ ' + (data.error || data.message || 'Import failed');
+                }
+            } catch (error) {
+                statusEl.className = 'status-message error';
+                statusEl.textContent = '❌ Network error: ' + error.message;
+            } finally {
+                btn.disabled = false;
+                btn.textContent = '📥 Import Token';
+            }
+        }
+        
+        async function manualRefresh() {
+            const btn = document.getElementById('refreshBtn');
+            const statusEl = document.getElementById('refreshStatus');
+            
+            btn.disabled = true;
+            btn.innerHTML = '<span class="icon">⏳</span> Refreshing...';
+            statusEl.className = 'status-message';
+            statusEl.style.display = 'none';
+            
+            try {
+                const response = await fetch('/v0/oauth/kiro/refresh', {
+                    method: 'POST',
+                    headers: { 'Content-Type': 'application/json' }
+                });
+                
+                const data = await response.json();
+                
+                if (response.ok && data.success) {
+                    statusEl.className = 'status-message success';
+                    let msg = '✅ ' + data.message;
+                    if (data.warnings && data.warnings.length > 0) {
+                        msg += ' (Warnings: ' + data.warnings.join('; ') + ')';
+                    }
+                    statusEl.textContent = msg;
+                } else {
+                    statusEl.className = 'status-message error';
+                    statusEl.textContent = '❌ ' + (data.error || data.message || 'Refresh failed');
+                }
+            } catch (error) {
+                statusEl.className = 'status-message error';
+                statusEl.textContent = '❌ Network error: ' + error.message;
+            } finally {
+                btn.disabled = false;
+                btn.innerHTML = '<span class="icon">🔄</span> Manual Refresh All Tokens';
+            }
+        }
+    </script>
+</body>
+</html>`
+)
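The error and success pages above interpolate `{{.Error}}` and `{{.ExpiresAt}}` directly into HTML. Whether that is safe depends on which template package renders them, which this part of the diff does not show. The sketch below assumes `html/template`, which escapes values by context; with `text/template` the same input would be emitted verbatim. `errorPage` and `renderError` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// errorPage is a trimmed, hypothetical stand-in for the error page template.
const errorPage = `<p><strong>Error:</strong></p><p>{{.Error}}</p>`

// renderError renders errorPage with html/template, which HTML-escapes msg.
func renderError(msg string) (string, error) {
	tmpl, err := template.New("error").Parse(errorPage)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, struct{ Error string }{Error: msg}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderError("<b>x</b>")
	if err != nil {
		panic(err)
	}
	// The markup in the message is escaped rather than interpreted.
	fmt.Println(out)
}
```

If the pages are rendered this way, an error message that echoes user input (for example a malformed start URL) is displayed as text rather than executed as markup.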
--- a/internal/auth/kiro/rate_limiter.go
+++ b/internal/auth/kiro/rate_limiter.go
@@ -0,0 +1,316 @@
+package kiro
+
+import (
+	"math"
+	"math/rand"
+	"strings"
+	"sync"
+	"time"
+)
+
+const (
+	DefaultMinTokenInterval  = 10 * time.Second
+	DefaultMaxTokenInterval  = 30 * time.Second
+	DefaultDailyMaxRequests  = 500
+	DefaultJitterPercent     = 0.3
+	DefaultBackoffBase       = 2 * time.Minute
+	DefaultBackoffMax        = 60 * time.Minute
+	DefaultBackoffMultiplier = 2.0
+	DefaultSuspendCooldown   = 24 * time.Hour
+)
+
+// TokenState tracks the rate-limit state of a single token.
+type TokenState struct {
+	LastRequest    time.Time
+	RequestCount   int
+	CooldownEnd    time.Time
+	FailCount      int
+	DailyRequests  int
+	DailyResetTime time.Time
+	IsSuspended    bool
+	SuspendedAt    time.Time
+	SuspendReason  string
+}
+
+// RateLimiter throttles requests per token and tracks cooldowns and suspensions.
+type RateLimiter struct {
+	mu                sync.RWMutex
+	states            map[string]*TokenState
+	minTokenInterval  time.Duration
+	maxTokenInterval  time.Duration
+	dailyMaxRequests  int
+	jitterPercent     float64
+	backoffBase       time.Duration
+	backoffMax        time.Duration
+	backoffMultiplier float64
+	suspendCooldown   time.Duration
+	rng               *rand.Rand
+}
+
+// NewRateLimiter creates a RateLimiter with the default configuration.
+func NewRateLimiter() *RateLimiter {
+	return &RateLimiter{
+		states:            make(map[string]*TokenState),
+		minTokenInterval:  DefaultMinTokenInterval,
+		maxTokenInterval:  DefaultMaxTokenInterval,
+		dailyMaxRequests:  DefaultDailyMaxRequests,
+		jitterPercent:     DefaultJitterPercent,
+		backoffBase:       DefaultBackoffBase,
+		backoffMax:        DefaultBackoffMax,
+		backoffMultiplier: DefaultBackoffMultiplier,
+		suspendCooldown:   DefaultSuspendCooldown,
+		rng:               rand.New(rand.NewSource(time.Now().UnixNano())),
+	}
+}
+
+// RateLimiterConfig holds the rate limiter settings; zero-valued fields keep their defaults.
+type RateLimiterConfig struct {
+	MinTokenInterval  time.Duration
+	MaxTokenInterval  time.Duration
+	DailyMaxRequests  int
+	JitterPercent     float64
+	BackoffBase       time.Duration
+	BackoffMax        time.Duration
+	BackoffMultiplier float64
+	SuspendCooldown   time.Duration
+}
+
+// NewRateLimiterWithConfig creates a RateLimiter, overriding the defaults with every positive field of cfg.
+func NewRateLimiterWithConfig(cfg RateLimiterConfig) *RateLimiter {
+	rl := NewRateLimiter()
+	if cfg.MinTokenInterval > 0 {
+		rl.minTokenInterval = cfg.MinTokenInterval
+	}
+	if cfg.MaxTokenInterval > 0 {
+		rl.maxTokenInterval = cfg.MaxTokenInterval
+	}
+	if cfg.DailyMaxRequests > 0 {
+		rl.dailyMaxRequests = cfg.DailyMaxRequests
+	}
+	if cfg.JitterPercent > 0 {
+		rl.jitterPercent = cfg.JitterPercent
+	}
+	if cfg.BackoffBase > 0 {
+		rl.backoffBase = cfg.BackoffBase
+	}
+	if cfg.BackoffMax > 0 {
+		rl.backoffMax = cfg.BackoffMax
+	}
+	if cfg.BackoffMultiplier > 0 {
+		rl.backoffMultiplier = cfg.BackoffMultiplier
+	}
+	if cfg.SuspendCooldown > 0 {
+		rl.suspendCooldown = cfg.SuspendCooldown
+	}
+	return rl
+}
+
+// getOrCreateState returns the state for tokenKey, creating it if needed. The caller must hold rl.mu for writing.
+func (rl *RateLimiter) getOrCreateState(tokenKey string) *TokenState {
+	state, exists := rl.states[tokenKey]
+	if !exists {
+		state = &TokenState{
+			DailyResetTime: time.Now().Truncate(24 * time.Hour).Add(24 * time.Hour),
+		}
+		rl.states[tokenKey] = state
+	}
+	return state
+}
+
+// resetDailyIfNeeded resets the daily counter once the reset time has passed. Truncate(24h) aligns the boundary to UTC midnight.
+func (rl *RateLimiter) resetDailyIfNeeded(state *TokenState) {
+	now := time.Now()
+	if now.After(state.DailyResetTime) {
+		state.DailyRequests = 0
+		state.DailyResetTime = now.Truncate(24 * time.Hour).Add(24 * time.Hour)
+	}
+}
+
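+// calculateInterval returns a random interval in [min, max) with jitter applied. rand.Int63n panics unless maxTokenInterval > minTokenInterval.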
+func (rl *RateLimiter) calculateInterval() time.Duration {
+	baseInterval := rl.minTokenInterval + time.Duration(rl.rng.Int63n(int64(rl.maxTokenInterval-rl.minTokenInterval)))
+	jitter := time.Duration(float64(baseInterval) * rl.jitterPercent * (rl.rng.Float64()*2 - 1))
+	return baseInterval + jitter
+}
+
+// WaitForToken blocks until the token may be used again, applying a jittered random interval between requests.
+func (rl *RateLimiter) WaitForToken(tokenKey string) {
+	rl.mu.Lock()
+	state := rl.getOrCreateState(tokenKey)
+	rl.resetDailyIfNeeded(state)
+
+	now := time.Now()
+
+	// Wait out any active cooldown.
+	if now.Before(state.CooldownEnd) {
+		waitTime := state.CooldownEnd.Sub(now)
+		rl.mu.Unlock()
+		time.Sleep(waitTime)
+		rl.mu.Lock()
+		state = rl.getOrCreateState(tokenKey)
+		now = time.Now()
+	}
+
+	// Enforce the interval since the last request.
+	interval := rl.calculateInterval()
+	nextAllowedTime := state.LastRequest.Add(interval)
+
+	if now.Before(nextAllowedTime) {
+		waitTime := nextAllowedTime.Sub(now)
+		rl.mu.Unlock()
+		time.Sleep(waitTime)
+		rl.mu.Lock()
+		state = rl.getOrCreateState(tokenKey)
+	}
+
+	state.LastRequest = time.Now()
+	state.RequestCount++
+	state.DailyRequests++
+	rl.mu.Unlock()
+}
+
+// MarkTokenFailed records a failure and starts an exponential-backoff cooldown.
+func (rl *RateLimiter) MarkTokenFailed(tokenKey string) {
+	rl.mu.Lock()
+	defer rl.mu.Unlock()
+
+	state := rl.getOrCreateState(tokenKey)
+	state.FailCount++
+	state.CooldownEnd = time.Now().Add(rl.calculateBackoff(state.FailCount))
+}
+
+// MarkTokenSuccess records a success, clearing the failure count and cooldown.
+func (rl *RateLimiter) MarkTokenSuccess(tokenKey string) {
+	rl.mu.Lock()
+	defer rl.mu.Unlock()
+
+	state := rl.getOrCreateState(tokenKey)
+	state.FailCount = 0
+	state.CooldownEnd = time.Time{}
+}
+
+// CheckAndMarkSuspended marks the token suspended if errorMsg contains a suspension keyword and reports whether it did.
+func (rl *RateLimiter) CheckAndMarkSuspended(tokenKey string, errorMsg string) bool {
+	suspendKeywords := []string{
+		"suspended",
+		"banned",
+		"disabled",
+		"account has been",
+		"access denied",
+		"rate limit exceeded",
+		"too many requests",
+		"quota exceeded",
+	}
+
+	lowerMsg := strings.ToLower(errorMsg)
+	for _, keyword := range suspendKeywords {
+		if strings.Contains(lowerMsg, keyword) {
+			rl.mu.Lock()
+			defer rl.mu.Unlock()
+
+			state := rl.getOrCreateState(tokenKey)
+			state.IsSuspended = true
+			state.SuspendedAt = time.Now()
+			state.SuspendReason = errorMsg
+			state.CooldownEnd = time.Now().Add(rl.suspendCooldown)
+			return true
+		}
+	}
+	return false
+}
+
+// IsTokenAvailable reports whether the token can be used right now.
+func (rl *RateLimiter) IsTokenAvailable(tokenKey string) bool {
+	rl.mu.RLock()
+	defer rl.mu.RUnlock()
+
+	state, exists := rl.states[tokenKey]
+	if !exists {
+		return true
+	}
+
+	now := time.Now()
+
+	// Check whether the token is suspended.
+	if state.IsSuspended {
+		if now.After(state.SuspendedAt.Add(rl.suspendCooldown)) {
+			return true
+		}
+		return false
+	}
+
+	// Check whether the token is cooling down.
+	if now.Before(state.CooldownEnd) {
+		return false
+	}
+
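+	// Check the daily request limit; resetDailyIfNeeded mutates state, so the read lock is swapped for the write lock.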
+	rl.mu.RUnlock()
+	rl.mu.Lock()
+	rl.resetDailyIfNeeded(state)
+	dailyRequests := state.DailyRequests
+	dailyMax := rl.dailyMaxRequests
+	rl.mu.Unlock()
+	rl.mu.RLock()
+
+	if dailyRequests >= dailyMax {
+		return false
+	}
+
+	return true
+}
+
+// calculateBackoff returns the exponential backoff for failCount, with jitter, capped at backoffMax.
+func (rl *RateLimiter) calculateBackoff(failCount int) time.Duration {
+	if failCount <= 0 {
+		return 0
+	}
+
+	backoff := float64(rl.backoffBase) * math.Pow(rl.backoffMultiplier, float64(failCount-1))
+
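+	// Add jitter.
+	jitter := backoff * rl.jitterPercent * (rl.rng.Float64()*2 - 1)
+	backoff += jitter
+	// Compare as float64: converting an out-of-range value to time.Duration first could overflow and skip the cap.
+	if backoff > float64(rl.backoffMax) {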
+		return rl.backoffMax
+	}
+	return time.Duration(backoff)
+}
+
+// GetTokenState returns a read-only snapshot of the token state, or nil if none exists.
+func (rl *RateLimiter) GetTokenState(tokenKey string) *TokenState {
+	rl.mu.RLock()
+	defer rl.mu.RUnlock()
+
+	state, exists := rl.states[tokenKey]
+	if !exists {
+		return nil
+	}
+
+	// Return a copy to prevent external modification.
+	stateCopy := *state
+	return &stateCopy
+}
+
+// ClearTokenState removes the token state.
+func (rl *RateLimiter) ClearTokenState(tokenKey string) {
+	rl.mu.Lock()
+	defer rl.mu.Unlock()
+	delete(rl.states, tokenKey)
+}
+
+// ResetSuspension clears the suspension, cooldown, and failure count.
+func (rl *RateLimiter) ResetSuspension(tokenKey string) {
+	rl.mu.Lock()
+	defer rl.mu.Unlock()
+
+	state, exists := rl.states[tokenKey]
+	if exists {
+		state.IsSuspended = false
+		state.SuspendedAt = time.Time{}
+		state.SuspendReason = ""
+		state.CooldownEnd = time.Time{}
+		state.FailCount = 0
+	}
+}
--- a/internal/auth/kiro/rate_limiter_singleton.go
+++ b/internal/auth/kiro/rate_limiter_singleton.go
@@ -0,0 +1,46 @@
+package kiro
+
+import (
+	"sync"
+	"time"
+
+	log "github.com/sirupsen/logrus"
+)
+
+var (
+	globalRateLimiter     *RateLimiter
+	globalRateLimiterOnce sync.Once
+
+	globalCooldownManager     *CooldownManager
+	globalCooldownManagerOnce sync.Once
+	cooldownStopCh            chan struct{}
+)
+
+// GetGlobalRateLimiter returns the singleton RateLimiter instance.
+func GetGlobalRateLimiter() *RateLimiter {
+	globalRateLimiterOnce.Do(func() {
+		globalRateLimiter = NewRateLimiter()
+		log.Info("kiro: global RateLimiter initialized")
+	})
+	return globalRateLimiter
+}
+
+// GetGlobalCooldownManager returns the singleton CooldownManager instance.
+func GetGlobalCooldownManager() *CooldownManager {
+	globalCooldownManagerOnce.Do(func() {
+		globalCooldownManager = NewCooldownManager()
+		cooldownStopCh = make(chan struct{})
+		go globalCooldownManager.StartCleanupRoutine(5*time.Minute, cooldownStopCh)
+		log.Info("kiro: global CooldownManager initialized with cleanup routine")
+	})
+	return globalCooldownManager
+}
+
+// ShutdownRateLimiters stops the cooldown cleanup routine.
+// Call it at most once, during application shutdown: a second call would close the already-closed channel and panic.
+func ShutdownRateLimiters() {
+	if cooldownStopCh != nil {
+		close(cooldownStopCh)
+		log.Info("kiro: rate limiter cleanup routine stopped")
+	}
+}
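`ShutdownRateLimiters` closes `cooldownStopCh` directly, so it is only safe to call once. If shutdown can be triggered from more than one path, one common option is to guard the close with `sync.Once`. The sketch below shows that pattern in isolation; the `stopper` type and its methods are hypothetical and not part of this change.

```go
package main

import (
	"fmt"
	"sync"
)

// stopper is a hypothetical wrapper around a stop channel whose close is idempotent.
type stopper struct {
	once sync.Once
	ch   chan struct{}
}

func newStopper() *stopper {
	return &stopper{ch: make(chan struct{})}
}

// Stop closes the channel on the first call and is a no-op afterwards.
func (s *stopper) Stop() {
	s.once.Do(func() { close(s.ch) })
}

// Stopped reports whether Stop has been called.
func (s *stopper) Stopped() bool {
	select {
	case <-s.ch:
		return true
	default:
		return false
	}
}

func main() {
	s := newStopper()
	fmt.Println(s.Stopped())
	s.Stop()
	s.Stop() // a second call does not panic
	fmt.Println(s.Stopped())
}
```

A cleanup goroutine would select on the wrapped channel exactly as it does on a plain `chan struct{}`, so the change stays local to the shutdown function.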
--- a/internal/auth/kiro/rate_limiter_test.go
+++ b/internal/auth/kiro/rate_limiter_test.go
@@ -0,0 +1,304 @@
+package kiro
+
+import (
+	"sync"
+	"testing"
+	"time"
+)
+
+func TestNewRateLimiter(t *testing.T) {
+	rl := NewRateLimiter()
+	if rl == nil {
+		t.Fatal("expected non-nil RateLimiter")
+	}
+	if rl.states == nil {
+		t.Error("expected non-nil states map")
+	}
+	if rl.minTokenInterval != DefaultMinTokenInterval {
+		t.Errorf("expected minTokenInterval %v, got %v", DefaultMinTokenInterval, rl.minTokenInterval)
+	}
+	if rl.maxTokenInterval != DefaultMaxTokenInterval {
+		t.Errorf("expected maxTokenInterval %v, got %v", DefaultMaxTokenInterval, rl.maxTokenInterval)
+	}
+	if rl.dailyMaxRequests != DefaultDailyMaxRequests {
+		t.Errorf("expected dailyMaxRequests %d, got %d", DefaultDailyMaxRequests, rl.dailyMaxRequests)
+	}
+}
+
+func TestNewRateLimiterWithConfig(t *testing.T) {
+	cfg := RateLimiterConfig{
+		MinTokenInterval:  5 * time.Second,
+		MaxTokenInterval:  15 * time.Second,
+		DailyMaxRequests:  100,
+		JitterPercent:     0.2,
+		BackoffBase:       1 * time.Minute,
+		BackoffMax:        30 * time.Minute,
+		BackoffMultiplier: 1.5,
+		SuspendCooldown:   12 * time.Hour,
+	}
+
+	rl := NewRateLimiterWithConfig(cfg)
+	if rl.minTokenInterval != 5*time.Second {
+		t.Errorf("expected minTokenInterval 5s, got %v", rl.minTokenInterval)
+	}
+	if rl.maxTokenInterval != 15*time.Second {
+		t.Errorf("expected maxTokenInterval 15s, got %v", rl.maxTokenInterval)
+	}
+	if rl.dailyMaxRequests != 100 {
+		t.Errorf("expected dailyMaxRequests 100, got %d", rl.dailyMaxRequests)
+	}
+}
+
+func TestNewRateLimiterWithConfig_PartialConfig(t *testing.T) {
+	cfg := RateLimiterConfig{
+		MinTokenInterval: 5 * time.Second,
+	}
+
+	rl := NewRateLimiterWithConfig(cfg)
+	if rl.minTokenInterval != 5*time.Second {
+		t.Errorf("expected minTokenInterval 5s, got %v", rl.minTokenInterval)
+	}
+	if rl.maxTokenInterval != DefaultMaxTokenInterval {
+		t.Errorf("expected default maxTokenInterval, got %v", rl.maxTokenInterval)
+	}
+}
+
+func TestGetTokenState_NonExistent(t *testing.T) {
+	rl := NewRateLimiter()
+	state := rl.GetTokenState("nonexistent")
+	if state != nil {
+		t.Error("expected nil state for non-existent token")
+	}
+}
+
+func TestIsTokenAvailable_NewToken(t *testing.T) {
+	rl := NewRateLimiter()
+	if !rl.IsTokenAvailable("newtoken") {
+		t.Error("expected new token to be available")
+	}
+}
+
+func TestMarkTokenFailed(t *testing.T) {
+	rl := NewRateLimiter()
+	rl.MarkTokenFailed("token1")
+
+	state := rl.GetTokenState("token1")
+	if state == nil {
+		t.Fatal("expected non-nil state")
+	}
+	if state.FailCount != 1 {
+		t.Errorf("expected FailCount 1, got %d", state.FailCount)
+	}
+	if state.CooldownEnd.IsZero() {
+		t.Error("expected non-zero CooldownEnd")
+	}
+}
+
+func TestMarkTokenSuccess(t *testing.T) {
+	rl := NewRateLimiter()
+	rl.MarkTokenFailed("token1")
+	rl.MarkTokenFailed("token1")
+	rl.MarkTokenSuccess("token1")
+
+	state := rl.GetTokenState("token1")
+	if state == nil {
+		t.Fatal("expected non-nil state")
+	}
+	if state.FailCount != 0 {
+		t.Errorf("expected FailCount 0, got %d", state.FailCount)
+	}
+	if !state.CooldownEnd.IsZero() {
+		t.Error("expected zero CooldownEnd after success")
+	}
+}
+
+func TestCheckAndMarkSuspended_Suspended(t *testing.T) {
+	rl := NewRateLimiter()
+
+	testCases := []string{
+		"Account has been suspended",
+		"You are banned from this service",
+		"Account disabled",
+		"Access denied permanently",
+		"Rate limit exceeded",
+		"Too many requests",
+		"Quota exceeded for today",
+	}
+
+	for i, msg := range testCases {
+		tokenKey := "token" + string(rune('a'+i))
+		if !rl.CheckAndMarkSuspended(tokenKey, msg) {
+			t.Errorf("expected suspension detected for: %s", msg)
+		}
+		state := rl.GetTokenState(tokenKey)
+		if !state.IsSuspended {
+			t.Errorf("expected IsSuspended true for: %s", msg)
+		}
+	}
+}
+
+func TestCheckAndMarkSuspended_NotSuspended(t *testing.T) {
+	rl := NewRateLimiter()
+
+	normalErrors := []string{
+		"connection timeout",
+		"internal server error",
+		"bad request",
+		"invalid token format",
+	}
+
+	for i, msg := range normalErrors {
+		tokenKey := "token" + string(rune('a'+i))
+		if rl.CheckAndMarkSuspended(tokenKey, msg) {
+			t.Errorf("unexpected suspension for: %s", msg)
+		}
+	}
+}
+
+func TestIsTokenAvailable_Suspended(t *testing.T) {
+	rl := NewRateLimiter()
+	rl.CheckAndMarkSuspended("token1", "Account suspended")
+
+	if rl.IsTokenAvailable("token1") {
+		t.Error("expected suspended token to be unavailable")
+	}
+}
+
+func TestClearTokenState(t *testing.T) {
+	rl := NewRateLimiter()
+	rl.MarkTokenFailed("token1")
+	rl.ClearTokenState("token1")
+
+	state := rl.GetTokenState("token1")
+	if state != nil {
+		t.Error("expected nil state after clear")
+	}
+}
+
+func TestResetSuspension(t *testing.T) {
+	rl := NewRateLimiter()
+	rl.CheckAndMarkSuspended("token1", "Account suspended")
+	rl.ResetSuspension("token1")
+
+	state := rl.GetTokenState("token1")
+	if state.IsSuspended {
+		t.Error("expected IsSuspended false after reset")
+	}
+	if state.FailCount != 0 {
+		t.Errorf("expected FailCount 0, got %d", state.FailCount)
+	}
+}
+
+func TestResetSuspension_NonExistent(t *testing.T) {
+	rl := NewRateLimiter()
+	rl.ResetSuspension("nonexistent")
+}
+
+func TestCalculateBackoff_ZeroFailCount(t *testing.T) {
+	rl := NewRateLimiter()
+	backoff := rl.calculateBackoff(0)
+	if backoff != 0 {
+		t.Errorf("expected 0 backoff for 0 fails, got %v", backoff)
+	}
+}
+
+func TestCalculateBackoff_Exponential(t *testing.T) {
+	cfg := RateLimiterConfig{
+		BackoffBase:       1 * time.Minute,
+		BackoffMax:        60 * time.Minute,
+		BackoffMultiplier: 2.0,
+		JitterPercent:     0.3,
+	}
+	rl := NewRateLimiterWithConfig(cfg)
+
+	backoff1 := rl.calculateBackoff(1)
+	if backoff1 < 40*time.Second || backoff1 > 80*time.Second {
+		t.Errorf("expected ~1min (with jitter) for fail 1, got %v", backoff1)
+	}
+
+	backoff2 := rl.calculateBackoff(2)
+	if backoff2 < 80*time.Second || backoff2 > 160*time.Second {
+		t.Errorf("expected ~2min (with jitter) for fail 2, got %v", backoff2)
+	}
+}
+
+func TestCalculateBackoff_MaxCap(t *testing.T) {
+	cfg := RateLimiterConfig{
+		BackoffBase:       1 * time.Minute,
+		BackoffMax:        10 * time.Minute,
+		BackoffMultiplier: 2.0,
+		JitterPercent:     0,
+	}
+	rl := NewRateLimiterWithConfig(cfg)
+
+	backoff := rl.calculateBackoff(10)
+	if backoff > 10*time.Minute {
+		t.Errorf("expected backoff capped at 10min, got %v", backoff)
+	}
+}
+
+func TestGetTokenState_ReturnsCopy(t *testing.T) {
+	rl := NewRateLimiter()
+	rl.MarkTokenFailed("token1")
+
+	state1 := rl.GetTokenState("token1")
+	state1.FailCount = 999
+
+	state2 := rl.GetTokenState("token1")
+	if state2.FailCount == 999 {
+		t.Error("GetTokenState should return a copy")
+	}
+}
+
+func TestRateLimiter_ConcurrentAccess(t *testing.T) {
+	rl := NewRateLimiter()
+	const numGoroutines = 50
+	const numOperations = 50
+
+	var wg sync.WaitGroup
+	wg.Add(numGoroutines)
+
+	for i := 0; i < numGoroutines; i++ {
+		go func(id int) {
+			defer wg.Done()
+			tokenKey := "token" + string(rune('a'+id%10))
+			for j := 0; j < numOperations; j++ {
+				switch j % 6 {
+				case 0:
+					rl.IsTokenAvailable(tokenKey)
+				case 1:
+					rl.MarkTokenFailed(tokenKey)
+				case 2:
+					rl.MarkTokenSuccess(tokenKey)
+				case 3:
+					rl.GetTokenState(tokenKey)
+				case 4:
+					rl.CheckAndMarkSuspended(tokenKey, "test error")
+				case 5:
+					rl.ResetSuspension(tokenKey)
+				}
+			}
+		}(i)
+	}
+
+	wg.Wait()
+}
+
+func TestCalculateInterval_WithinRange(t *testing.T) {
+	cfg := RateLimiterConfig{
+		MinTokenInterval: 10 * time.Second,
+		MaxTokenInterval: 30 * time.Second,
+		JitterPercent:    0.3,
+	}
+	rl := NewRateLimiterWithConfig(cfg)
+
+	minAllowed := 7 * time.Second
+	maxAllowed := 40 * time.Second
+
+	for i := 0; i < 100; i++ {
+		interval := rl.calculateInterval()
+		if interval < minAllowed || interval > maxAllowed {
+			t.Errorf("interval %v outside expected range [%v, %v]", interval, minAllowed, maxAllowed)
+		}
+	}
+}
--- a/internal/auth/kiro/refresh_manager.go
+++ b/internal/auth/kiro/refresh_manager.go
@@ -0,0 +1,171 @@
+package kiro
+
+import (
+	"context"
+	"sync"
+	"time"
+
+	"github.com/router-for-me/CLIProxyAPI/v6/internal/config"
+	log "github.com/sirupsen/logrus"
+)
+
+// RefreshManager is the singleton manager for the background refresher.
+type RefreshManager struct {
+	mu               sync.Mutex
+	refresher        *BackgroundRefresher
+	ctx              context.Context
+	cancel           context.CancelFunc
+	started          bool
+	onTokenRefreshed func(tokenID string, tokenData *KiroTokenData) // called after a successful refresh
+}
+
+var (
+	globalRefreshManager *RefreshManager
+	managerOnce          sync.Once
+)
+
+// GetRefreshManager returns the global refresh manager instance.
+func GetRefreshManager() *RefreshManager {
+	managerOnce.Do(func() {
+		globalRefreshManager = &RefreshManager{}
+	})
+	return globalRefreshManager
+}
+
+// Initialize sets up the background refresher.
+// baseDir: the directory that contains the token files
+// cfg: the application configuration
+func (m *RefreshManager) Initialize(baseDir string, cfg *config.Config) error {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	if m.started {
+		log.Debug("refresh manager: already initialized")
+		return nil
+	}
+
+	if baseDir == "" {
+		log.Warn("refresh manager: base directory not provided, skipping initialization")
+		return nil
+	}
+
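+	// Create the token repository
+	repo := NewFileTokenRepository(baseDir)
+
+	// Create the background refresher and configure its options
+	opts := []RefresherOption{
+		WithInterval(time.Minute), // check once per minute
+		WithBatchSize(50),         // process at most 50 tokens per batch
+		WithConcurrency(10),       // run at most 10 refreshes concurrently
+		WithConfig(cfg),           // set up the OAuth and SSO clients
+	}
+
+	// If a callback has already been set, pass it to the BackgroundRefresher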
+	if m.onTokenRefreshed != nil {
+		opts = append(opts, WithOnTokenRefreshed(m.onTokenRefreshed))
+	}
+
+	m.refresher = NewBackgroundRefresher(repo, opts...)
+
+	log.Infof("refresh manager: initialized with base directory %s", baseDir)
+	return nil
+}
+
+// Start starts the background refresh loop.
+func (m *RefreshManager) Start() {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	if m.started {
+		log.Debug("refresh manager: already started")
+		return
+	}
+
+	if m.refresher == nil {
+		log.Warn("refresh manager: not initialized, cannot start")
+		return
+	}
+
+	m.ctx, m.cancel = context.WithCancel(context.Background())
+	m.refresher.Start(m.ctx)
+	m.started = true
+
+	log.Info("refresh manager: background refresh started")
+}
+
+// Stop stops the background refresh loop.
+func (m *RefreshManager) Stop() {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	if !m.started {
+		return
+	}
+
+	if m.cancel != nil {
+		m.cancel()
+	}
+
+	if m.refresher != nil {
+		m.refresher.Stop()
+	}
+
+	m.started = false
+	log.Info("refresh manager: background refresh stopped")
+}
+
+// IsRunning reports whether the background refresh loop is running.
+func (m *RefreshManager) IsRunning() bool {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+	return m.started
+}
+
+// UpdateBaseDir updates the token directory (used for runtime configuration changes).
+func (m *RefreshManager) UpdateBaseDir(baseDir string) {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	if m.refresher != nil && m.refresher.tokenRepo != nil {
+		if repo, ok := m.refresher.tokenRepo.(*FileTokenRepository); ok {
+			repo.SetBaseDir(baseDir)
+			log.Infof("refresh manager: updated base directory to %s", baseDir)
+		}
+	}
+}
+
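+// SetOnTokenRefreshed sets the callback invoked after a token is refreshed successfully.
+// It may be called at any time; the callback can be replaced at runtime.
+// callback: receives the tokenID (file name) and the new token data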
+func (m *RefreshManager) SetOnTokenRefreshed(callback func(tokenID string, tokenData *KiroTokenData)) {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	m.onTokenRefreshed = callback
+
+	// If the refresher already exists, update its callback under its own lock
+	if m.refresher != nil {
+		m.refresher.callbackMu.Lock()
+		m.refresher.onTokenRefreshed = callback
+		m.refresher.callbackMu.Unlock()
+	}
+
+	log.Debug("refresh manager: token refresh callback registered")
+}
+
+// InitializeAndStart initializes and starts the background refresh (convenience helper).
+func InitializeAndStart(baseDir string, cfg *config.Config) {
+	manager := GetRefreshManager()
+	if err := manager.Initialize(baseDir, cfg); err != nil {
+		log.Errorf("refresh manager: initialization failed: %v", err)
+		return
+	}
+	manager.Start()
+}
+
+// StopGlobalRefreshManager stops the global refresh manager.
+func StopGlobalRefreshManager() {
+	if globalRefreshManager != nil {
+		globalRefreshManager.Stop()
+	}
+}
--- a/internal/auth/kiro/social_auth.go
+++ b/internal/auth/kiro/social_auth.go
@@ -9,7 +9,9 @@ import (
 	"encoding/base64"
 	"encoding/json"
 	"fmt"
+	"html"
 	"io"
+	"net"
 	"net/http"
 	"net/url"
 	"os"
@@ -31,6 +33,9 @@ const (

 	// OAuth timeout
 	socialAuthTimeout = 10 * time.Minute
+
+	// Default callback port for social auth HTTP server
+	socialAuthCallbackPort = 9876
 )

 // SocialProvider represents the social login provider.
@@ -67,6 +72,13 @@ type RefreshTokenRequest struct {
 	RefreshToken string `json:"refreshToken"`
 }

+// WebCallbackResult contains the OAuth callback result from HTTP server.
+type WebCallbackResult struct {
+	Code  string
+	State string
+	Error string
+}
+
 // SocialAuthClient handles social authentication with Kiro.
 type SocialAuthClient struct {
 	httpClient      *http.Client
@@ -87,6 +99,83 @@ func NewSocialAuthClient(cfg *config.Config) *SocialAuthClient {
 	}
 }

+// startWebCallbackServer starts a local HTTP server to receive the OAuth callback.
+// This is used instead of the kiro:// protocol handler to avoid redirect_mismatch errors.
+func (c *SocialAuthClient) startWebCallbackServer(ctx context.Context, expectedState string) (string, <-chan WebCallbackResult, error) {
+	// Try to find an available port - use localhost like Kiro does
+	listener, err := net.Listen("tcp", fmt.Sprintf("localhost:%d", socialAuthCallbackPort))
+	if err != nil {
+		// Try with dynamic port (RFC 8252 allows dynamic ports for native apps)
+		log.Warnf("kiro social auth: default port %d is busy, falling back to dynamic port", socialAuthCallbackPort)
+		listener, err = net.Listen("tcp", "localhost:0")
+		if err != nil {
+			return "", nil, fmt.Errorf("failed to start callback server: %w", err)
+		}
+	}
+
+	port := listener.Addr().(*net.TCPAddr).Port
+	// Use http scheme for local callback server
+	redirectURI := fmt.Sprintf("http://localhost:%d/oauth/callback", port)
+	resultChan := make(chan WebCallbackResult, 1)
+
+	server := &http.Server{
+		ReadHeaderTimeout: 10 * time.Second,
+	}
+
+	mux := http.NewServeMux()
+	mux.HandleFunc("/oauth/callback", func(w http.ResponseWriter, r *http.Request) {
+		code := r.URL.Query().Get("code")
+		state := r.URL.Query().Get("state")
+		errParam := r.URL.Query().Get("error")
+
+		if errParam != "" {
+			w.Header().Set("Content-Type", "text/html; charset=utf-8")
+			w.WriteHeader(http.StatusBadRequest)
+			fmt.Fprintf(w, `<!DOCTYPE html>
+<html><head><title>Login Failed</title></head>
+<body><h1>Login Failed</h1><p>%s</p><p>You can close this window.</p></body></html>`, html.EscapeString(errParam))
+			resultChan <- WebCallbackResult{Error: errParam}
+			return
+		}
+
+		if state != expectedState {
+			w.Header().Set("Content-Type", "text/html; charset=utf-8")
+			w.WriteHeader(http.StatusBadRequest)
+			fmt.Fprint(w, `<!DOCTYPE html>
+<html><head><title>Login Failed</title></head>
+<body><h1>Login Failed</h1><p>Invalid state parameter</p><p>You can close this window.</p></body></html>`)
+			resultChan <- WebCallbackResult{Error: "state mismatch"}
+			return
+		}
+
+		w.Header().Set("Content-Type", "text/html; charset=utf-8")
+		fmt.Fprint(w, `<!DOCTYPE html>
+<html><head><title>Login Successful</title></head>
+<body><h1>Login Successful!</h1><p>You can close this window and return to the terminal.</p>
+<script>window.close();</script></body></html>`)
+		resultChan <- WebCallbackResult{Code: code, State: state}
+	})
+
+	server.Handler = mux
+
+	go func() {
+		if err := server.Serve(listener); err != nil && err != http.ErrServerClosed {
+			log.Debugf("kiro social auth callback server error: %v", err)
+		}
+	}()
+	go func() {
+		select {
+		case <-ctx.Done():
+		case <-time.After(socialAuthTimeout):
+		case res := <-resultChan:
+			resultChan <- res // hand the result back: the caller also receives from this channel
+		}
+		_ = server.Shutdown(context.Background())
+	}()
+
+	return redirectURI, resultChan, nil
+}
+
 // generatePKCE generates PKCE code verifier and challenge.
 func generatePKCE() (verifier, challenge string, err error) {
 	// Generate 32 bytes of random data for verifier
@@ -217,10 +306,12 @@ func (c *SocialAuthClient) RefreshSocialToken(ctx context.Context, refreshToken
 		ExpiresAt:    expiresAt.Format(time.RFC3339),
 		AuthMethod:   "social",
 		Provider:     "", // Caller should preserve original provider
+		Region:       "us-east-1",
 	}, nil
 }

-// LoginWithSocial performs OAuth login with Google.
+// LoginWithSocial performs OAuth login with Google or GitHub.
+// Uses local HTTP callback server instead of custom protocol handler to avoid redirect_mismatch errors.
 func (c *SocialAuthClient) LoginWithSocial(ctx context.Context, provider SocialProvider) (*KiroTokenData, error) {
 	providerName := string(provider)

@@ -228,28 +319,10 @@ func (c *SocialAuthClient) LoginWithSocial(ctx context.Context, provider SocialP
 	fmt.Printf("║         Kiro Authentication (%s)                    ║\n", providerName)
 	fmt.Println("╚══════════════════════════════════════════════════════════╝")

-	// Step 1: Setup protocol handler
+	// Step 1: Announce setup. The local HTTP callback server that replaces the kiro:// protocol handler
+	// (to avoid redirect_mismatch errors with AWS Cognito) is started in Step 4, once the state is known
 	fmt.Println("\nSetting up authentication...")

-	// Start the local callback server
-	handlerPort, err := c.protocolHandler.Start(ctx)
-	if err != nil {
-		return nil, fmt.Errorf("failed to start callback server: %w", err)
-	}
-	defer c.protocolHandler.Stop()
-
-	// Ensure protocol handler is installed and set as default
-	if err := SetupProtocolHandlerIfNeeded(handlerPort); err != nil {
-		fmt.Println("\n⚠ Protocol handler setup failed. Trying alternative method...")
-		fmt.Println("  If you see a browser 'Open with' dialog, select your default browser.")
-		fmt.Println("  For manual setup instructions, run: cliproxy kiro --help-protocol")
-		log.Debugf("kiro: protocol handler setup error: %v", err)
-		// Continue anyway - user might have set it up manually or select browser manually
-	} else {
-		// Force set our handler as default (prevents "Open with" dialog)
-		forceDefaultProtocolHandler()
-	}
-
 	// Step 2: Generate PKCE codes
 	codeVerifier, codeChallenge, err := generatePKCE()
 	if err != nil {
@@ -262,8 +335,15 @@ func (c *SocialAuthClient) LoginWithSocial(ctx context.Context, provider SocialP
 		return nil, fmt.Errorf("failed to generate state: %w", err)
 	}

-	// Step 4: Build the login URL (Kiro uses GET request with query params)
-	authURL := c.buildLoginURL(providerName, KiroRedirectURI, codeChallenge, state)
+	// Step 4: Start local HTTP callback server
+	redirectURI, resultChan, err := c.startWebCallbackServer(ctx, state)
+	if err != nil {
+		return nil, fmt.Errorf("failed to start callback server: %w", err)
+	}
+	log.Debugf("kiro social auth: callback server started at %s", redirectURI)
+
+	// Step 5: Build the login URL using HTTP redirect URI
+	authURL := c.buildLoginURL(providerName, redirectURI, codeChallenge, state)

 	// Set incognito mode based on config (defaults to true for Kiro, can be overridden with --no-incognito)
 	// Incognito mode enables multi-account support by bypassing cached sessions
@@ -279,7 +359,7 @@ func (c *SocialAuthClient) LoginWithSocial(ctx context.Context, provider SocialP
 		log.Debug("kiro: using incognito mode for multi-account support (default)")
 	}

-	// Step 5: Open browser for user authentication
+	// Step 6: Open browser for user authentication
 	fmt.Println("\n════════════════════════════════════════════════════════════")
 	fmt.Printf("  Opening browser for %s authentication...\n", providerName)
 	fmt.Println("════════════════════════════════════════════════════════════")
@@ -295,80 +375,78 @@ func (c *SocialAuthClient) LoginWithSocial(ctx context.Context, provider SocialP

 	fmt.Println("\n  Waiting for authentication callback...")

-	// Step 6: Wait for callback
-	callback, err := c.protocolHandler.WaitForCallback(ctx)
-	if err != nil {
-		return nil, fmt.Errorf("failed to receive callback: %w", err)
-	}
-
-	if callback.Error != "" {
-		return nil, fmt.Errorf("authentication error: %s", callback.Error)
-	}
-
-	if callback.State != state {
-		// Log state values for debugging, but don't expose in user-facing error
-		log.Debugf("kiro: OAuth state mismatch - expected %s, got %s", state, callback.State)
-		return nil, fmt.Errorf("OAuth state validation failed - please try again")
-	}
-
-	if callback.Code == "" {
-		return nil, fmt.Errorf("no authorization code received")
-	}
-
-	fmt.Println("\n✓ Authorization received!")
-
-	// Step 7: Exchange code for tokens
-	fmt.Println("Exchanging code for tokens...")
-
-	tokenReq := &CreateTokenRequest{
-		Code:         callback.Code,
-		CodeVerifier: codeVerifier,
-		RedirectURI:  KiroRedirectURI,
-	}
-
-	tokenResp, err := c.CreateToken(ctx, tokenReq)
-	if err != nil {
-		return nil, fmt.Errorf("failed to exchange code for tokens: %w", err)
-	}
-
-	fmt.Println("\n✓ Authentication successful!")
-
-	// Close the browser window
-	if err := browser.CloseBrowser(); err != nil {
-		log.Debugf("Failed to close browser: %v", err)
-	}
-
-	// Validate ExpiresIn - use default 1 hour if invalid
-	expiresIn := tokenResp.ExpiresIn
-	if expiresIn <= 0 {
-		expiresIn = 3600
-	}
-	expiresAt := time.Now().Add(time.Duration(expiresIn) * time.Second)
-
-	// Try to extract email from JWT access token first
-	email := ExtractEmailFromJWT(tokenResp.AccessToken)
-	
-	// If no email in JWT, ask user for account label (only in interactive mode)
-	if email == "" && isInteractiveTerminal() {
-		fmt.Print("\n  Enter account label for file naming (optional, press Enter to skip): ")
-		reader := bufio.NewReader(os.Stdin)
-		var err error
-		email, err = reader.ReadString('\n')
-		if err != nil {
-			log.Debugf("Failed to read account label: %v", err)
+	// Step 7: Wait for callback from HTTP server
+	select {
+	case <-ctx.Done():
+		return nil, ctx.Err()
+	case <-time.After(socialAuthTimeout):
+		return nil, fmt.Errorf("authentication timed out")
+	case callback := <-resultChan:
+		if callback.Error != "" {
+			return nil, fmt.Errorf("authentication error: %s", callback.Error)
 		}
-		email = strings.TrimSpace(email)
-	}

-	return &KiroTokenData{
-		AccessToken:  tokenResp.AccessToken,
-		RefreshToken: tokenResp.RefreshToken,
-		ProfileArn:   tokenResp.ProfileArn,
-		ExpiresAt:    expiresAt.Format(time.RFC3339),
-		AuthMethod:   "social",
-		Provider:     providerName,
-		Email:        email, // JWT email or user-provided label
-	}, nil
+		// State is already validated by the callback server
+		if callback.Code == "" {
+			return nil, fmt.Errorf("no authorization code received")
+		}
+
+		fmt.Println("\n✓ Authorization received!")
+
+		// Step 8: Exchange code for tokens
+		fmt.Println("Exchanging code for tokens...")
+
+		tokenReq := &CreateTokenRequest{
+			Code:         callback.Code,
+			CodeVerifier: codeVerifier,
+			RedirectURI:  redirectURI, // Use HTTP redirect URI, not kiro:// protocol
+		}
+
+		tokenResp, err := c.CreateToken(ctx, tokenReq)
+		if err != nil {
+			return nil, fmt.Errorf("failed to exchange code for tokens: %w", err)
+		}
+
+		fmt.Println("\n✓ Authentication successful!")
+
+		// Close the browser window
+		if err := browser.CloseBrowser(); err != nil {
+			log.Debugf("Failed to close browser: %v", err)
+		}
+
+		// Validate ExpiresIn - use default 1 hour if invalid
+		expiresIn := tokenResp.ExpiresIn
+		if expiresIn <= 0 {
+			expiresIn = 3600
+		}
+		expiresAt := time.Now().Add(time.Duration(expiresIn) * time.Second)
+
+		// Try to extract email from JWT access token first
+		email := ExtractEmailFromJWT(tokenResp.AccessToken)
+
+		// If no email in JWT, ask user for account label (only in interactive mode)
+		if email == "" && isInteractiveTerminal() {
+			fmt.Print("\n  Enter account label for file naming (optional, press Enter to skip): ")
+			reader := bufio.NewReader(os.Stdin)
+			var err error
+			email, err = reader.ReadString('\n')
+			if err != nil {
+				log.Debugf("Failed to read account label: %v", err)
+			}
+			email = strings.TrimSpace(email)
+		}
+
+		return &KiroTokenData{
+			AccessToken:  tokenResp.AccessToken,
+			RefreshToken: tokenResp.RefreshToken,
+			ProfileArn:   tokenResp.ProfileArn,
+			ExpiresAt:    expiresAt.Format(time.RFC3339),
+			AuthMethod:   "social",
+			Provider:     providerName,
+			Email:        email, // JWT email or user-provided label
+			Region:       "us-east-1",
+		}, nil
+	}
 }

 // LoginWithGoogle performs OAuth login with Google.
--- a/internal/auth/kiro/sso_oidc.go
+++ b/internal/auth/kiro/sso_oidc.go
@@ -735,6 +735,7 @@ func (c *SSOOIDCClient) RefreshToken(ctx context.Context, clientID, clientSecret
 		Provider:     "AWS",
 		ClientID:     clientID,
 		ClientSecret: clientSecret,
+		Region:       defaultIDCRegion,
 	}, nil
 }

@@ -850,16 +851,17 @@ func (c *SSOOIDCClient) LoginWithBuilderID(ctx context.Context) (*KiroTokenData,
 				ClientID:     regResp.ClientID,
 				ClientSecret: regResp.ClientSecret,
 				Email:        email,
+				Region:       defaultIDCRegion,
 			}, nil
 		}
 	}
 
 	// Close browser on timeout for better UX
 	if err := browser.CloseBrowser(); err != nil {
 		log.Debugf("Failed to close browser on timeout: %v", err)
 	}
 	return nil, fmt.Errorf("authorization timed out")
 }

 // FetchUserEmail retrieves the user's email from AWS SSO OIDC userinfo endpoint.
 // Falls back to JWT parsing if userinfo fails.
@@ -1366,6 +1368,7 @@ func (c *SSOOIDCClient) LoginWithBuilderIDAuthCode(ctx context.Context) (*KiroTo
 			ClientID:     regResp.ClientID,
 			ClientSecret: regResp.ClientSecret,
 			Email:        email,
+			Region:       defaultIDCRegion,
 		}, nil
 	}
 }
--- a/internal/auth/kiro/token.go
+++ b/internal/auth/kiro/token.go
@@ -9,6 +9,8 @@ import (

 // KiroTokenStorage holds the persistent token data for Kiro authentication.
 type KiroTokenStorage struct {
+	// Type is the provider type for management UI recognition (must be "kiro")
+	Type string `json:"type"`
 	// AccessToken is the OAuth2 access token for API access
 	AccessToken string `json:"access_token"`
 	// RefreshToken is used to obtain new access tokens
@@ -23,6 +25,16 @@ type KiroTokenStorage struct {
 	Provider string `json:"provider"`
 	// LastRefresh is the timestamp of the last token refresh
 	LastRefresh string `json:"last_refresh"`
+	// ClientID is the OAuth client ID (required for token refresh)
+	ClientID string `json:"client_id,omitempty"`
+	// ClientSecret is the OAuth client secret (required for token refresh)
+	ClientSecret string `json:"client_secret,omitempty"`
+	// Region is the AWS region
+	Region string `json:"region,omitempty"`
+	// StartURL is the AWS Identity Center start URL (for IDC auth)
+	StartURL string `json:"start_url,omitempty"`
+	// Email is the user's email address
+	Email string `json:"email,omitempty"`
 }

 // SaveTokenToFile persists the token storage to the specified file path.
@@ -68,5 +80,10 @@ func (s *KiroTokenStorage) ToTokenData() *KiroTokenData {
 		ExpiresAt:    s.ExpiresAt,
 		AuthMethod:   s.AuthMethod,
 		Provider:     s.Provider,
+		ClientID:     s.ClientID,
+		ClientSecret: s.ClientSecret,
+		Region:       s.Region,
+		StartURL:     s.StartURL,
+		Email:        s.Email,
 	}
 }
--- a/internal/auth/kiro/token_repository.go
+++ b/internal/auth/kiro/token_repository.go
@@ -0,0 +1,273 @@
+package kiro
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"io/fs"
+	"os"
+	"path/filepath"
+	"sort"
+	"strings"
+	"sync"
+	"time"
+
+	log "github.com/sirupsen/logrus"
+)
+
+// FileTokenRepository implements the TokenRepository interface on top of the file system.
+type FileTokenRepository struct {
+	mu      sync.RWMutex
+	baseDir string
+}
+
+// NewFileTokenRepository creates a new file-backed token repository.
+func NewFileTokenRepository(baseDir string) *FileTokenRepository {
+	return &FileTokenRepository{
+		baseDir: baseDir,
+	}
+}
+
+// SetBaseDir sets the base directory.
+func (r *FileTokenRepository) SetBaseDir(dir string) {
+	r.mu.Lock()
+	r.baseDir = strings.TrimSpace(dir)
+	r.mu.Unlock()
+}
+
+// FindOldestUnverified finds tokens that need refreshing, sorted by last verification time (oldest first).
+func (r *FileTokenRepository) FindOldestUnverified(limit int) []*Token {
+	r.mu.RLock()
+	baseDir := r.baseDir
+	r.mu.RUnlock()
+
+	if baseDir == "" {
+		log.Debug("token repository: base directory not configured")
+		return nil
+	}
+
+	var tokens []*Token
+
+	err := filepath.WalkDir(baseDir, func(path string, d fs.DirEntry, walkErr error) error {
+		if walkErr != nil {
+			return nil // ignore the error and keep walking
+		}
+		if d.IsDir() {
+			return nil
+		}
+		if !strings.HasSuffix(strings.ToLower(d.Name()), ".json") {
+			return nil
+		}
+
+		// Only process kiro-related token files
+		if !strings.HasPrefix(d.Name(), "kiro-") {
+			return nil
+		}
+
+		token, err := r.readTokenFile(path)
+		if err != nil {
+			log.Debugf("token repository: failed to read token file %s: %v", path, err)
+			return nil
+		}
+
+		if token != nil && token.RefreshToken != "" {
+			// Check whether the token needs refreshing (less than 5 minutes before expiry)
+			if token.ExpiresAt.IsZero() || time.Until(token.ExpiresAt) < 5*time.Minute {
+				tokens = append(tokens, token)
+			}
+		}
+
+		return nil
+	})
+
+	if err != nil {
+		log.Warnf("token repository: error walking directory: %v", err)
+	}
+
+	// Sort by last verification time (oldest first)
+	sort.Slice(tokens, func(i, j int) bool {
+		return tokens[i].LastVerified.Before(tokens[j].LastVerified)
+	})
+
+	// Limit the number of tokens returned
+	if limit > 0 && len(tokens) > limit {
+		tokens = tokens[:limit]
+	}
+
+	return tokens
+}
+
+// UpdateToken updates the token and persists it to its file.
+func (r *FileTokenRepository) UpdateToken(token *Token) error {
+	if token == nil {
+		return fmt.Errorf("token repository: token is nil")
+	}
+
+	r.mu.RLock()
+	baseDir := r.baseDir
+	r.mu.RUnlock()
+
+	if baseDir == "" {
+		return fmt.Errorf("token repository: base directory not configured")
+	}
+
+	// Build the file path
+	filePath := filepath.Join(baseDir, token.ID)
+	if !strings.HasSuffix(filePath, ".json") {
+		filePath += ".json"
+	}
+
+	// Read the existing file contents
+	existingData := make(map[string]any)
+	if data, err := os.ReadFile(filePath); err == nil {
+		_ = json.Unmarshal(data, &existingData)
+	}
+
+	// Update fields
+	existingData["access_token"] = token.AccessToken
+	existingData["refresh_token"] = token.RefreshToken
+	existingData["last_refresh"] = time.Now().Format(time.RFC3339)
+
+	if !token.ExpiresAt.IsZero() {
+		existingData["expires_at"] = token.ExpiresAt.Format(time.RFC3339)
+	}
+
+	// Keep the existing key fields; overwrite them only when the token carries a non-empty value
+	if token.ClientID != "" {
+		existingData["client_id"] = token.ClientID
+	}
+	if token.ClientSecret != "" {
+		existingData["client_secret"] = token.ClientSecret
+	}
+	if token.AuthMethod != "" {
+		existingData["auth_method"] = token.AuthMethod
+	}
+	if token.Region != "" {
+		existingData["region"] = token.Region
+	}
+	if token.StartURL != "" {
+		existingData["start_url"] = token.StartURL
+	}
+
+	// Serialize and write to the file
+	raw, err := json.MarshalIndent(existingData, "", "  ")
+	if err != nil {
+		return fmt.Errorf("token repository: marshal failed: %w", err)
+	}
+
+	// Atomic write: write to a temporary file first, then rename it
+	tmpPath := filePath + ".tmp"
+	if err := os.WriteFile(tmpPath, raw, 0o600); err != nil {
+		return fmt.Errorf("token repository: write temp file failed: %w", err)
+	}
+	if err := os.Rename(tmpPath, filePath); err != nil {
+		_ = os.Remove(tmpPath)
+		return fmt.Errorf("token repository: rename failed: %w", err)
+	}
+
+	log.Debugf("token repository: updated token %s", token.ID)
+	return nil
+}
+
+// readTokenFile reads a token from a file.
+func (r *FileTokenRepository) readTokenFile(path string) (*Token, error) {
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return nil, err
+	}
+
+	var metadata map[string]any
+	if err := json.Unmarshal(data, &metadata); err != nil {
+		return nil, err
+	}
+
+	// Check whether this is a kiro token
+	tokenType, _ := metadata["type"].(string)
+	if tokenType != "kiro" {
+		return nil, nil
+	}
+
+	// Check auth_method
+	authMethod, _ := metadata["auth_method"].(string)
+	if authMethod != "idc" && authMethod != "builder-id" {
+		return nil, nil // only IDC and Builder ID tokens are handled
+	}
+
+	token := &Token{
+		ID:         filepath.Base(path),
+		AuthMethod: authMethod,
+	}
+
+	// Parse the individual fields
+	if v, ok := metadata["access_token"].(string); ok {
+		token.AccessToken = v
+	}
+	if v, ok := metadata["refresh_token"].(string); ok {
+		token.RefreshToken = v
+	}
+	if v, ok := metadata["client_id"].(string); ok {
+		token.ClientID = v
+	}
+	if v, ok := metadata["client_secret"].(string); ok {
+		token.ClientSecret = v
+	}
+	if v, ok := metadata["region"].(string); ok {
+		token.Region = v
+	}
+	if v, ok := metadata["start_url"].(string); ok {
+		token.StartURL = v
+	}
+	if v, ok := metadata["provider"].(string); ok {
+		token.Provider = v
+	}
+
+	// Parse the time fields
+	if v, ok := metadata["expires_at"].(string); ok {
+		if t, err := time.Parse(time.RFC3339, v); err == nil {
+			token.ExpiresAt = t
+		}
+	}
+	if v, ok := metadata["last_refresh"].(string); ok {
+		if t, err := time.Parse(time.RFC3339, v); err == nil {
+			token.LastVerified = t
+		}
+	}
+
+	return token, nil
+}
+
+// ListKiroTokens lists all Kiro tokens (for debugging).
+func (r *FileTokenRepository) ListKiroTokens(ctx context.Context) ([]*Token, error) {
+	r.mu.RLock()
+	baseDir := r.baseDir
+	r.mu.RUnlock()
+
+	if baseDir == "" {
+		return nil, fmt.Errorf("token repository: base directory not configured")
+	}
+
+	var tokens []*Token
+
+	err := filepath.WalkDir(baseDir, func(path string, d fs.DirEntry, walkErr error) error {
+		if walkErr != nil {
+			return nil
+		}
+		if d.IsDir() {
+			return nil
+		}
+		if !strings.HasPrefix(d.Name(), "kiro-") || !strings.HasSuffix(d.Name(), ".json") {
+			return nil
+		}
+
+		token, err := r.readTokenFile(path)
+		if err != nil {
+			return nil
+		}
+		if token != nil {
+			tokens = append(tokens, token)
+		}
+		return nil
+	})
+
+	return tokens, err
+}
--- a/internal/auth/kiro/usage_checker.go
+++ b/internal/auth/kiro/usage_checker.go
@@ -0,0 +1,243 @@
+// Package kiro provides authentication functionality for AWS CodeWhisperer (Kiro) API.
+// This file implements usage quota checking and monitoring.
+package kiro
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"io"
+	"net/http"
+	"strings"
+	"time"
+
+	"github.com/router-for-me/CLIProxyAPI/v6/internal/config"
+	"github.com/router-for-me/CLIProxyAPI/v6/internal/util"
+)
+
+// UsageQuotaResponse represents the API response structure for usage quota checking.
+type UsageQuotaResponse struct {
+	UsageBreakdownList []UsageBreakdownExtended `json:"usageBreakdownList"`
+	SubscriptionInfo   *SubscriptionInfo        `json:"subscriptionInfo,omitempty"`
+	NextDateReset      float64                  `json:"nextDateReset,omitempty"`
+}
+
+// UsageBreakdownExtended represents detailed usage information for quota checking.
+// Note: UsageBreakdown is already defined in codewhisperer_client.go
+type UsageBreakdownExtended struct {
+	ResourceType              string                 `json:"resourceType"`
+	UsageLimitWithPrecision   float64                `json:"usageLimitWithPrecision"`
+	CurrentUsageWithPrecision float64                `json:"currentUsageWithPrecision"`
+	FreeTrialInfo             *FreeTrialInfoExtended `json:"freeTrialInfo,omitempty"`
+}
+
+// FreeTrialInfoExtended represents free trial usage information.
+type FreeTrialInfoExtended struct {
+	FreeTrialStatus           string  `json:"freeTrialStatus"`
+	UsageLimitWithPrecision   float64 `json:"usageLimitWithPrecision"`
+	CurrentUsageWithPrecision float64 `json:"currentUsageWithPrecision"`
+}
+
+// QuotaStatus represents the quota status for a token.
+type QuotaStatus struct {
+	TotalLimit     float64
+	CurrentUsage   float64
+	RemainingQuota float64
+	IsExhausted    bool
+	ResourceType   string
+	NextReset      time.Time
+}
+
+// UsageChecker provides methods for checking token quota usage.
+type UsageChecker struct {
+	httpClient *http.Client
+	endpoint   string
+}
+
+// NewUsageChecker creates a new UsageChecker instance.
+func NewUsageChecker(cfg *config.Config) *UsageChecker {
+	return &UsageChecker{
+		httpClient: util.SetProxy(&cfg.SDKConfig, &http.Client{Timeout: 30 * time.Second}),
+		endpoint:   awsKiroEndpoint,
+	}
+}
+
+// NewUsageCheckerWithClient creates a UsageChecker with a custom HTTP client.
+func NewUsageCheckerWithClient(client *http.Client) *UsageChecker {
+	return &UsageChecker{
+		httpClient: client,
+		endpoint:   awsKiroEndpoint,
+	}
+}
+
+// CheckUsage retrieves usage limits for the given token.
+func (c *UsageChecker) CheckUsage(ctx context.Context, tokenData *KiroTokenData) (*UsageQuotaResponse, error) {
+	if tokenData == nil {
+		return nil, fmt.Errorf("token data is nil")
+	}
+
+	if tokenData.AccessToken == "" {
+		return nil, fmt.Errorf("access token is empty")
+	}
+
+	payload := map[string]interface{}{
+		"origin":       "AI_EDITOR",
+		"profileArn":   tokenData.ProfileArn,
+		"resourceType": "AGENTIC_REQUEST",
+	}
+
+	jsonBody, err := json.Marshal(payload)
+	if err != nil {
+		return nil, fmt.Errorf("failed to marshal request: %w", err)
+	}
+
+	req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.endpoint, strings.NewReader(string(jsonBody)))
+	if err != nil {
+		return nil, fmt.Errorf("failed to create request: %w", err)
+	}
+
+	req.Header.Set("Content-Type", "application/x-amz-json-1.0")
+	req.Header.Set("x-amz-target", targetGetUsage)
+	req.Header.Set("Authorization", "Bearer "+tokenData.AccessToken)
+	req.Header.Set("Accept", "application/json")
+
+	resp, err := c.httpClient.Do(req)
+	if err != nil {
+		return nil, fmt.Errorf("request failed: %w", err)
+	}
+	defer resp.Body.Close()
+
+	body, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, fmt.Errorf("failed to read response: %w", err)
+	}
+
+	if resp.StatusCode != http.StatusOK {
+		return nil, fmt.Errorf("API error (status %d): %s", resp.StatusCode, string(body))
+	}
+
+	var result UsageQuotaResponse
+	if err := json.Unmarshal(body, &result); err != nil {
+		return nil, fmt.Errorf("failed to parse usage response: %w", err)
+	}
+
+	return &result, nil
+}
+
+// CheckUsageByAccessToken retrieves usage limits using an access token and profile ARN directly.
+func (c *UsageChecker) CheckUsageByAccessToken(ctx context.Context, accessToken, profileArn string) (*UsageQuotaResponse, error) {
+	tokenData := &KiroTokenData{
+		AccessToken: accessToken,
+		ProfileArn:  profileArn,
+	}
+	return c.CheckUsage(ctx, tokenData)
+}
+
+// GetRemainingQuota calculates the remaining quota from usage limits.
+func GetRemainingQuota(usage *UsageQuotaResponse) float64 {
+	if usage == nil || len(usage.UsageBreakdownList) == 0 {
+		return 0
+	}
+
+	var totalRemaining float64
+	for _, breakdown := range usage.UsageBreakdownList {
+		remaining := breakdown.UsageLimitWithPrecision - breakdown.CurrentUsageWithPrecision
+		if remaining > 0 {
+			totalRemaining += remaining
+		}
+
+		if breakdown.FreeTrialInfo != nil {
+			freeRemaining := breakdown.FreeTrialInfo.UsageLimitWithPrecision - breakdown.FreeTrialInfo.CurrentUsageWithPrecision
+			if freeRemaining > 0 {
+				totalRemaining += freeRemaining
+			}
+		}
+	}
+
+	return totalRemaining
+}
+
+// IsQuotaExhausted checks if the quota is exhausted based on usage limits.
+func IsQuotaExhausted(usage *UsageQuotaResponse) bool {
+	if usage == nil || len(usage.UsageBreakdownList) == 0 {
+		return true
+	}
+
+	for _, breakdown := range usage.UsageBreakdownList {
+		if breakdown.CurrentUsageWithPrecision < breakdown.UsageLimitWithPrecision {
+			return false
+		}
+
+		if breakdown.FreeTrialInfo != nil {
+			if breakdown.FreeTrialInfo.CurrentUsageWithPrecision < breakdown.FreeTrialInfo.UsageLimitWithPrecision {
+				return false
+			}
+		}
+	}
+
+	return true
+}
+
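+// GetQuotaStatus retrieves the quota status for a token. Limit, usage, and remaining figures come from the first usage breakdown entry (plus its free trial, if any); IsExhausted considers every entry.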
+func (c *UsageChecker) GetQuotaStatus(ctx context.Context, tokenData *KiroTokenData) (*QuotaStatus, error) {
+	usage, err := c.CheckUsage(ctx, tokenData)
+	if err != nil {
+		return nil, err
+	}
+
+	status := &QuotaStatus{
+		IsExhausted: IsQuotaExhausted(usage),
+	}
+
+	if len(usage.UsageBreakdownList) > 0 {
+		breakdown := usage.UsageBreakdownList[0]
+		status.TotalLimit = breakdown.UsageLimitWithPrecision
+		status.CurrentUsage = breakdown.CurrentUsageWithPrecision
+		status.RemainingQuota = breakdown.UsageLimitWithPrecision - breakdown.CurrentUsageWithPrecision
+		status.ResourceType = breakdown.ResourceType
+
+		if breakdown.FreeTrialInfo != nil {
+			status.TotalLimit += breakdown.FreeTrialInfo.UsageLimitWithPrecision
+			status.CurrentUsage += breakdown.FreeTrialInfo.CurrentUsageWithPrecision
+			freeRemaining := breakdown.FreeTrialInfo.UsageLimitWithPrecision - breakdown.FreeTrialInfo.CurrentUsageWithPrecision
+			if freeRemaining > 0 {
+				status.RemainingQuota += freeRemaining
+			}
+		}
+	}
+
+	if usage.NextDateReset > 0 {
+		status.NextReset = time.Unix(int64(usage.NextDateReset/1000), 0)
+	}
+
+	return status, nil
+}
+
+// CalculateAvailableCount calculates the available request count based on usage limits.
+func CalculateAvailableCount(usage *UsageQuotaResponse) float64 {
+	return GetRemainingQuota(usage)
+}
+
+// GetUsagePercentage calculates the usage percentage.
+func GetUsagePercentage(usage *UsageQuotaResponse) float64 {
+	if usage == nil || len(usage.UsageBreakdownList) == 0 {
+		return 100.0
+	}
+
+	var totalLimit, totalUsage float64
+	for _, breakdown := range usage.UsageBreakdownList {
+		totalLimit += breakdown.UsageLimitWithPrecision
+		totalUsage += breakdown.CurrentUsageWithPrecision
+
+		if breakdown.FreeTrialInfo != nil {
+			totalLimit += breakdown.FreeTrialInfo.UsageLimitWithPrecision
+			totalUsage += breakdown.FreeTrialInfo.CurrentUsageWithPrecision
+		}
+	}
+
+	if totalLimit == 0 {
+		return 100.0
+	}
+
+	return (totalUsage / totalLimit) * 100
+}
--- a/internal/cache/signature_cache.go
+++ b/internal/cache/signature_cache.go
@@ -96,17 +96,17 @@ func purgeExpiredSessions() {

 // CacheSignature stores a thinking signature for a given session and text.
 // Used for Claude models that require signed thinking blocks in multi-turn conversations.
-func CacheSignature(modelName, sessionID, text, signature string) {
-	if sessionID == "" || text == "" || signature == "" {
+func CacheSignature(modelName, text, signature string) {
+	if text == "" || signature == "" {
 		return
 	}
 	if len(signature) < MinValidSignatureLen {
 		return
 	}

-	sc := getOrCreateSession(fmt.Sprintf("%s#%s", GetModelGroup(modelName), sessionID))
+	text = fmt.Sprintf("%s#%s", GetModelGroup(modelName), text)
 	textHash := hashText(text)
-
+	sc := getOrCreateSession(textHash)
 	sc.mu.Lock()
 	defer sc.mu.Unlock()

@@ -118,13 +118,21 @@ func CacheSignature(modelName, sessionID, text, signature string) {

 // GetCachedSignature retrieves a cached signature for a given session and text.
 // Returns empty string if not found or expired.
-func GetCachedSignature(modelName, sessionID, text string) string {
-	if sessionID == "" || text == "" {
+func GetCachedSignature(modelName, text string) string {
+	family := GetModelGroup(modelName)
+
+	if text == "" {
+		if family == "gemini" {
+			return "skip_thought_signature_validator"
+		}
 		return ""
 	}
-
-	val, ok := signatureCache.Load(fmt.Sprintf("%s#%s", GetModelGroup(modelName), sessionID))
+	text = fmt.Sprintf("%s#%s", family, text)
+	val, ok := signatureCache.Load(hashText(text))
 	if !ok {
+		if family == "gemini" {
+			return "skip_thought_signature_validator"
+		}
 		return ""
 	}
 	sc := val.(*sessionCache)
@@ -137,11 +145,17 @@ func GetCachedSignature(modelName, sessionID, text string) string {
 	entry, exists := sc.entries[textHash]
 	if !exists {
 		sc.mu.Unlock()
+		if family == "gemini" {
+			return "skip_thought_signature_validator"
+		}
 		return ""
 	}
 	if now.Sub(entry.Timestamp) > SignatureCacheTTL {
 		delete(sc.entries, textHash)
 		sc.mu.Unlock()
+		if family == "gemini" {
+			return "skip_thought_signature_validator"
+		}
 		return ""
 	}

@@ -156,7 +170,13 @@ func GetCachedSignature(modelName, sessionID, text string) string {
 // ClearSignatureCache clears signature cache for a specific session or all sessions.
 func ClearSignatureCache(sessionID string) {
 	if sessionID != "" {
-		signatureCache.Delete(sessionID)
+		signatureCache.Range(func(key, _ any) bool {
+			kStr, ok := key.(string)
+			if ok && strings.HasSuffix(kStr, "#"+sessionID) {
+				signatureCache.Delete(key)
+			}
+			return true
+		})
 	} else {
 		signatureCache.Range(func(key, _ any) bool {
 			signatureCache.Delete(key)
@@ -166,8 +186,8 @@ func ClearSignatureCache(sessionID string) {
 }

 // HasValidSignature checks if a signature is valid (non-empty and long enough)
-func HasValidSignature(signature string) bool {
-	return signature != "" && len(signature) >= MinValidSignatureLen
+func HasValidSignature(modelName, signature string) bool {
+	return (signature != "" && len(signature) >= MinValidSignatureLen) || (signature == "skip_thought_signature_validator" && GetModelGroup(modelName) == "gemini")
 }

 func GetModelGroup(modelName string) string {
--- a/internal/cache/signature_cache_test.go
+++ b/internal/cache/signature_cache_test.go
@@ -8,15 +8,14 @@ import (
 func TestCacheSignature_BasicStorageAndRetrieval(t *testing.T) {
 	ClearSignatureCache("")

-	sessionID := "test-session-1"
 	text := "This is some thinking text content"
 	signature := "abc123validSignature1234567890123456789012345678901234567890"

 	// Store signature
-	CacheSignature(sessionID, text, signature)
+	CacheSignature("test-model", text, signature)

 	// Retrieve signature
-	retrieved := GetCachedSignature(sessionID, text)
+	retrieved := GetCachedSignature("test-model", text)
 	if retrieved != signature {
 		t.Errorf("Expected signature '%s', got '%s'", signature, retrieved)
 	}
@@ -29,13 +28,13 @@ func TestCacheSignature_DifferentSessions(t *testing.T) {
 	sig1 := "signature1_1234567890123456789012345678901234567890123456"
 	sig2 := "signature2_1234567890123456789012345678901234567890123456"

-	CacheSignature("session-a", text, sig1)
-	CacheSignature("session-b", text, sig2)
+	CacheSignature("test-model", text+" A", sig1)
+	CacheSignature("test-model", text+" B", sig2)

-	if GetCachedSignature("session-a", text) != sig1 {
+	if GetCachedSignature("test-model", text+" A") != sig1 {
 		t.Error("Session-a signature mismatch")
 	}
-	if GetCachedSignature("session-b", text) != sig2 {
+	if GetCachedSignature("test-model", text+" B") != sig2 {
 		t.Error("Session-b signature mismatch")
 	}
 }
@@ -44,13 +43,13 @@ func TestCacheSignature_NotFound(t *testing.T) {
 	ClearSignatureCache("")

 	// Non-existent session
-	if got := GetCachedSignature("nonexistent", "some text"); got != "" {
+	if got := GetCachedSignature("test-model", "some text"); got != "" {
 		t.Errorf("Expected empty string for nonexistent session, got '%s'", got)
 	}

 	// Existing session but different text
-	CacheSignature("session-x", "text-a", "sigA12345678901234567890123456789012345678901234567890")
-	if got := GetCachedSignature("session-x", "text-b"); got != "" {
+	CacheSignature("test-model", "text-a", "sigA12345678901234567890123456789012345678901234567890")
+	if got := GetCachedSignature("test-model", "text-b"); got != "" {
 		t.Errorf("Expected empty string for different text, got '%s'", got)
 	}
 }
@@ -59,12 +58,12 @@ func TestCacheSignature_EmptyInputs(t *testing.T) {
 	ClearSignatureCache("")

 	// All empty/invalid inputs should be no-ops
-	CacheSignature("", "text", "sig12345678901234567890123456789012345678901234567890")
-	CacheSignature("session", "", "sig12345678901234567890123456789012345678901234567890")
-	CacheSignature("session", "text", "")
-	CacheSignature("session", "text", "short") // Too short
+	// With sessionID removed, only empty text or an empty/short signature makes the call a no-op.
+	CacheSignature("test-model", "", "sig12345678901234567890123456789012345678901234567890")
+	CacheSignature("test-model", "text", "")
+	CacheSignature("test-model", "text", "short") // Too short

-	if got := GetCachedSignature("session", "text"); got != "" {
+	if got := GetCachedSignature("test-model", "text"); got != "" {
 		t.Errorf("Expected empty after invalid cache attempts, got '%s'", got)
 	}
 }
@@ -72,13 +71,12 @@ func TestCacheSignature_EmptyInputs(t *testing.T) {
 func TestCacheSignature_ShortSignatureRejected(t *testing.T) {
 	ClearSignatureCache("")

-	sessionID := "test-short-sig"
 	text := "Some text"
 	shortSig := "abc123" // Less than 50 chars

-	CacheSignature(sessionID, text, shortSig)
+	CacheSignature("test-model", text, shortSig)

-	if got := GetCachedSignature(sessionID, text); got != "" {
+	if got := GetCachedSignature("test-model", text); got != "" {
 		t.Errorf("Short signature should be rejected, got '%s'", got)
 	}
 }
@@ -87,15 +85,15 @@ func TestClearSignatureCache_SpecificSession(t *testing.T) {
 	ClearSignatureCache("")

 	sig := "validSig1234567890123456789012345678901234567890123456"
-	CacheSignature("session-1", "text", sig)
-	CacheSignature("session-2", "text", sig)
+	CacheSignature("test-model", "text", sig)
+	CacheSignature("test-model", "text", sig)

 	ClearSignatureCache("session-1")

-	if got := GetCachedSignature("session-1", "text"); got != "" {
+	if got := GetCachedSignature("test-model", "text"); got != "" {
 		t.Error("session-1 should be cleared")
 	}
-	if got := GetCachedSignature("session-2", "text"); got != sig {
+	if got := GetCachedSignature("test-model", "text"); got != sig {
 		t.Error("session-2 should still exist")
 	}
 }
@@ -104,15 +102,15 @@ func TestClearSignatureCache_AllSessions(t *testing.T) {
 	ClearSignatureCache("")

 	sig := "validSig1234567890123456789012345678901234567890123456"
-	CacheSignature("session-1", "text", sig)
-	CacheSignature("session-2", "text", sig)
+	CacheSignature("test-model", "text", sig)
+	CacheSignature("test-model", "text", sig)

 	ClearSignatureCache("")

-	if got := GetCachedSignature("session-1", "text"); got != "" {
+	if got := GetCachedSignature("test-model", "text"); got != "" {
 		t.Error("session-1 should be cleared")
 	}
-	if got := GetCachedSignature("session-2", "text"); got != "" {
+	if got := GetCachedSignature("test-model", "text"); got != "" {
 		t.Error("session-2 should be cleared")
 	}
 }
@@ -132,7 +130,7 @@ func TestHasValidSignature(t *testing.T) {

 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
-			result := HasValidSignature(tt.signature)
+			result := HasValidSignature("claude-sonnet-4-5-thinking", tt.signature)
 			if result != tt.expected {
 				t.Errorf("HasValidSignature(%q) = %v, expected %v", tt.signature, result, tt.expected)
 			}
@@ -143,21 +141,19 @@ func TestHasValidSignature(t *testing.T) {
 func TestCacheSignature_TextHashCollisionResistance(t *testing.T) {
 	ClearSignatureCache("")

-	sessionID := "hash-test-session"
-
 	// Different texts should produce different hashes
 	text1 := "First thinking text"
 	text2 := "Second thinking text"
 	sig1 := "signature1_1234567890123456789012345678901234567890123456"
 	sig2 := "signature2_1234567890123456789012345678901234567890123456"

-	CacheSignature(sessionID, text1, sig1)
-	CacheSignature(sessionID, text2, sig2)
+	CacheSignature("test-model", text1, sig1)
+	CacheSignature("test-model", text2, sig2)

-	if GetCachedSignature(sessionID, text1) != sig1 {
+	if GetCachedSignature("test-model", text1) != sig1 {
 		t.Error("text1 signature mismatch")
 	}
-	if GetCachedSignature(sessionID, text2) != sig2 {
+	if GetCachedSignature("test-model", text2) != sig2 {
 		t.Error("text2 signature mismatch")
 	}
 }
@@ -165,13 +161,12 @@ func TestCacheSignature_TextHashCollisionResistance(t *testing.T) {
 func TestCacheSignature_UnicodeText(t *testing.T) {
 	ClearSignatureCache("")

-	sessionID := "unicode-session"
 	text := "한글 텍스트와 이모지 🎉 그리고 特殊文字"
 	sig := "unicodeSig123456789012345678901234567890123456789012345"

-	CacheSignature(sessionID, text, sig)
+	CacheSignature("test-model", text, sig)

-	if got := GetCachedSignature(sessionID, text); got != sig {
+	if got := GetCachedSignature("test-model", text); got != sig {
 		t.Errorf("Unicode text signature retrieval failed, got '%s'", got)
 	}
 }
@@ -179,15 +174,14 @@ func TestCacheSignature_UnicodeText(t *testing.T) {
 func TestCacheSignature_Overwrite(t *testing.T) {
 	ClearSignatureCache("")

-	sessionID := "overwrite-session"
 	text := "Same text"
 	sig1 := "firstSignature12345678901234567890123456789012345678901"
 	sig2 := "secondSignature1234567890123456789012345678901234567890"

-	CacheSignature(sessionID, text, sig1)
-	CacheSignature(sessionID, text, sig2) // Overwrite
+	CacheSignature("test-model", text, sig1)
+	CacheSignature("test-model", text, sig2) // Overwrite

-	if got := GetCachedSignature(sessionID, text); got != sig2 {
+	if got := GetCachedSignature("test-model", text); got != sig2 {
 		t.Errorf("Expected overwritten signature '%s', got '%s'", sig2, got)
 	}
 }
@@ -199,14 +193,13 @@ func TestCacheSignature_ExpirationLogic(t *testing.T) {

 	// This test verifies the expiration check exists
 	// In a real scenario, we'd mock time.Now()
-	sessionID := "expiration-test"
 	text := "text"
 	sig := "validSig1234567890123456789012345678901234567890123456"

-	CacheSignature(sessionID, text, sig)
+	CacheSignature("test-model", text, sig)

 	// Fresh entry should be retrievable
-	if got := GetCachedSignature(sessionID, text); got != sig {
+	if got := GetCachedSignature("test-model", text); got != sig {
 		t.Errorf("Fresh entry should be retrievable, got '%s'", got)
 	}

--- a/internal/logging/gin_logger.go
+++ b/internal/logging/gin_logger.go
@@ -4,6 +4,7 @@
 package logging

 import (
+	"errors"
 	"fmt"
 	"net/http"
 	"runtime/debug"
@@ -112,6 +113,11 @@ func isAIAPIPath(path string) bool {
 //   - gin.HandlerFunc: A middleware handler for panic recovery
 func GinLogrusRecovery() gin.HandlerFunc {
 	return gin.CustomRecovery(func(c *gin.Context, recovered interface{}) {
+		if err, ok := recovered.(error); ok && errors.Is(err, http.ErrAbortHandler) {
+			// Let net/http handle ErrAbortHandler so the connection is aborted without noisy stack logs.
+			panic(http.ErrAbortHandler)
+		}
+
 		log.WithFields(log.Fields{
 			"panic": recovered,
 			"stack": string(debug.Stack()),
--- a/internal/logging/gin_logger_test.go
+++ b/internal/logging/gin_logger_test.go
@@ -0,0 +1,60 @@
+package logging
+
+import (
+	"errors"
+	"net/http"
+	"net/http/httptest"
+	"testing"
+
+	"github.com/gin-gonic/gin"
+)
+
+func TestGinLogrusRecoveryRepanicsErrAbortHandler(t *testing.T) {
+	gin.SetMode(gin.TestMode)
+
+	engine := gin.New()
+	engine.Use(GinLogrusRecovery())
+	engine.GET("/abort", func(c *gin.Context) {
+		panic(http.ErrAbortHandler)
+	})
+
+	req := httptest.NewRequest(http.MethodGet, "/abort", nil)
+	recorder := httptest.NewRecorder()
+
+	defer func() {
+		recovered := recover()
+		if recovered == nil {
+			t.Fatalf("expected panic, got nil")
+		}
+		err, ok := recovered.(error)
+		if !ok {
+			t.Fatalf("expected error panic, got %T", recovered)
+		}
+		if !errors.Is(err, http.ErrAbortHandler) {
+			t.Fatalf("expected ErrAbortHandler, got %v", err)
+		}
+		if err != http.ErrAbortHandler {
+			t.Fatalf("expected exact ErrAbortHandler sentinel, got %v", err)
+		}
+	}()
+
+	engine.ServeHTTP(recorder, req)
+}
+
+func TestGinLogrusRecoveryHandlesRegularPanic(t *testing.T) {
+	gin.SetMode(gin.TestMode)
+
+	engine := gin.New()
+	engine.Use(GinLogrusRecovery())
+	engine.GET("/panic", func(c *gin.Context) {
+		panic("boom")
+	})
+
+	req := httptest.NewRequest(http.MethodGet, "/panic", nil)
+	recorder := httptest.NewRecorder()
+
+	engine.ServeHTTP(recorder, req)
+	if recorder.Code != http.StatusInternalServerError {
+		t.Fatalf("expected 500, got %d", recorder.Code)
+	}
+}
--- a/internal/registry/kiro_model_converter.go
+++ b/internal/registry/kiro_model_converter.go
@@ -0,0 +1,303 @@
+// Package registry provides Kiro model conversion utilities.
+// This file handles converting dynamic Kiro API model lists to the internal ModelInfo format,
+// and merging with static metadata for thinking support and other capabilities.
+package registry
+
+import (
+	"strings"
+	"time"
+)
+
+// KiroAPIModel represents a model from Kiro API response.
+// This is a local copy to avoid import cycles with the kiro package.
+// The structure mirrors kiro.KiroModel for easy data conversion.
+type KiroAPIModel struct {
+	// ModelID is the unique identifier for the model (e.g., "claude-sonnet-4.5")
+	ModelID string
+	// ModelName is the human-readable name
+	ModelName string
+	// Description is the model description
+	Description string
+	// RateMultiplier is the credit multiplier for this model
+	RateMultiplier float64
+	// RateUnit is the unit for rate calculation (e.g., "credit")
+	RateUnit string
+	// MaxInputTokens is the maximum input token limit
+	MaxInputTokens int
+}
+
+// DefaultKiroThinkingSupport defines the default thinking configuration for Kiro models.
+// All Kiro models support thinking with the following budget range.
+var DefaultKiroThinkingSupport = &ThinkingSupport{
+	Min:            1024,  // Minimum thinking budget tokens
+	Max:            32000, // Maximum thinking budget tokens
+	ZeroAllowed:    true,  // Allow disabling thinking with 0
+	DynamicAllowed: true,  // Allow dynamic thinking budget (-1)
+}
+
+// DefaultKiroContextLength is the default context window size for Kiro models.
+const DefaultKiroContextLength = 200000
+
+// DefaultKiroMaxCompletionTokens is the default max completion tokens for Kiro models.
+const DefaultKiroMaxCompletionTokens = 64000
+
+// ConvertKiroAPIModels converts Kiro API models to internal ModelInfo format.
+// It performs the following transformations:
+//   - Normalizes model ID (e.g., claude-sonnet-4.5 → kiro-claude-sonnet-4-5)
+//   - Adds default thinking support metadata
+//   - Sets default context length and max completion tokens if not provided
+//
+// Parameters:
+//   - kiroModels: List of models from Kiro API response
+//
+// Returns:
+//   - []*ModelInfo: Converted model information list
+func ConvertKiroAPIModels(kiroModels []*KiroAPIModel) []*ModelInfo {
+	if len(kiroModels) == 0 {
+		return nil
+	}
+
+	now := time.Now().Unix()
+	result := make([]*ModelInfo, 0, len(kiroModels))
+
+	for _, km := range kiroModels {
+		// Skip nil models
+		if km == nil {
+			continue
+		}
+
+		// Skip models without valid ID
+		if km.ModelID == "" {
+			continue
+		}
+
+		// Normalize the model ID to kiro-* format
+		normalizedID := normalizeKiroModelID(km.ModelID)
+
+		// Create ModelInfo with converted data
+		info := &ModelInfo{
+			ID:          normalizedID,
+			Object:      "model",
+			Created:     now,
+			OwnedBy:     "aws",
+			Type:        "kiro",
+			DisplayName: generateKiroDisplayName(km.ModelName, normalizedID),
+			Description: km.Description,
+			// Use MaxInputTokens from API if available, otherwise use default
+			ContextLength:       getContextLength(km.MaxInputTokens),
+			MaxCompletionTokens: DefaultKiroMaxCompletionTokens,
+			// All Kiro models support thinking
+			Thinking: cloneThinkingSupport(DefaultKiroThinkingSupport),
+		}
+
+		result = append(result, info)
+	}
+
+	return result
+}
+
+// GenerateAgenticVariants creates -agentic variants for each model.
+// Agentic variants are optimized for coding agents with chunked writes.
+//
+// Parameters:
+//   - models: Base models to generate variants for
+//
+// Returns:
+//   - []*ModelInfo: Combined list of base models and their agentic variants
+func GenerateAgenticVariants(models []*ModelInfo) []*ModelInfo {
+	if len(models) == 0 {
+		return nil
+	}
+
+	// Pre-allocate result with capacity for both base models and variants
+	result := make([]*ModelInfo, 0, len(models)*2)
+
+	for _, model := range models {
+		if model == nil {
+			continue
+		}
+
+		// Add the base model first
+		result = append(result, model)
+
+		// Skip if model already has -agentic suffix
+		if strings.HasSuffix(model.ID, "-agentic") {
+			continue
+		}
+
+		// Skip special models that shouldn't have agentic variants
+		if model.ID == "kiro-auto" {
+			continue
+		}
+
+		// Create agentic variant
+		agenticModel := &ModelInfo{
+			ID:                  model.ID + "-agentic",
+			Object:              model.Object,
+			Created:             model.Created,
+			OwnedBy:             model.OwnedBy,
+			Type:                model.Type,
+			DisplayName:         model.DisplayName + " (Agentic)",
+			Description:         generateAgenticDescription(model.Description),
+			ContextLength:       model.ContextLength,
+			MaxCompletionTokens: model.MaxCompletionTokens,
+			Thinking:            cloneThinkingSupport(model.Thinking),
+		}
+
+		result = append(result, agenticModel)
+	}
+
+	return result
+}
+
+// MergeWithStaticMetadata merges dynamic models with static metadata.
+// When a model ID appears in both lists, the static entry replaces the dynamic one wholesale.
+// This allows manual overrides for specific models while keeping dynamic discovery.
+//
+// Parameters:
+//   - dynamicModels: Models from Kiro API (converted to ModelInfo)
+//   - staticModels: Predefined model metadata (from GetKiroModels())
+//
+// Returns:
+//   - []*ModelInfo: Merged model list with static metadata taking priority
+func MergeWithStaticMetadata(dynamicModels, staticModels []*ModelInfo) []*ModelInfo {
+	if len(dynamicModels) == 0 && len(staticModels) == 0 {
+		return nil
+	}
+
+	// Build a map of static models for quick lookup
+	staticMap := make(map[string]*ModelInfo, len(staticModels))
+	for _, sm := range staticModels {
+		if sm != nil && sm.ID != "" {
+			staticMap[sm.ID] = sm
+		}
+	}
+
+	// Build result, preferring static metadata where available
+	seenIDs := make(map[string]struct{})
+	result := make([]*ModelInfo, 0, len(dynamicModels)+len(staticModels))
+
+	// First, process dynamic models and merge with static if available
+	for _, dm := range dynamicModels {
+		if dm == nil || dm.ID == "" {
+			continue
+		}
+
+		// Skip duplicates
+		if _, seen := seenIDs[dm.ID]; seen {
+			continue
+		}
+		seenIDs[dm.ID] = struct{}{}
+
+		// Check if static metadata exists for this model
+		if sm, exists := staticMap[dm.ID]; exists {
+			// Static metadata takes priority - use static model
+			result = append(result, sm)
+		} else {
+			// No static metadata - use dynamic model
+			result = append(result, dm)
+		}
+	}
+
+	// Add any static models not in dynamic list
+	for _, sm := range staticModels {
+		if sm == nil || sm.ID == "" {
+			continue
+		}
+		if _, seen := seenIDs[sm.ID]; seen {
+			continue
+		}
+		seenIDs[sm.ID] = struct{}{}
+		result = append(result, sm)
+	}
+
+	return result
+}
+
+// normalizeKiroModelID converts Kiro API model IDs to internal format.
+// Transformation rules:
+//   - Adds "kiro-" prefix if not present
+//   - Replaces dots with hyphens (e.g., 4.5 → 4-5)
+//   - Handles special cases like "auto" → "kiro-auto"
+//
+// Examples:
+//   - "claude-sonnet-4.5" → "kiro-claude-sonnet-4-5"
+//   - "claude-opus-4.5" → "kiro-claude-opus-4-5"
+//   - "auto" → "kiro-auto"
+//   - "kiro-claude-sonnet-4-5" → "kiro-claude-sonnet-4-5" (unchanged)
+func normalizeKiroModelID(modelID string) string {
+	// Trim whitespace first so whitespace-only IDs are treated as empty
+	modelID = strings.TrimSpace(modelID)
+
+	if modelID == "" {
+		return ""
+	}
+
+	// Replace dots with hyphens (e.g., 4.5 → 4-5)
+	normalized := strings.ReplaceAll(modelID, ".", "-")
+
+	// Add kiro- prefix if not present
+	if !strings.HasPrefix(normalized, "kiro-") {
+		normalized = "kiro-" + normalized
+	}
+
+	return normalized
+}
+
+// generateKiroDisplayName creates a human-readable display name.
+// Uses the API-provided model name if available, otherwise generates from ID.
+func generateKiroDisplayName(modelName, normalizedID string) string {
+	if modelName != "" {
+		return "Kiro " + modelName
+	}
+
+	// Generate from normalized ID by removing kiro- prefix and formatting
+	displayID := strings.TrimPrefix(normalizedID, "kiro-")
+	// Capitalize first letter of each word
+	words := strings.Split(displayID, "-")
+	for i, word := range words {
+		if len(word) > 0 {
+			words[i] = strings.ToUpper(word[:1]) + word[1:]
+		}
+	}
+	return "Kiro " + strings.Join(words, " ")
+}
+
+// generateAgenticDescription creates description for agentic variants.
+func generateAgenticDescription(baseDescription string) string {
+	if baseDescription == "" {
+		return "Optimized for coding agents with chunked writes"
+	}
+	return baseDescription + " (Agentic mode: chunked writes)"
+}
+
+// getContextLength returns the context length, using default if not provided.
+func getContextLength(maxInputTokens int) int {
+	if maxInputTokens > 0 {
+		return maxInputTokens
+	}
+	return DefaultKiroContextLength
+}
+
+// cloneThinkingSupport creates a deep copy of ThinkingSupport.
+// Returns nil if input is nil.
+func cloneThinkingSupport(ts *ThinkingSupport) *ThinkingSupport {
+	if ts == nil {
+		return nil
+	}
+
+	clone := &ThinkingSupport{
+		Min:            ts.Min,
+		Max:            ts.Max,
+		ZeroAllowed:    ts.ZeroAllowed,
+		DynamicAllowed: ts.DynamicAllowed,
+	}
+
+	// Deep copy Levels slice if present
+	if len(ts.Levels) > 0 {
+		clone.Levels = make([]string, len(ts.Levels))
+		copy(clone.Levels, ts.Levels)
+	}
+
+	return clone
+}
--- a/internal/runtime/executor/antigravity_executor.go
+++ b/internal/runtime/executor/antigravity_executor.go
@@ -1202,7 +1202,7 @@ func (e *AntigravityExecutor) buildRequest(ctx context.Context, auth *cliproxyau
 	payload = geminiToAntigravity(modelName, payload, projectID)
 	payload, _ = sjson.SetBytes(payload, "model", modelName)

-	if strings.Contains(modelName, "claude") {
+	if strings.Contains(modelName, "claude") || strings.Contains(modelName, "gemini-3-pro-high") {
 		strJSON := string(payload)
 		paths := make([]string, 0)
 		util.Walk(gjson.ParseBytes(payload), "", "parametersJsonSchema", &paths)
@@ -1405,9 +1405,9 @@ func geminiToAntigravity(modelName string, payload []byte, projectID string) []b
 	template, _ = sjson.Set(template, "request.sessionId", generateStableSessionID(payload))

 	template, _ = sjson.Delete(template, "request.safetySettings")
-	template, _ = sjson.Set(template, "request.toolConfig.functionCallingConfig.mode", "VALIDATED")
+	//	template, _ = sjson.Set(template, "request.toolConfig.functionCallingConfig.mode", "VALIDATED")

-	if strings.Contains(modelName, "claude") {
+	if strings.Contains(modelName, "claude") || strings.Contains(modelName, "gemini-3-pro-high") {
 		gjson.Get(template, "request.tools").ForEach(func(key, tool gjson.Result) bool {
 			tool.Get("functionDeclarations").ForEach(func(funKey, funcDecl gjson.Result) bool {
 				if funcDecl.Get("parametersJsonSchema").Exists() {
@@ -1419,7 +1419,9 @@ func geminiToAntigravity(modelName string, payload []byte, projectID string) []b
 			})
 			return true
 		})
-	} else {
+	}
+
+	if !strings.Contains(modelName, "claude") {
 		template, _ = sjson.Delete(template, "request.generationConfig.maxOutputTokens")
 	}

--- a/internal/runtime/executor/kiro_executor.go
+++ b/internal/runtime/executor/kiro_executor.go
@@ -7,13 +7,16 @@ import (
 	"encoding/base64"
 	"encoding/binary"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"io"
+	"net"
 	"net/http"
 	"os"
 	"path/filepath"
 	"strings"
 	"sync"
+	"syscall"
 	"time"

 	"github.com/google/uuid"
@@ -53,9 +56,28 @@ const (
 	kiroIDEUserAgent     = "aws-sdk-js/1.0.18 ua/2.1 os/darwin#25.0.0 lang/js md/nodejs#20.16.0 api/codewhispererstreaming#1.0.18 m/E KiroIDE-0.2.13-66c23a8c5d15afabec89ef9954ef52a119f10d369df04d548fc6c1eac694b0d1"
 	kiroIDEAmzUserAgent  = "aws-sdk-js/1.0.18 KiroIDE-0.2.13-66c23a8c5d15afabec89ef9954ef52a119f10d369df04d548fc6c1eac694b0d1"
 	kiroIDEAgentModeSpec = "spec"
-	kiroAgentModeVibe    = "vibe"
+
+	// Socket retry configuration constants (based on kiro2Api reference implementation)
+	// Maximum number of retry attempts for socket/network errors
+	kiroSocketMaxRetries = 3
+	// Base delay between retry attempts (uses exponential backoff: delay * 2^attempt)
+	kiroSocketBaseRetryDelay = 1 * time.Second
+	// Maximum delay between retry attempts (cap for exponential backoff)
+	kiroSocketMaxRetryDelay = 30 * time.Second
+	// First token timeout for streaming responses (how long to wait for first response)
+	kiroFirstTokenTimeout = 15 * time.Second
+	// Streaming read timeout (how long to wait between chunks)
+	kiroStreamingReadTimeout = 300 * time.Second
 )

+// retryableHTTPStatusCodes defines HTTP status codes that are considered retryable.
+// Based on kiro2Api reference: 502 (Bad Gateway), 503 (Service Unavailable), 504 (Gateway Timeout)
+var retryableHTTPStatusCodes = map[int]bool{
+	502: true, // Bad Gateway - upstream server error
+	503: true, // Service Unavailable - server temporarily overloaded
+	504: true, // Gateway Timeout - upstream server timeout
+}
+
 // Real-time usage estimation configuration
 // These control how often usage updates are sent during streaming
 var (
@@ -63,6 +85,241 @@ var (
 	usageUpdateTimeInterval  = 15 * time.Second // Or every 15 seconds, whichever comes first
 )

+// Global FingerprintManager for dynamic User-Agent generation per token
+// Each token gets a unique fingerprint on first use, which is cached for subsequent requests
+var (
+	globalFingerprintManager     *kiroauth.FingerprintManager
+	globalFingerprintManagerOnce sync.Once
+)
+
+// getGlobalFingerprintManager returns the global FingerprintManager instance
+func getGlobalFingerprintManager() *kiroauth.FingerprintManager {
+	globalFingerprintManagerOnce.Do(func() {
+		globalFingerprintManager = kiroauth.NewFingerprintManager()
+		log.Infof("kiro: initialized global FingerprintManager for dynamic UA generation")
+	})
+	return globalFingerprintManager
+}
+
+// retryConfig holds configuration for socket retry logic.
+// Based on kiro2Api Python implementation patterns.
+type retryConfig struct {
+	MaxRetries      int           // Maximum number of retry attempts
+	BaseDelay       time.Duration // Base delay between retries (exponential backoff)
+	MaxDelay        time.Duration // Maximum delay cap
+	RetryableErrors []string      // List of retryable error patterns
+	RetryableStatus map[int]bool  // HTTP status codes to retry
+	FirstTokenTmout time.Duration // Timeout for first token in streaming
+	StreamReadTmout time.Duration // Timeout between stream chunks
+}
+
+// defaultRetryConfig returns the default retry configuration for Kiro socket operations.
+func defaultRetryConfig() retryConfig {
+	return retryConfig{
+		MaxRetries:      kiroSocketMaxRetries,
+		BaseDelay:       kiroSocketBaseRetryDelay,
+		MaxDelay:        kiroSocketMaxRetryDelay,
+		RetryableStatus: retryableHTTPStatusCodes,
+		RetryableErrors: []string{
+			"connection reset",
+			"connection refused",
+			"broken pipe",
+			"eof", // lowercase: patterns are matched against strings.ToLower(err.Error())
+			"timeout",
+			"temporary failure",
+			"no such host",
+			"network is unreachable",
+			"i/o timeout",
+		},
+		FirstTokenTmout: kiroFirstTokenTimeout,
+		StreamReadTmout: kiroStreamingReadTimeout,
+	}
+}
+
+// isRetryableError checks if an error is retryable based on error type and message.
+// Returns true for network timeouts, connection resets, and temporary failures.
+// Based on kiro2Api's retry logic patterns.
+func isRetryableError(err error) bool {
+	if err == nil {
+		return false
+	}
+
+	// Context cancellation and deadline expiry are caller-driven, so they are not retryable
+	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+		return false
+	}
+
+	// Check for net.Error (timeout, temporary)
+	var netErr net.Error
+	if errors.As(err, &netErr) {
+		if netErr.Timeout() {
+			log.Debugf("kiro: isRetryableError: network timeout detected")
+			return true
+		}
+		// Note: Temporary() is deprecated, so only Timeout() is checked here
+	}
+
+	// Check for specific syscall errors (connection reset, broken pipe, etc.)
+	var syscallErr syscall.Errno
+	if errors.As(err, &syscallErr) {
+		switch syscallErr {
+		case syscall.ECONNRESET: // Connection reset by peer
+			log.Debugf("kiro: isRetryableError: ECONNRESET detected")
+			return true
+		case syscall.ECONNREFUSED: // Connection refused
+			log.Debugf("kiro: isRetryableError: ECONNREFUSED detected")
+			return true
+		case syscall.EPIPE: // Broken pipe
+			log.Debugf("kiro: isRetryableError: EPIPE (broken pipe) detected")
+			return true
+		case syscall.ETIMEDOUT: // Connection timed out
+			log.Debugf("kiro: isRetryableError: ETIMEDOUT detected")
+			return true
+		case syscall.ENETUNREACH: // Network is unreachable
+			log.Debugf("kiro: isRetryableError: ENETUNREACH detected")
+			return true
+		case syscall.EHOSTUNREACH: // No route to host
+			log.Debugf("kiro: isRetryableError: EHOSTUNREACH detected")
+			return true
+		}
+	}
+
+	// Check for net.OpError wrapping other errors
+	var opErr *net.OpError
+	if errors.As(err, &opErr) {
+		log.Debugf("kiro: isRetryableError: net.OpError detected, op=%s", opErr.Op)
+		// Recursively check the wrapped error
+		if opErr.Err != nil {
+			return isRetryableError(opErr.Err)
+		}
+		return true
+	}
+
+	// Check error message for retryable patterns
+	errMsg := strings.ToLower(err.Error())
+	cfg := defaultRetryConfig()
+	for _, pattern := range cfg.RetryableErrors {
+		if strings.Contains(errMsg, pattern) {
+			log.Debugf("kiro: isRetryableError: pattern '%s' matched in error: %s", pattern, errMsg)
+			return true
+		}
+	}
+
+	// Check for EOF which may indicate connection was closed
+	if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
+		log.Debugf("kiro: isRetryableError: EOF/UnexpectedEOF detected")
+		return true
+	}
+
+	return false
+}
+
+// isRetryableHTTPStatus checks if an HTTP status code is retryable.
+// Based on kiro2Api: 502, 503, 504 are retryable server errors.
+func isRetryableHTTPStatus(statusCode int) bool {
+	return retryableHTTPStatusCodes[statusCode]
+}
+
+// calculateRetryDelay calculates the delay for the next retry attempt using exponential backoff.
+// delay = min(baseDelay * 2^attempt, maxDelay)
+// Adds ±30% jitter to prevent thundering herd.
+func calculateRetryDelay(attempt int, cfg retryConfig) time.Duration {
+	return kiroauth.ExponentialBackoffWithJitter(attempt, cfg.BaseDelay, cfg.MaxDelay)
+}
+
+// logRetryAttempt logs a retry attempt with relevant context.
+func logRetryAttempt(attempt, maxRetries int, reason string, delay time.Duration, endpoint string) {
+	log.Warnf("kiro: retry attempt %d/%d for %s, waiting %v before next attempt (endpoint: %s)",
+		attempt+1, maxRetries, reason, delay, endpoint)
+}
+
+// kiroHTTPClientPool provides a shared HTTP client with connection pooling for Kiro API.
+// This reduces connection overhead and improves performance for concurrent requests.
+// Based on kiro2Api's connection pooling pattern.
+var (
+	kiroHTTPClientPool     *http.Client
+	kiroHTTPClientPoolOnce sync.Once
+)
+
+// getKiroPooledHTTPClient returns a shared HTTP client with optimized connection pooling.
+// The client is lazily initialized on first use and reused across requests.
+// This is especially beneficial for:
+// - Reducing TCP handshake overhead
+// - Enabling HTTP/2 multiplexing
+// - Better handling of keep-alive connections
+func getKiroPooledHTTPClient() *http.Client {
+	kiroHTTPClientPoolOnce.Do(func() {
+		transport := &http.Transport{
+			// Connection pool settings
+			MaxIdleConns:        100,              // Max idle connections across all hosts
+			MaxIdleConnsPerHost: 20,               // Max idle connections per host
+			MaxConnsPerHost:     50,               // Max total connections per host
+			IdleConnTimeout:     90 * time.Second, // How long idle connections stay in pool
+
+			// Timeouts for connection establishment
+			DialContext: (&net.Dialer{
+				Timeout:   30 * time.Second, // TCP connection timeout
+				KeepAlive: 30 * time.Second, // TCP keep-alive interval
+			}).DialContext,
+
+			// TLS handshake timeout
+			TLSHandshakeTimeout: 10 * time.Second,
+
+			// Response header timeout
+			ResponseHeaderTimeout: 30 * time.Second,
+
+			// Expect 100-continue timeout
+			ExpectContinueTimeout: 1 * time.Second,
+
+			// Enable HTTP/2 when available
+			ForceAttemptHTTP2: true,
+		}
+
+		kiroHTTPClientPool = &http.Client{
+			Transport: transport,
+			// No global timeout - let individual requests set their own timeouts via context
+		}
+
+		log.Debugf("kiro: initialized pooled HTTP client (MaxIdleConns=%d, MaxIdleConnsPerHost=%d, MaxConnsPerHost=%d)",
+			transport.MaxIdleConns, transport.MaxIdleConnsPerHost, transport.MaxConnsPerHost)
+	})
+
+	return kiroHTTPClientPool
+}
+
+// newKiroHTTPClientWithPooling creates an HTTP client that uses connection pooling when appropriate.
+// It respects proxy configuration from auth or config, falling back to the pooled client.
+// This provides the best of both worlds: custom proxy support + connection reuse.
+func newKiroHTTPClientWithPooling(ctx context.Context, cfg *config.Config, auth *cliproxyauth.Auth, timeout time.Duration) *http.Client {
+	// Check if a proxy is configured - if so, we need a custom client
+	var proxyURL string
+	if auth != nil {
+		proxyURL = strings.TrimSpace(auth.ProxyURL)
+	}
+	if proxyURL == "" && cfg != nil {
+		proxyURL = strings.TrimSpace(cfg.ProxyURL)
+	}
+
+	// If proxy is configured, use the existing proxy-aware client (doesn't pool)
+	if proxyURL != "" {
+		log.Debugf("kiro: using proxy-aware HTTP client (proxy=%s)", proxyURL)
+		return newProxyAwareHTTPClient(ctx, cfg, auth, timeout)
+	}
+
+	// No proxy - use pooled client for better performance
+	pooledClient := getKiroPooledHTTPClient()
+
+	// If timeout is specified, we need to wrap the pooled transport with timeout
+	if timeout > 0 {
+		return &http.Client{
+			Transport: pooledClient.Transport,
+			Timeout:   timeout,
+		}
+	}
+
+	return pooledClient
+}
+
 // kiroEndpointConfig bundles endpoint URL with its compatible Origin and AmzTarget values.
 // This solves the "triple mismatch" problem where different endpoints require matching
 // Origin and X-Amz-Target header values.
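Review note on `calculateRetryDelay` in the hunk above: the body of `kiroauth.ExponentialBackoffWithJitter` is not part of this diff, so the following is only a minimal sketch of what the documented formula (`min(baseDelay * 2^attempt, maxDelay)` with ±30% jitter) could look like. The function name `backoffWithJitter`, the injected `rnd` source, and the final clamp to `maxDelay` are assumptions made for illustration, not the project's implementation.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffWithJitter is a hypothetical stand-in for kiroauth.ExponentialBackoffWithJitter.
// It computes min(base * 2^attempt, maxDelay), scales the result by a factor
// in [0.7, 1.3), and clamps again so the jittered value never exceeds maxDelay.
// rnd must return a value in [0, 1); it is injected so the function is testable.
func backoffWithJitter(attempt int, base, maxDelay time.Duration, rnd func() float64) time.Duration {
	delay := base
	// Doubling stops as soon as the cap is reached, which also avoids overflow.
	for i := 0; i < attempt && delay < maxDelay; i++ {
		delay *= 2
	}
	if delay > maxDelay {
		delay = maxDelay
	}
	factor := 0.7 + 0.6*rnd()
	jittered := time.Duration(float64(delay) * factor)
	if jittered > maxDelay {
		jittered = maxDelay
	}
	return jittered
}

func main() {
	for attempt := 0; attempt < 4; attempt++ {
		d := backoffWithJitter(attempt, time.Second, 30*time.Second, rand.Float64)
		fmt.Printf("attempt %d: sleep %v\n", attempt, d)
	}
}
```

Injecting the random source keeps the jitter deterministic in tests; production code would typically pass `rand.Float64`.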
@@ -99,7 +356,7 @@ var kiroEndpointConfigs = []kiroEndpointConfig{
 		Name:      "CodeWhisperer",
 	},
 	{
-		URL:       "https://q.us-east-1.amazonaws.com/generateAssistantResponse",
+		URL:       "https://q.us-east-1.amazonaws.com/",
 		Origin:    "CLI",
 		AmzTarget: "AmazonQDeveloperStreamingService.SendMessage",
 		Name:      "AmazonQ",
@@ -217,6 +474,29 @@ func NewKiroExecutor(cfg *config.Config) *KiroExecutor {
 // Identifier returns the unique identifier for this executor.
 func (e *KiroExecutor) Identifier() string { return "kiro" }

+// applyDynamicFingerprint applies token-specific fingerprint headers to the request
+// For IDC auth, uses dynamic fingerprint-based User-Agent
+// For other auth types, uses static Amazon Q CLI style headers
+func applyDynamicFingerprint(req *http.Request, auth *cliproxyauth.Auth) {
+	if isIDCAuth(auth) {
+		// Get token-specific fingerprint for dynamic UA generation
+		tokenKey := getTokenKey(auth)
+		fp := getGlobalFingerprintManager().GetFingerprint(tokenKey)
+
+		// Use fingerprint-generated dynamic User-Agent
+		req.Header.Set("User-Agent", fp.BuildUserAgent())
+		req.Header.Set("X-Amz-User-Agent", fp.BuildAmzUserAgent())
+		req.Header.Set("x-amzn-kiro-agent-mode", kiroIDEAgentModeSpec)
+
+		log.Debugf("kiro: using dynamic fingerprint for token %.8s... (SDK:%s, OS:%s/%s, Kiro:%s)",
+			tokenKey, fp.SDKVersion, fp.OSType, fp.OSVersion, fp.KiroVersion)
+	} else {
+		// Use static Amazon Q CLI style headers for non-IDC auth
+		req.Header.Set("User-Agent", kiroUserAgent)
+		req.Header.Set("X-Amz-User-Agent", kiroFullUserAgent)
+	}
+}
+
 // PrepareRequest prepares the HTTP request before execution.
 func (e *KiroExecutor) PrepareRequest(req *http.Request, auth *cliproxyauth.Auth) error {
 	if req == nil {
@@ -226,16 +506,10 @@ func (e *KiroExecutor) PrepareRequest(req *http.Request, auth *cliproxyauth.Auth
 	if strings.TrimSpace(accessToken) == "" {
 		return statusErr{code: http.StatusUnauthorized, msg: "missing access token"}
 	}
-	if isIDCAuth(auth) {
-		req.Header.Set("User-Agent", kiroIDEUserAgent)
-		req.Header.Set("X-Amz-User-Agent", kiroIDEAmzUserAgent)
-		req.Header.Set("x-amzn-kiro-agent-mode", kiroIDEAgentModeSpec)
-	} else {
-		req.Header.Set("User-Agent", kiroUserAgent)
-		req.Header.Set("X-Amz-User-Agent", kiroFullUserAgent)
-		req.Header.Set("x-amzn-kiro-agent-mode", kiroAgentModeVibe)
-	}
-	req.Header.Set("x-amzn-codewhisperer-optout", "true")
+
+	// Apply dynamic fingerprint-based headers
+	applyDynamicFingerprint(req, auth)
+
 	req.Header.Set("Amz-Sdk-Request", "attempt=1; max=3")
 	req.Header.Set("Amz-Sdk-Invocation-Id", uuid.New().String())
 	req.Header.Set("Authorization", "Bearer "+accessToken)
@@ -259,10 +533,23 @@ func (e *KiroExecutor) HttpRequest(ctx context.Context, auth *cliproxyauth.Auth,
 	if errPrepare := e.PrepareRequest(httpReq, auth); errPrepare != nil {
 		return nil, errPrepare
 	}
-	httpClient := newProxyAwareHTTPClient(ctx, e.cfg, auth, 0)
+	httpClient := newKiroHTTPClientWithPooling(ctx, e.cfg, auth, 0)
 	return httpClient.Do(httpReq)
 }

+// getTokenKey returns a key for rate limiting and fingerprint lookup based on auth credentials.
+// Uses auth ID if available, otherwise falls back to the first 16 characters of the access token.
+func getTokenKey(auth *cliproxyauth.Auth) string {
+	if auth != nil && auth.ID != "" {
+		return auth.ID
+	}
+	accessToken, _ := kiroCredentials(auth)
+	if len(accessToken) > 16 {
+		return accessToken[:16]
+	}
+	return accessToken
+}
+
 // Execute sends the request to Kiro API and returns the response.
 // Supports automatic token refresh on 401/403 errors.
 func (e *KiroExecutor) Execute(ctx context.Context, auth *cliproxyauth.Auth, req cliproxyexecutor.Request, opts cliproxyexecutor.Options) (resp cliproxyexecutor.Response, err error) {
@@ -271,23 +558,53 @@ func (e *KiroExecutor) Execute(ctx context.Context, auth *cliproxyauth.Auth, req
 		return resp, fmt.Errorf("kiro: access token not found in auth")
 	}

+	// Rate limiting: get token key for tracking
+	tokenKey := getTokenKey(auth)
+	rateLimiter := kiroauth.GetGlobalRateLimiter()
+	cooldownMgr := kiroauth.GetGlobalCooldownManager()
+
+	// Check if token is in cooldown period
+	if cooldownMgr.IsInCooldown(tokenKey) {
+		remaining := cooldownMgr.GetRemainingCooldown(tokenKey)
+		reason := cooldownMgr.GetCooldownReason(tokenKey)
+		log.Warnf("kiro: token %s is in cooldown (reason: %s), remaining: %v", tokenKey, reason, remaining)
+		return resp, fmt.Errorf("kiro: token is in cooldown for %v (reason: %s)", remaining, reason)
+	}
+
+	// Wait for rate limiter before proceeding
+	log.Debugf("kiro: waiting for rate limiter for token %s", tokenKey)
+	rateLimiter.WaitForToken(tokenKey)
+	log.Debugf("kiro: rate limiter cleared for token %s", tokenKey)
+
 	reporter := newUsageReporter(ctx, e.Identifier(), req.Model, auth)
 	defer reporter.trackFailure(ctx, &err)

 	// Check if token is expired before making request
 	if e.isTokenExpired(accessToken) {
-		log.Infof("kiro: access token expired, attempting refresh before request")
-		refreshedAuth, refreshErr := e.Refresh(ctx, auth)
-		if refreshErr != nil {
-			log.Warnf("kiro: pre-request token refresh failed: %v", refreshErr)
-		} else if refreshedAuth != nil {
-			auth = refreshedAuth
-			// Persist the refreshed auth to file so subsequent requests use it
-			if persistErr := e.persistRefreshedAuth(auth); persistErr != nil {
-				log.Warnf("kiro: failed to persist refreshed auth: %v", persistErr)
-			}
+		log.Infof("kiro: access token expired, attempting recovery")
+
+		// Plan B: first try reloading the token from file (the background refresher may have already updated it)
+		reloadedAuth, reloadErr := e.reloadAuthFromFile(auth)
+		if reloadErr == nil && reloadedAuth != nil {
+			// The file holds a newer token; use it
+			auth = reloadedAuth
 			accessToken, profileArn = kiroCredentials(auth)
-			log.Infof("kiro: token refreshed successfully before request")
+			log.Infof("kiro: recovered token from file (background refresh), expires_at: %v", auth.Metadata["expires_at"])
+		} else {
+			// The token in the file has expired as well; perform an active refresh
+			log.Debugf("kiro: file reload failed (%v), attempting active refresh", reloadErr)
+			refreshedAuth, refreshErr := e.Refresh(ctx, auth)
+			if refreshErr != nil {
+				log.Warnf("kiro: pre-request token refresh failed: %v", refreshErr)
+			} else if refreshedAuth != nil {
+				auth = refreshedAuth
+				// Persist the refreshed auth to file so subsequent requests use it
+				if persistErr := e.persistRefreshedAuth(auth); persistErr != nil {
+					log.Warnf("kiro: failed to persist refreshed auth: %v", persistErr)
+				}
+				accessToken, profileArn = kiroCredentials(auth)
+				log.Infof("kiro: token refreshed successfully before request")
+			}
 		}
 	}
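Review note on the recovery order introduced above (reload from file first, active refresh second): the pattern can be sketched in isolation as follows. The `token` type, the `source` function type, and `recoverToken` are hypothetical names used only for this illustration; the real code works on `cliproxyauth.Auth` through `reloadAuthFromFile` and `Refresh`, whose bodies are not shown in this diff. One deliberate difference: the sketch returns an error when the refresh fails, whereas the diff only logs a warning and continues with the old token.

```go
package main

import (
	"errors"
	"fmt"
)

// token is a hypothetical, simplified stand-in for the auth record.
type token struct {
	Value     string
	ExpiresAt int64 // Unix seconds
}

// errRefresh is a sample error for a failing refresh source.
var errRefresh = errors.New("refresh endpoint unavailable")

// source produces a token, for example by reading a file or calling a refresh API.
type source func() (*token, error)

// recoverToken tries the cheap source first (reload) and falls back to the
// expensive one (refresh) only when the reloaded token is missing, failed to
// load, or is already expired at time now. The second return value names the
// source that supplied the token.
func recoverToken(now int64, reload, refresh source) (*token, string, error) {
	if tok, err := reload(); err == nil && tok != nil && tok.ExpiresAt > now {
		return tok, "file", nil
	}
	tok, err := refresh()
	if err != nil {
		return nil, "", fmt.Errorf("active refresh failed: %w", err)
	}
	if tok == nil {
		return nil, "", errors.New("active refresh returned no token")
	}
	return tok, "refresh", nil
}

func main() {
	stale := func() (*token, error) { return &token{Value: "old", ExpiresAt: 50}, nil }
	renew := func() (*token, error) { return &token{Value: "new", ExpiresAt: 500}, nil }
	tok, from, err := recoverToken(100, stale, renew)
	fmt.Println(tok.Value, from, err)
}
```

Checking the reloaded token's expiry before trusting it keeps a stale file from masking the need for a real refresh.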

@@ -303,7 +620,7 @@ func (e *KiroExecutor) Execute(ctx context.Context, auth *cliproxyauth.Auth, req

 	// Execute with retry on 401/403 and 429 (quota exhausted)
 	// Note: currentOrigin and kiroPayload are built inside executeWithRetry for each endpoint
-	resp, err = e.executeWithRetry(ctx, auth, req, opts, accessToken, effectiveProfileArn, nil, body, from, to, reporter, "", kiroModelID, isAgentic, isChatOnly)
+	resp, err = e.executeWithRetry(ctx, auth, req, opts, accessToken, effectiveProfileArn, nil, body, from, to, reporter, "", kiroModelID, isAgentic, isChatOnly, tokenKey)
 	return resp, err
 }

@@ -312,9 +629,12 @@ func (e *KiroExecutor) Execute(ctx context.Context, auth *cliproxyauth.Auth, req
 // - Amazon Q endpoint (CLI origin) uses Amazon Q Developer quota
 // - CodeWhisperer endpoint (AI_EDITOR origin) uses Kiro IDE quota
 // Also supports multi-endpoint fallback similar to Antigravity implementation.
-func (e *KiroExecutor) executeWithRetry(ctx context.Context, auth *cliproxyauth.Auth, req cliproxyexecutor.Request, opts cliproxyexecutor.Options, accessToken, profileArn string, kiroPayload, body []byte, from, to sdktranslator.Format, reporter *usageReporter, currentOrigin, kiroModelID string, isAgentic, isChatOnly bool) (cliproxyexecutor.Response, error) {
+// tokenKey is used for rate limiting and cooldown tracking.
+func (e *KiroExecutor) executeWithRetry(ctx context.Context, auth *cliproxyauth.Auth, req cliproxyexecutor.Request, opts cliproxyexecutor.Options, accessToken, profileArn string, kiroPayload, body []byte, from, to sdktranslator.Format, reporter *usageReporter, currentOrigin, kiroModelID string, isAgentic, isChatOnly bool, tokenKey string) (cliproxyexecutor.Response, error) {
 	var resp cliproxyexecutor.Response
 	maxRetries := 2 // Allow retries for token refresh + endpoint fallback
+	rateLimiter := kiroauth.GetGlobalRateLimiter()
+	cooldownMgr := kiroauth.GetGlobalCooldownManager()
 	endpointConfigs := getKiroEndpointConfigs(auth)
 	var last429Err error

@@ -332,6 +652,12 @@ func (e *KiroExecutor) executeWithRetry(ctx context.Context, auth *cliproxyauth.
 			endpointIdx+1, len(endpointConfigs), url, endpointConfig.Name, currentOrigin)

 		for attempt := 0; attempt <= maxRetries; attempt++ {
+			// Apply human-like delay before first request (not on retries)
+			// This mimics natural user behavior patterns
+			if attempt == 0 && endpointIdx == 0 {
+				kiroauth.ApplyHumanLikeDelay()
+			}
+
 			httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(kiroPayload))
 			if err != nil {
 				return resp, err
@@ -342,20 +668,9 @@ func (e *KiroExecutor) executeWithRetry(ctx context.Context, auth *cliproxyauth.
 			// Use endpoint-specific X-Amz-Target (critical for avoiding 403 errors)
 			httpReq.Header.Set("X-Amz-Target", endpointConfig.AmzTarget)

-			// Use different headers based on auth type
-			// IDC auth uses Kiro IDE style headers (from kiro2api)
-			// Other auth types use Amazon Q CLI style headers
-			if isIDCAuth(auth) {
-				httpReq.Header.Set("User-Agent", kiroIDEUserAgent)
-				httpReq.Header.Set("X-Amz-User-Agent", kiroIDEAmzUserAgent)
-				httpReq.Header.Set("x-amzn-kiro-agent-mode", kiroIDEAgentModeSpec)
-				log.Debugf("kiro: using Kiro IDE headers for IDC auth")
-			} else {
-				httpReq.Header.Set("User-Agent", kiroUserAgent)
-				httpReq.Header.Set("X-Amz-User-Agent", kiroFullUserAgent)
-				httpReq.Header.Set("x-amzn-kiro-agent-mode", kiroAgentModeVibe)
-			}
-			httpReq.Header.Set("x-amzn-codewhisperer-optout", "true")
+			// Apply dynamic fingerprint-based headers
+			applyDynamicFingerprint(httpReq, auth)
+
 			httpReq.Header.Set("Amz-Sdk-Request", "attempt=1; max=3")
 			httpReq.Header.Set("Amz-Sdk-Invocation-Id", uuid.New().String())

@@ -386,10 +701,34 @@ func (e *KiroExecutor) executeWithRetry(ctx context.Context, auth *cliproxyauth.
 				AuthValue: authValue,
 			})

-			httpClient := newProxyAwareHTTPClient(ctx, e.cfg, auth, 120*time.Second)
+			httpClient := newKiroHTTPClientWithPooling(ctx, e.cfg, auth, 120*time.Second)
 			httpResp, err := httpClient.Do(httpReq)
 			if err != nil {
+				// Check for context cancellation first - client disconnected, not a server error
+				// Use 499 (Client Closed Request - nginx convention) instead of 500
+				if errors.Is(err, context.Canceled) {
+					log.Debugf("kiro: request canceled by client (context.Canceled)")
+					return resp, statusErr{code: 499, msg: "client canceled request"}
+				}
+
+				// Check for context deadline exceeded - request timed out
+				// Return 504 Gateway Timeout instead of 500
+				if errors.Is(err, context.DeadlineExceeded) {
+					log.Debugf("kiro: request timed out (context.DeadlineExceeded)")
+					return resp, statusErr{code: http.StatusGatewayTimeout, msg: "upstream request timed out"}
+				}
+
 				recordAPIResponseError(ctx, e.cfg, err)
+
+				// Socket retry: retry on retryable network errors (timeout, connection reset, etc.); the enclosing loop still caps attempts at maxRetries
+				retryCfg := defaultRetryConfig()
+				if isRetryableError(err) && attempt < retryCfg.MaxRetries {
+					delay := calculateRetryDelay(attempt, retryCfg)
+					logRetryAttempt(attempt, retryCfg.MaxRetries, fmt.Sprintf("socket error: %v", err), delay, endpointConfig.Name)
+					time.Sleep(delay)
+					continue
+				}
+
 				return resp, err
 			}
 			recordAPIResponseMetadata(ctx, e.cfg, httpResp.StatusCode, httpResp.Header.Clone())
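Review note on the retry path above: `time.Sleep(delay)` keeps the goroutine waiting even if the client disconnects during the backoff. A possible alternative, sketched here under the assumption that the request `ctx` should abort the wait, is a sleep that also selects on `ctx.Done()`. The helper names `sleepCtx` and `demo` are hypothetical and do not exist in the codebase.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepCtx waits for d or until ctx is done, whichever comes first.
// It returns nil after a full wait and ctx.Err() when the wait was cut short.
func sleepCtx(ctx context.Context, d time.Duration) error {
	if d <= 0 {
		return ctx.Err()
	}
	timer := time.NewTimer(d)
	defer timer.Stop() // release the timer if the context wins the race
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}

// demo starts a 2s backoff under a context that expires after 50ms and
// reports how long the wait actually took and why it ended.
func demo() (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	start := time.Now()
	err := sleepCtx(ctx, 2*time.Second)
	return time.Since(start), err
}

func main() {
	elapsed, err := demo()
	fmt.Printf("stopped after %v: %v\n", elapsed.Round(10*time.Millisecond), err)
}
```

In a retry loop, a non-nil result from such a helper would be returned to the caller instead of continuing with the next attempt.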
@@ -401,6 +740,12 @@ func (e *KiroExecutor) executeWithRetry(ctx context.Context, auth *cliproxyauth.
 				_ = httpResp.Body.Close()
 				appendAPIResponseChunk(ctx, e.cfg, respBody)

+				// Record failure and set cooldown for 429
+				rateLimiter.MarkTokenFailed(tokenKey)
+				cooldownDuration := kiroauth.CalculateCooldownFor429(attempt)
+				cooldownMgr.SetCooldown(tokenKey, cooldownDuration, kiroauth.CooldownReason429)
+				log.Warnf("kiro: rate limit hit (429), token %s set to cooldown for %v", tokenKey, cooldownDuration)
+
 				// Preserve last 429 so callers can correctly backoff when all endpoints are exhausted
 				last429Err = statusErr{code: httpResp.StatusCode, msg: string(respBody)}

@@ -412,13 +757,21 @@ func (e *KiroExecutor) executeWithRetry(ctx context.Context, auth *cliproxyauth.
 			}

 			// Handle 5xx server errors with exponential backoff retry
+			// Enhanced: Use retryConfig for consistent retry behavior
 			if httpResp.StatusCode >= 500 && httpResp.StatusCode < 600 {
 				respBody, _ := io.ReadAll(httpResp.Body)
 				_ = httpResp.Body.Close()
 				appendAPIResponseChunk(ctx, e.cfg, respBody)

-				if attempt < maxRetries {
-					// Exponential backoff: 1s, 2s, 4s... (max 30s)
+				retryCfg := defaultRetryConfig()
+				// Check if this specific 5xx code is retryable (502, 503, 504)
+				if isRetryableHTTPStatus(httpResp.StatusCode) && attempt < retryCfg.MaxRetries {
+					delay := calculateRetryDelay(attempt, retryCfg)
+					logRetryAttempt(attempt, retryCfg.MaxRetries, fmt.Sprintf("HTTP %d", httpResp.StatusCode), delay, endpointConfig.Name)
+					time.Sleep(delay)
+					continue
+				} else if attempt < maxRetries {
+					// Fallback for other 5xx errors (500, 501, etc.)
 					backoff := time.Duration(1<<attempt) * time.Second
 					if backoff > 30*time.Second {
 						backoff = 30 * time.Second
@@ -492,7 +845,10 @@ func (e *KiroExecutor) executeWithRetry(ctx context.Context, auth *cliproxyauth.

 				// Check for SUSPENDED status - return immediately without retry
 				if strings.Contains(respBodyStr, "SUSPENDED") || strings.Contains(respBodyStr, "TEMPORARILY_SUSPENDED") {
-					log.Errorf("kiro: account is suspended, cannot proceed")
+					// Set long cooldown for suspended accounts
+					rateLimiter.CheckAndMarkSuspended(tokenKey, respBodyStr)
+					cooldownMgr.SetCooldown(tokenKey, kiroauth.LongCooldown, kiroauth.CooldownReasonSuspended)
+					log.Errorf("kiro: account is suspended, token %s set to cooldown for %v", tokenKey, kiroauth.LongCooldown)
 					return resp, statusErr{code: httpResp.StatusCode, msg: "account suspended: " + string(respBody)}
 				}

@@ -581,6 +937,10 @@ func (e *KiroExecutor) executeWithRetry(ctx context.Context, auth *cliproxyauth.
 			appendAPIResponseChunk(ctx, e.cfg, []byte(content))
 			reporter.publish(ctx, usageInfo)

+			// Record success for rate limiting
+			rateLimiter.MarkTokenSuccess(tokenKey)
+			log.Debugf("kiro: request successful, token %s marked as success", tokenKey)
+
 			// Build response in Claude format for Kiro translator
 			// stopReason is extracted from upstream response by parseEventStream
 			kiroResponse := kiroclaude.BuildClaudeResponse(content, toolUses, req.Model, usageInfo, stopReason)
@@ -608,23 +968,53 @@ func (e *KiroExecutor) ExecuteStream(ctx context.Context, auth *cliproxyauth.Aut
 		return nil, fmt.Errorf("kiro: access token not found in auth")
 	}

+	// Rate limiting: get token key for tracking
+	tokenKey := getTokenKey(auth)
+	rateLimiter := kiroauth.GetGlobalRateLimiter()
+	cooldownMgr := kiroauth.GetGlobalCooldownManager()
+
+	// Check if token is in cooldown period
+	if cooldownMgr.IsInCooldown(tokenKey) {
+		remaining := cooldownMgr.GetRemainingCooldown(tokenKey)
+		reason := cooldownMgr.GetCooldownReason(tokenKey)
+		log.Warnf("kiro: token %s is in cooldown (reason: %s), remaining: %v", tokenKey, reason, remaining)
+		return nil, fmt.Errorf("kiro: token is in cooldown for %v (reason: %s)", remaining, reason)
+	}
+
+	// Wait for rate limiter before proceeding
+	log.Debugf("kiro: stream waiting for rate limiter for token %s", tokenKey)
+	rateLimiter.WaitForToken(tokenKey)
+	log.Debugf("kiro: stream rate limiter cleared for token %s", tokenKey)
+
 	reporter := newUsageReporter(ctx, e.Identifier(), req.Model, auth)
 	defer reporter.trackFailure(ctx, &err)

 	// Check if token is expired before making request
 	if e.isTokenExpired(accessToken) {
-		log.Infof("kiro: access token expired, attempting refresh before stream request")
-		refreshedAuth, refreshErr := e.Refresh(ctx, auth)
-		if refreshErr != nil {
-			log.Warnf("kiro: pre-request token refresh failed: %v", refreshErr)
-		} else if refreshedAuth != nil {
-			auth = refreshedAuth
-			// Persist the refreshed auth to file so subsequent requests use it
-			if persistErr := e.persistRefreshedAuth(auth); persistErr != nil {
-				log.Warnf("kiro: failed to persist refreshed auth: %v", persistErr)
-			}
+		log.Infof("kiro: access token expired, attempting recovery before stream request")
+
+		// Plan B: first try reloading the token from file (the background refresher may have already updated it)
+		reloadedAuth, reloadErr := e.reloadAuthFromFile(auth)
+		if reloadErr == nil && reloadedAuth != nil {
+			// The file holds a newer token; use it
+			auth = reloadedAuth
 			accessToken, profileArn = kiroCredentials(auth)
-			log.Infof("kiro: token refreshed successfully before stream request")
+			log.Infof("kiro: recovered token from file (background refresh) for stream, expires_at: %v", auth.Metadata["expires_at"])
+		} else {
+			// The token in the file has expired as well; perform an active refresh
+			log.Debugf("kiro: file reload failed (%v), attempting active refresh for stream", reloadErr)
+			refreshedAuth, refreshErr := e.Refresh(ctx, auth)
+			if refreshErr != nil {
+				log.Warnf("kiro: pre-request token refresh failed: %v", refreshErr)
+			} else if refreshedAuth != nil {
+				auth = refreshedAuth
+				// Persist the refreshed auth to file so subsequent requests use it
+				if persistErr := e.persistRefreshedAuth(auth); persistErr != nil {
+					log.Warnf("kiro: failed to persist refreshed auth: %v", persistErr)
+				}
+				accessToken, profileArn = kiroCredentials(auth)
+				log.Infof("kiro: token refreshed successfully before stream request")
+			}
 		}
 	}

@@ -640,7 +1030,7 @@ func (e *KiroExecutor) ExecuteStream(ctx context.Context, auth *cliproxyauth.Aut

 	// Execute stream with retry on 401/403 and 429 (quota exhausted)
 	// Note: currentOrigin and kiroPayload are built inside executeStreamWithRetry for each endpoint
-	return e.executeStreamWithRetry(ctx, auth, req, opts, accessToken, effectiveProfileArn, nil, body, from, reporter, "", kiroModelID, isAgentic, isChatOnly)
+	return e.executeStreamWithRetry(ctx, auth, req, opts, accessToken, effectiveProfileArn, nil, body, from, reporter, "", kiroModelID, isAgentic, isChatOnly, tokenKey)
 }

 // executeStreamWithRetry performs the streaming HTTP request with automatic retry on auth errors.
@@ -648,8 +1038,11 @@ func (e *KiroExecutor) ExecuteStream(ctx context.Context, auth *cliproxyauth.Aut
 // - Amazon Q endpoint (CLI origin) uses Amazon Q Developer quota
 // - CodeWhisperer endpoint (AI_EDITOR origin) uses Kiro IDE quota
 // Also supports multi-endpoint fallback similar to Antigravity implementation.
-func (e *KiroExecutor) executeStreamWithRetry(ctx context.Context, auth *cliproxyauth.Auth, req cliproxyexecutor.Request, opts cliproxyexecutor.Options, accessToken, profileArn string, kiroPayload, body []byte, from sdktranslator.Format, reporter *usageReporter, currentOrigin, kiroModelID string, isAgentic, isChatOnly bool) (<-chan cliproxyexecutor.StreamChunk, error) {
+// tokenKey is used for rate limiting and cooldown tracking.
+func (e *KiroExecutor) executeStreamWithRetry(ctx context.Context, auth *cliproxyauth.Auth, req cliproxyexecutor.Request, opts cliproxyexecutor.Options, accessToken, profileArn string, kiroPayload, body []byte, from sdktranslator.Format, reporter *usageReporter, currentOrigin, kiroModelID string, isAgentic, isChatOnly bool, tokenKey string) (<-chan cliproxyexecutor.StreamChunk, error) {
 	maxRetries := 2 // Allow retries for token refresh + endpoint fallback
+	rateLimiter := kiroauth.GetGlobalRateLimiter()
+	cooldownMgr := kiroauth.GetGlobalCooldownManager()
 	endpointConfigs := getKiroEndpointConfigs(auth)
 	var last429Err error

@@ -667,6 +1060,13 @@ func (e *KiroExecutor) executeStreamWithRetry(ctx context.Context, auth *cliprox
 			endpointIdx+1, len(endpointConfigs), url, endpointConfig.Name, currentOrigin)

 		for attempt := 0; attempt <= maxRetries; attempt++ {
+			// Apply human-like delay before first streaming request (not on retries)
+			// This mimics natural user behavior patterns
+			// Note: Delay is NOT applied during streaming response - only before initial request
+			if attempt == 0 && endpointIdx == 0 {
+				kiroauth.ApplyHumanLikeDelay()
+			}
+
 			httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(kiroPayload))
 			if err != nil {
 				return nil, err
@@ -677,20 +1077,9 @@ func (e *KiroExecutor) executeStreamWithRetry(ctx context.Context, auth *cliprox
 			// Use endpoint-specific X-Amz-Target (critical for avoiding 403 errors)
 			httpReq.Header.Set("X-Amz-Target", endpointConfig.AmzTarget)

-			// Use different headers based on auth type
-			// IDC auth uses Kiro IDE style headers (from kiro2api)
-			// Other auth types use Amazon Q CLI style headers
-			if isIDCAuth(auth) {
-				httpReq.Header.Set("User-Agent", kiroIDEUserAgent)
-				httpReq.Header.Set("X-Amz-User-Agent", kiroIDEAmzUserAgent)
-				httpReq.Header.Set("x-amzn-kiro-agent-mode", kiroIDEAgentModeSpec)
-				log.Debugf("kiro: using Kiro IDE headers for IDC auth")
-			} else {
-				httpReq.Header.Set("User-Agent", kiroUserAgent)
-				httpReq.Header.Set("X-Amz-User-Agent", kiroFullUserAgent)
-				httpReq.Header.Set("x-amzn-kiro-agent-mode", kiroAgentModeVibe)
-			}
-			httpReq.Header.Set("x-amzn-codewhisperer-optout", "true")
+			// Apply dynamic fingerprint-based headers
+			applyDynamicFingerprint(httpReq, auth)
+
 			httpReq.Header.Set("Amz-Sdk-Request", "attempt=1; max=3")
 			httpReq.Header.Set("Amz-Sdk-Invocation-Id", uuid.New().String())

@@ -721,10 +1110,20 @@ func (e *KiroExecutor) executeStreamWithRetry(ctx context.Context, auth *cliprox
 				AuthValue: authValue,
 			})

-			httpClient := newProxyAwareHTTPClient(ctx, e.cfg, auth, 0)
+			httpClient := newKiroHTTPClientWithPooling(ctx, e.cfg, auth, 0)
 			httpResp, err := httpClient.Do(httpReq)
 			if err != nil {
 				recordAPIResponseError(ctx, e.cfg, err)
+
+				// Socket retry for streaming: retry on retryable network errors (timeout, connection reset, etc.); the enclosing loop still caps attempts at maxRetries
+				retryCfg := defaultRetryConfig()
+				if isRetryableError(err) && attempt < retryCfg.MaxRetries {
+					delay := calculateRetryDelay(attempt, retryCfg)
+					logRetryAttempt(attempt, retryCfg.MaxRetries, fmt.Sprintf("stream socket error: %v", err), delay, endpointConfig.Name)
+					time.Sleep(delay)
+					continue
+				}
+
 				return nil, err
 			}
 			recordAPIResponseMetadata(ctx, e.cfg, httpResp.StatusCode, httpResp.Header.Clone())
@@ -736,6 +1135,12 @@ func (e *KiroExecutor) executeStreamWithRetry(ctx context.Context, auth *cliprox
 				_ = httpResp.Body.Close()
 				appendAPIResponseChunk(ctx, e.cfg, respBody)

+				// Record failure and set cooldown for 429
+				rateLimiter.MarkTokenFailed(tokenKey)
+				cooldownDuration := kiroauth.CalculateCooldownFor429(attempt)
+				cooldownMgr.SetCooldown(tokenKey, cooldownDuration, kiroauth.CooldownReason429)
+				log.Warnf("kiro: stream rate limit hit (429), token %s set to cooldown for %v", tokenKey, cooldownDuration)
+
 				// Preserve last 429 so callers can correctly backoff when all endpoints are exhausted
 				last429Err = statusErr{code: httpResp.StatusCode, msg: string(respBody)}

@@ -747,13 +1152,21 @@ func (e *KiroExecutor) executeStreamWithRetry(ctx context.Context, auth *cliprox
 			}

 			// Handle 5xx server errors with exponential backoff retry
+			// Enhanced: Use retryConfig for consistent retry behavior
 			if httpResp.StatusCode >= 500 && httpResp.StatusCode < 600 {
 				respBody, _ := io.ReadAll(httpResp.Body)
 				_ = httpResp.Body.Close()
 				appendAPIResponseChunk(ctx, e.cfg, respBody)

-				if attempt < maxRetries {
-					// Exponential backoff: 1s, 2s, 4s... (max 30s)
+				retryCfg := defaultRetryConfig()
+				// Check if this specific 5xx code is retryable (502, 503, 504)
+				if isRetryableHTTPStatus(httpResp.StatusCode) && attempt < retryCfg.MaxRetries {
+					delay := calculateRetryDelay(attempt, retryCfg)
+					logRetryAttempt(attempt, retryCfg.MaxRetries, fmt.Sprintf("stream HTTP %d", httpResp.StatusCode), delay, endpointConfig.Name)
+					time.Sleep(delay)
+					continue
+				} else if attempt < maxRetries {
+					// Fallback for other 5xx errors (500, 501, etc.)
 					backoff := time.Duration(1<<attempt) * time.Second
 					if backoff > 30*time.Second {
 						backoff = 30 * time.Second
@@ -840,7 +1253,10 @@ func (e *KiroExecutor) executeStreamWithRetry(ctx context.Context, auth *cliprox

 				// Check for SUSPENDED status - return immediately without retry
 				if strings.Contains(respBodyStr, "SUSPENDED") || strings.Contains(respBodyStr, "TEMPORARILY_SUSPENDED") {
-					log.Errorf("kiro: account is suspended, cannot proceed")
+					// Set long cooldown for suspended accounts
+					rateLimiter.CheckAndMarkSuspended(tokenKey, respBodyStr)
+					cooldownMgr.SetCooldown(tokenKey, kiroauth.LongCooldown, kiroauth.CooldownReasonSuspended)
+					log.Errorf("kiro: stream account is suspended, token %s set to cooldown for %v", tokenKey, kiroauth.LongCooldown)
 					return nil, statusErr{code: httpResp.StatusCode, msg: "account suspended: " + string(respBody)}
 				}

@@ -890,6 +1306,11 @@ func (e *KiroExecutor) executeStreamWithRetry(ctx context.Context, auth *cliprox

 			out := make(chan cliproxyexecutor.StreamChunk)

+			// Record success immediately since connection was established successfully
+			// Streaming errors will be handled separately
+			rateLimiter.MarkTokenSuccess(tokenKey)
+			log.Debugf("kiro: stream request successful, token %s marked as success", tokenKey)
+
 			go func(resp *http.Response, thinkingEnabled bool) {
 				defer close(out)
 				defer func() {
@@ -3116,14 +3537,14 @@ func (e *KiroExecutor) Refresh(ctx context.Context, auth *cliproxyauth.Auth) (*c
 		// Also check if expires_at is now in the future with sufficient buffer
 		if expiresAt, ok := auth.Metadata["expires_at"].(string); ok {
 			if expTime, err := time.Parse(time.RFC3339, expiresAt); err == nil {
-				// If token expires more than 5 minutes from now, it's still valid
-				if time.Until(expTime) > 5*time.Minute {
+				// If token expires more than 20 minutes from now, it's still valid
+				if time.Until(expTime) > 20*time.Minute {
 					log.Debugf("kiro executor: token is still valid (expires in %v), skipping refresh", time.Until(expTime))
 					// CRITICAL FIX: Set NextRefreshAfter to prevent frequent refresh checks
-					// Without this, shouldRefresh() will return true again in 5 seconds
+					// Without this, shouldRefresh() will return true again in 30 seconds
 					updated := auth.Clone()
-					// Set next refresh to 5 minutes before expiry, or at least 30 seconds from now
-					nextRefresh := expTime.Add(-5 * time.Minute)
+					// Set next refresh to 20 minutes before expiry, or at least 30 seconds from now
+					nextRefresh := expTime.Add(-20 * time.Minute)
 					minNextRefresh := time.Now().Add(30 * time.Second)
 					if nextRefresh.Before(minNextRefresh) {
 						nextRefresh = minNextRefresh
@@ -3220,6 +3641,13 @@ func (e *KiroExecutor) Refresh(ctx context.Context, auth *cliproxyauth.Auth) (*c
 	if tokenData.ClientSecret != "" {
 		updated.Metadata["client_secret"] = tokenData.ClientSecret
 	}
+	// Preserve region and start_url for IDC token refresh
+	if tokenData.Region != "" {
+		updated.Metadata["region"] = tokenData.Region
+	}
+	if tokenData.StartURL != "" {
+		updated.Metadata["start_url"] = tokenData.StartURL
+	}

 	if updated.Attributes == nil {
 		updated.Attributes = make(map[string]string)
@@ -3229,9 +3657,9 @@ func (e *KiroExecutor) Refresh(ctx context.Context, auth *cliproxyauth.Auth) (*c
 		updated.Attributes["profile_arn"] = tokenData.ProfileArn
 	}

-	// NextRefreshAfter is aligned with RefreshLead (5min)
+	// NextRefreshAfter is aligned with RefreshLead (20min)
 	if expiresAt, parseErr := time.Parse(time.RFC3339, tokenData.ExpiresAt); parseErr == nil {
-		updated.NextRefreshAfter = expiresAt.Add(-5 * time.Minute)
+		updated.NextRefreshAfter = expiresAt.Add(-20 * time.Minute)
 	}

 	log.Infof("kiro executor: token refreshed successfully, expires at %s", tokenData.ExpiresAt)
@@ -3285,6 +3713,121 @@ func (e *KiroExecutor) persistRefreshedAuth(auth *cliproxyauth.Auth) error {
 	return nil
 }

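+// reloadAuthFromFile reloads auth data from the auth file (option B: fallback mechanism).
+// When the in-memory token has expired, it tries to read the latest token from the file.
+// This covers the window in which the background refresher has already updated the file but the in-memory Auth object has not been synced yet.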
+func (e *KiroExecutor) reloadAuthFromFile(auth *cliproxyauth.Auth) (*cliproxyauth.Auth, error) {
+	if auth == nil {
+		return nil, fmt.Errorf("kiro executor: cannot reload nil auth")
+	}
+
+	// Determine the file path
+	var authPath string
+	if auth.Attributes != nil {
+		if p := strings.TrimSpace(auth.Attributes["path"]); p != "" {
+			authPath = p
+		}
+	}
+	if authPath == "" {
+		fileName := strings.TrimSpace(auth.FileName)
+		if fileName == "" {
+			return nil, fmt.Errorf("kiro executor: auth has no file path or filename for reload")
+		}
+		if filepath.IsAbs(fileName) {
+			authPath = fileName
+		} else if e.cfg != nil && e.cfg.AuthDir != "" {
+			authPath = filepath.Join(e.cfg.AuthDir, fileName)
+		} else {
+			return nil, fmt.Errorf("kiro executor: cannot determine auth file path for reload")
+		}
+	}
+
+	// Read the file
+	raw, err := os.ReadFile(authPath)
+	if err != nil {
+		return nil, fmt.Errorf("kiro executor: failed to read auth file %s: %w", authPath, err)
+	}
+
+	// Parse the JSON
+	var metadata map[string]any
+	if err := json.Unmarshal(raw, &metadata); err != nil {
+		return nil, fmt.Errorf("kiro executor: failed to parse auth file %s: %w", authPath, err)
+	}
+
+	// Check whether the token in the file is newer than the one in memory
+	fileExpiresAt, _ := metadata["expires_at"].(string)
+	fileAccessToken, _ := metadata["access_token"].(string)
+	memExpiresAt, _ := auth.Metadata["expires_at"].(string)
+	memAccessToken, _ := auth.Metadata["access_token"].(string)
+
+	// The file must contain a valid (non-empty) access_token
+	if fileAccessToken == "" {
+		return nil, fmt.Errorf("kiro executor: auth file has no access_token field")
+	}
+
+	// If expires_at is present, check whether the token has expired
+	if fileExpiresAt != "" {
+		fileExpTime, parseErr := time.Parse(time.RFC3339, fileExpiresAt)
+		if parseErr == nil {
+			// If the token in the file has also expired, do not use it
+			if time.Now().After(fileExpTime) {
+				log.Debugf("kiro executor: file token also expired at %s, not using", fileExpiresAt)
+				return nil, fmt.Errorf("kiro executor: file token also expired")
+			}
+		}
+	}
+
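+	// Decide whether the token in the file is newer than the one in memory
+	// Condition 1: access_token differs (it has been refreshed)
+	// Condition 2: expires_at is later (it has been refreshed)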
+	isNewer := false
+
+	// Check first whether access_token has changed
+	if fileAccessToken != memAccessToken {
+		isNewer = true
+		log.Debugf("kiro executor: file access_token differs from memory, using file token")
+	}
+
+	// If access_token is unchanged, compare expires_at
+	if !isNewer && fileExpiresAt != "" && memExpiresAt != "" {
+		fileExpTime, fileParseErr := time.Parse(time.RFC3339, fileExpiresAt)
+		memExpTime, memParseErr := time.Parse(time.RFC3339, memExpiresAt)
+		if fileParseErr == nil && memParseErr == nil && fileExpTime.After(memExpTime) {
+			isNewer = true
+			log.Debugf("kiro executor: file expires_at (%s) is newer than memory (%s)", fileExpiresAt, memExpiresAt)
+		}
+	}
+
+	// If the file has no expires_at and access_token is unchanged, freshness cannot be determined
+	if !isNewer && fileExpiresAt == "" && fileAccessToken == memAccessToken {
+		return nil, fmt.Errorf("kiro executor: cannot determine if file token is newer (no expires_at, same access_token)")
+	}
+
+	if !isNewer {
+		log.Debugf("kiro executor: file token not newer than memory token")
+		return nil, fmt.Errorf("kiro executor: file token not newer")
+	}
+
+	// Create the updated auth object
+	updated := auth.Clone()
+	updated.Metadata = metadata
+	updated.UpdatedAt = time.Now()
+
+	// Keep Attributes in sync
+	if updated.Attributes == nil {
+		updated.Attributes = make(map[string]string)
+	}
+	if accessToken, ok := metadata["access_token"].(string); ok {
+		updated.Attributes["access_token"] = accessToken
+	}
+	if profileArn, ok := metadata["profile_arn"].(string); ok {
+		updated.Attributes["profile_arn"] = profileArn
+	}
+
+	log.Infof("kiro executor: reloaded auth from file %s, new expires_at: %s", authPath, fileExpiresAt)
+	return updated, nil
+}
+
 // isTokenExpired checks if a JWT access token has expired.
 // Returns true if the token is expired or cannot be parsed.
 func (e *KiroExecutor) isTokenExpired(accessToken string) bool {
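Note on the `reloadAuthFromFile` hunk above: its freshness decision can be read as a small pure function. The sketch below is a simplified restatement and not code from this patch. `fileTokenIsNewer` is a hypothetical name, and the sketch assumes the file has already been read and parsed; it also leaves out the "file token already expired" check that the patch performs first.

```go
package main

import "time"

// fileTokenIsNewer is a hypothetical helper that restates the comparison
// made in reloadAuthFromFile: the file wins when its access token differs
// from the in-memory one, or when both expiry timestamps parse as RFC 3339
// and the file's expiry is later.
func fileTokenIsNewer(fileToken, memToken, fileExpiresAt, memExpiresAt string) bool {
	// A file without an access token can never replace the in-memory auth.
	if fileToken == "" {
		return false
	}
	// Condition 1: a different access token is treated as a refresh.
	if fileToken != memToken {
		return true
	}
	// Condition 2: same token, so fall back to comparing expiry times.
	if fileExpiresAt == "" || memExpiresAt == "" {
		return false
	}
	fileExp, errFile := time.Parse(time.RFC3339, fileExpiresAt)
	memExp, errMem := time.Parse(time.RFC3339, memExpiresAt)
	return errFile == nil && errMem == nil && fileExp.After(memExp)
}
```

Keeping the decision free of I/O makes it easy to unit-test separately from the file handling; the patch itself keeps everything inline in one method.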
--- a/internal/translator/antigravity/claude/antigravity_claude_request.go
+++ b/internal/translator/antigravity/claude/antigravity_claude_request.go
@@ -7,8 +7,6 @@ package claude

 import (
 	"bytes"
-	"crypto/sha256"
-	"encoding/hex"
 	"strings"

 	"github.com/router-for-me/CLIProxyAPI/v6/internal/cache"
@@ -19,37 +17,6 @@ import (
 	"github.com/tidwall/sjson"
 )

-// deriveSessionID generates a stable session ID from the request.
-// Uses the hash of the first user message to identify the conversation.
-func deriveSessionID(rawJSON []byte) string {
-	userIDResult := gjson.GetBytes(rawJSON, "metadata.user_id")
-	if userIDResult.Exists() {
-		userID := userIDResult.String()
-		idx := strings.Index(userID, "session_")
-		if idx != -1 {
-			return userID[idx+8:]
-		}
-	}
-	messages := gjson.GetBytes(rawJSON, "messages")
-	if !messages.IsArray() {
-		return ""
-	}
-	for _, msg := range messages.Array() {
-		if msg.Get("role").String() == "user" {
-			content := msg.Get("content").String()
-			if content == "" {
-				// Try to get text from content array
-				content = msg.Get("content.0.text").String()
-			}
-			if content != "" {
-				h := sha256.Sum256([]byte(content))
-				return hex.EncodeToString(h[:16])
-			}
-		}
-	}
-	return ""
-}
-
 // ConvertClaudeRequestToAntigravity parses and transforms a Claude Code API request into Gemini CLI API format.
 // It extracts the model name, system instruction, message contents, and tool declarations
 // from the raw JSON request and returns them in the format expected by the Gemini CLI API.
@@ -72,9 +39,6 @@ func ConvertClaudeRequestToAntigravity(modelName string, inputRawJSON []byte, _
 	enableThoughtTranslate := true
 	rawJSON := bytes.Clone(inputRawJSON)

-	// Derive session ID for signature caching
-	sessionID := deriveSessionID(rawJSON)
-
 	// system instruction
 	systemInstructionJSON := ""
 	hasSystemInstruction := false
@@ -137,8 +101,8 @@ func ConvertClaudeRequestToAntigravity(modelName string, inputRawJSON []byte, _
 						// Always try cached signature first (more reliable than client-provided)
 						// Client may send stale or invalid signatures from different sessions
 						signature := ""
-						if sessionID != "" && thinkingText != "" {
-							if cachedSig := cache.GetCachedSignature(modelName, sessionID, thinkingText); cachedSig != "" {
+						if thinkingText != "" {
+							if cachedSig := cache.GetCachedSignature(modelName, thinkingText); cachedSig != "" {
 								signature = cachedSig
 								// log.Debugf("Using cached signature for thinking block")
 							}
@@ -156,19 +120,19 @@ func ConvertClaudeRequestToAntigravity(modelName string, inputRawJSON []byte, _
 									}
 								}
 							}
-							if cache.HasValidSignature(clientSignature) {
+							if cache.HasValidSignature(modelName, clientSignature) {
 								signature = clientSignature
 							}
 							// log.Debugf("Using client-provided signature for thinking block")
 						}

 						// Store for subsequent tool_use in the same message
-						if cache.HasValidSignature(signature) {
+						if cache.HasValidSignature(modelName, signature) {
 							currentMessageThinkingSignature = signature
 						}

 						// Skip trailing unsigned thinking blocks on last assistant message
-						isUnsigned := !cache.HasValidSignature(signature)
+						isUnsigned := !cache.HasValidSignature(modelName, signature)

 						// If unsigned, skip entirely (don't convert to text)
 						// Claude requires assistant messages to start with thinking blocks when thinking is enabled
@@ -223,7 +187,7 @@ func ConvertClaudeRequestToAntigravity(modelName string, inputRawJSON []byte, _
 							// This is the approach used in opencode-google-antigravity-auth for Gemini
 							// and also works for Claude through Antigravity API
 							const skipSentinel = "skip_thought_signature_validator"
-							if cache.HasValidSignature(currentMessageThinkingSignature) {
+							if cache.HasValidSignature(modelName, currentMessageThinkingSignature) {
 								partJSON, _ = sjson.Set(partJSON, "thoughtSignature", currentMessageThinkingSignature)
 							} else {
 								// No valid signature - use skip sentinel to bypass validation
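Note on the hunks above: signature lookups no longer take a session ID, so a cached signature is addressed by model name and thinking text alone. The internals of the `cache` package are not part of this excerpt. The sketch below only illustrates one way such keying could work; `signatureStore`, `storeKey`, `Put`, and `Get` are hypothetical names, and hashing the thinking text is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
)

// signatureStore is a hypothetical in-memory store keyed by
// (model name, thinking text). It is not the project's cache package.
type signatureStore struct {
	mu      sync.RWMutex
	entries map[string]string
}

func newSignatureStore() *signatureStore {
	return &signatureStore{entries: make(map[string]string)}
}

// storeKey hashes the thinking text so long blocks yield short keys,
// and prefixes the model name so models do not share signatures.
func storeKey(modelName, thinkingText string) string {
	sum := sha256.Sum256([]byte(thinkingText))
	return modelName + "\x00" + hex.EncodeToString(sum[:])
}

// Put records a signature; empty text or signature is ignored.
func (s *signatureStore) Put(modelName, thinkingText, signature string) {
	if thinkingText == "" || signature == "" {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[storeKey(modelName, thinkingText)] = signature
}

// Get returns the stored signature, or "" when nothing matches.
func (s *signatureStore) Get(modelName, thinkingText string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.entries[storeKey(modelName, thinkingText)]
}
```

With this keying, two conversations that contain the same thinking text for the same model resolve to the same entry, which is the behavioral consequence of dropping the session ID.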
--- a/internal/translator/antigravity/claude/antigravity_claude_request_test.go
+++ b/internal/translator/antigravity/claude/antigravity_claude_request_test.go
@@ -98,10 +98,7 @@ func TestConvertClaudeRequestToAntigravity_ThinkingBlocks(t *testing.T) {
 		]
 	}`)

-	// Derive session ID and cache the signature
-	sessionID := deriveSessionID(inputJSON)
-	cache.CacheSignature(sessionID, thinkingText, validSignature)
-	defer cache.ClearSignatureCache(sessionID)
+	cache.CacheSignature("claude-sonnet-4-5-thinking", thinkingText, validSignature)

 	output := ConvertClaudeRequestToAntigravity("claude-sonnet-4-5-thinking", inputJSON, false)
 	outputStr := string(output)
@@ -266,10 +263,7 @@ func TestConvertClaudeRequestToAntigravity_ToolUse_WithSignature(t *testing.T) {
 		]
 	}`)

-	// Derive session ID and cache the signature
-	sessionID := deriveSessionID(inputJSON)
-	cache.CacheSignature(sessionID, thinkingText, validSignature)
-	defer cache.ClearSignatureCache(sessionID)
+	cache.CacheSignature("claude-sonnet-4-5-thinking", thinkingText, validSignature)

 	output := ConvertClaudeRequestToAntigravity("claude-sonnet-4-5-thinking", inputJSON, false)
 	outputStr := string(output)
@@ -306,10 +300,7 @@ func TestConvertClaudeRequestToAntigravity_ReorderThinking(t *testing.T) {
 		]
 	}`)

-	// Derive session ID and cache the signature
-	sessionID := deriveSessionID(inputJSON)
-	cache.CacheSignature(sessionID, thinkingText, validSignature)
-	defer cache.ClearSignatureCache(sessionID)
+	cache.CacheSignature("claude-sonnet-4-5-thinking", thinkingText, validSignature)

 	output := ConvertClaudeRequestToAntigravity("claude-sonnet-4-5-thinking", inputJSON, false)
 	outputStr := string(output)
@@ -517,10 +508,7 @@ func TestConvertClaudeRequestToAntigravity_TrailingSignedThinking_Kept(t *testin
 		]
 	}`)

-	// Derive session ID and cache the signature
-	sessionID := deriveSessionID(inputJSON)
-	cache.CacheSignature(sessionID, thinkingText, validSignature)
-	defer cache.ClearSignatureCache(sessionID)
+	cache.CacheSignature("claude-sonnet-4-5-thinking", thinkingText, validSignature)

 	output := ConvertClaudeRequestToAntigravity("claude-sonnet-4-5-thinking", inputJSON, false)
 	outputStr := string(output)
--- a/internal/translator/antigravity/claude/antigravity_claude_response.go
+++ b/internal/translator/antigravity/claude/antigravity_claude_response.go
@@ -41,7 +41,6 @@ type Params struct {
 	HasContent           bool   // Tracks whether any content (text, thinking, or tool use) has been output

 	// Signature caching support
-	SessionID           string          // Session ID derived from request for signature caching
 	CurrentThinkingText strings.Builder // Accumulates thinking text for signature caching
 }

@@ -70,7 +69,6 @@ func ConvertAntigravityResponseToClaude(_ context.Context, _ string, originalReq
 			HasFirstResponse: false,
 			ResponseType:     0,
 			ResponseIndex:    0,
-			SessionID:        deriveSessionID(originalRequestRawJSON),
 		}
 	}
 	modelName := gjson.GetBytes(requestRawJSON, "model").String()
@@ -139,8 +137,8 @@ func ConvertAntigravityResponseToClaude(_ context.Context, _ string, originalReq
 					if thoughtSignature := partResult.Get("thoughtSignature"); thoughtSignature.Exists() && thoughtSignature.String() != "" {
 						// log.Debug("Branch: signature_delta")

-						if params.SessionID != "" && params.CurrentThinkingText.Len() > 0 {
-							cache.CacheSignature(modelName, params.SessionID, params.CurrentThinkingText.String(), thoughtSignature.String())
+						if params.CurrentThinkingText.Len() > 0 {
+							cache.CacheSignature(modelName, params.CurrentThinkingText.String(), thoughtSignature.String())
 							// log.Debugf("Cached signature for thinking block (sessionID=%s, textLen=%d)", params.SessionID, params.CurrentThinkingText.Len())
 							params.CurrentThinkingText.Reset()
 						}
--- a/internal/translator/antigravity/claude/antigravity_claude_response_test.go
+++ b/internal/translator/antigravity/claude/antigravity_claude_response_test.go
@@ -97,6 +97,7 @@ func TestConvertAntigravityResponseToClaude_SignatureCached(t *testing.T) {
 	cache.ClearSignatureCache("")

 	requestJSON := []byte(`{
+		"model": "claude-sonnet-4-5-thinking",
 		"messages": [{"role": "user", "content": [{"type": "text", "text": "Cache test"}]}]
 	}`)

@@ -143,7 +144,7 @@ func TestConvertAntigravityResponseToClaude_SignatureCached(t *testing.T) {
 	ConvertAntigravityResponseToClaude(ctx, "claude-sonnet-4-5-thinking", requestJSON, requestJSON, signatureChunk, &param)

 	// Verify signature was cached
-	cachedSig := cache.GetCachedSignature(sessionID, thinkingText)
+	cachedSig := cache.GetCachedSignature("claude-sonnet-4-5-thinking", thinkingText)
 	if cachedSig != validSignature {
 		t.Errorf("Expected cached signature '%s', got '%s'", validSignature, cachedSig)
 	}
@@ -158,6 +159,7 @@ func TestConvertAntigravityResponseToClaude_MultipleThinkingBlocks(t *testing.T)
 	cache.ClearSignatureCache("")

 	requestJSON := []byte(`{
+		"model": "claude-sonnet-4-5-thinking",
 		"messages": [{"role": "user", "content": [{"type": "text", "text": "Multi block test"}]}]
 	}`)

@@ -221,13 +223,12 @@ func TestConvertAntigravityResponseToClaude_MultipleThinkingBlocks(t *testing.T)
 	// Process first thinking block
 	ConvertAntigravityResponseToClaude(ctx, "claude-sonnet-4-5-thinking", requestJSON, requestJSON, block1Thinking, &param)
 	params := param.(*Params)
-	sessionID := params.SessionID
 	firstThinkingText := params.CurrentThinkingText.String()

 	ConvertAntigravityResponseToClaude(ctx, "claude-sonnet-4-5-thinking", requestJSON, requestJSON, block1Sig, &param)

 	// Verify first signature cached
-	if cache.GetCachedSignature(sessionID, firstThinkingText) != validSig1 {
+	if cache.GetCachedSignature("claude-sonnet-4-5-thinking", firstThinkingText) != validSig1 {
 		t.Error("First thinking block signature should be cached")
 	}

@@ -241,76 +242,7 @@ func TestConvertAntigravityResponseToClaude_MultipleThinkingBlocks(t *testing.T)
 	ConvertAntigravityResponseToClaude(ctx, "claude-sonnet-4-5-thinking", requestJSON, requestJSON, block2Sig, &param)

 	// Verify second signature cached
-	if cache.GetCachedSignature(sessionID, secondThinkingText) != validSig2 {
+	if cache.GetCachedSignature("claude-sonnet-4-5-thinking", secondThinkingText) != validSig2 {
 		t.Error("Second thinking block signature should be cached")
 	}
 }
-
-func TestDeriveSessionIDFromRequest(t *testing.T) {
-	tests := []struct {
-		name      string
-		input     []byte
-		wantEmpty bool
-	}{
-		{
-			name:      "valid user message",
-			input:     []byte(`{"messages": [{"role": "user", "content": "Hello"}]}`),
-			wantEmpty: false,
-		},
-		{
-			name:      "user message with content array",
-			input:     []byte(`{"messages": [{"role": "user", "content": [{"type": "text", "text": "Hello"}]}]}`),
-			wantEmpty: false,
-		},
-		{
-			name:      "no user message",
-			input:     []byte(`{"messages": [{"role": "assistant", "content": "Hi"}]}`),
-			wantEmpty: true,
-		},
-		{
-			name:      "empty messages",
-			input:     []byte(`{"messages": []}`),
-			wantEmpty: true,
-		},
-		{
-			name:      "no messages field",
-			input:     []byte(`{}`),
-			wantEmpty: true,
-		},
-	}
-
-	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
-			result := deriveSessionID(tt.input)
-			if tt.wantEmpty && result != "" {
-				t.Errorf("Expected empty session ID, got '%s'", result)
-			}
-			if !tt.wantEmpty && result == "" {
-				t.Error("Expected non-empty session ID")
-			}
-		})
-	}
-}
-
-func TestDeriveSessionIDFromRequest_Deterministic(t *testing.T) {
-	input := []byte(`{"messages": [{"role": "user", "content": "Same message"}]}`)
-
-	id1 := deriveSessionID(input)
-	id2 := deriveSessionID(input)
-
-	if id1 != id2 {
-		t.Errorf("Session ID should be deterministic: '%s' != '%s'", id1, id2)
-	}
-}
-
-func TestDeriveSessionIDFromRequest_DifferentMessages(t *testing.T) {
-	input1 := []byte(`{"messages": [{"role": "user", "content": "Message A"}]}`)
-	input2 := []byte(`{"messages": [{"role": "user", "content": "Message B"}]}`)
-
-	id1 := deriveSessionID(input1)
-	id2 := deriveSessionID(input2)
-
-	if id1 == id2 {
-		t.Error("Different messages should produce different session IDs")
-	}
-}
--- a/internal/translator/antigravity/gemini/antigravity_gemini_request.go
+++ b/internal/translator/antigravity/gemini/antigravity_gemini_request.go
@@ -8,6 +8,7 @@ package gemini
 import (
 	"bytes"
 	"fmt"
+	"strings"

 	"github.com/router-for-me/CLIProxyAPI/v6/internal/translator/gemini/common"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/util"
@@ -32,12 +33,12 @@ import (
 //
 // Returns:
 //   - []byte: The transformed request data in Gemini API format
-func ConvertGeminiRequestToAntigravity(_ string, inputRawJSON []byte, _ bool) []byte {
+func ConvertGeminiRequestToAntigravity(modelName string, inputRawJSON []byte, _ bool) []byte {
 	rawJSON := bytes.Clone(inputRawJSON)
 	template := ""
 	template = `{"project":"","request":{},"model":""}`
 	template, _ = sjson.SetRaw(template, "request", string(rawJSON))
-	template, _ = sjson.Set(template, "model", gjson.Get(template, "request.model").String())
+	template, _ = sjson.Set(template, "model", modelName)
 	template, _ = sjson.Delete(template, "request.model")

 	template, errFixCLIToolResponse := fixCLIToolResponse(template)
@@ -97,37 +98,40 @@ func ConvertGeminiRequestToAntigravity(_ string, inputRawJSON []byte, _ bool) []
 		}
 	}

-	// Gemini-specific handling: add skip_thought_signature_validator to functionCall parts
-	// and remove thinking blocks entirely (Gemini doesn't need to preserve them)
-	const skipSentinel = "skip_thought_signature_validator"
+	// Gemini-specific handling for non-Claude models:
+	// - Add skip_thought_signature_validator to functionCall parts so upstream can bypass signature validation.
+	// - Also mark thinking parts with the same sentinel when present (we keep the parts; we only annotate them).
+	if !strings.Contains(modelName, "claude") {
+		const skipSentinel = "skip_thought_signature_validator"

-	gjson.GetBytes(rawJSON, "request.contents").ForEach(func(contentIdx, content gjson.Result) bool {
-		if content.Get("role").String() == "model" {
-			// First pass: collect indices of thinking parts to remove
-			var thinkingIndicesToRemove []int64
-			content.Get("parts").ForEach(func(partIdx, part gjson.Result) bool {
-				// Mark thinking blocks for removal
-				if part.Get("thought").Bool() {
-					thinkingIndicesToRemove = append(thinkingIndicesToRemove, partIdx.Int())
-				}
-				// Add skip sentinel to functionCall parts
-				if part.Get("functionCall").Exists() {
-					existingSig := part.Get("thoughtSignature").String()
-					if existingSig == "" || len(existingSig) < 50 {
-						rawJSON, _ = sjson.SetBytes(rawJSON, fmt.Sprintf("request.contents.%d.parts.%d.thoughtSignature", contentIdx.Int(), partIdx.Int()), skipSentinel)
+		gjson.GetBytes(rawJSON, "request.contents").ForEach(func(contentIdx, content gjson.Result) bool {
+			if content.Get("role").String() == "model" {
+				// First pass: collect indices of thinking parts to mark with skip sentinel
+				var thinkingIndicesToSkipSignature []int64
+				content.Get("parts").ForEach(func(partIdx, part gjson.Result) bool {
+					// Collect indices of thinking blocks to mark with skip sentinel
+					if part.Get("thought").Bool() {
+						thinkingIndicesToSkipSignature = append(thinkingIndicesToSkipSignature, partIdx.Int())
 					}
-				}
-				return true
-			})
+					// Add skip sentinel to functionCall parts
+					if part.Get("functionCall").Exists() {
+						existingSig := part.Get("thoughtSignature").String()
+						if existingSig == "" || len(existingSig) < 50 {
+							rawJSON, _ = sjson.SetBytes(rawJSON, fmt.Sprintf("request.contents.%d.parts.%d.thoughtSignature", contentIdx.Int(), partIdx.Int()), skipSentinel)
+						}
+					}
+					return true
+				})

-			// Remove thinking blocks in reverse order to preserve indices
-			for i := len(thinkingIndicesToRemove) - 1; i >= 0; i-- {
-				idx := thinkingIndicesToRemove[i]
-				rawJSON, _ = sjson.DeleteBytes(rawJSON, fmt.Sprintf("request.contents.%d.parts.%d", contentIdx.Int(), idx))
+				// Mark thinking blocks with the skip_thought_signature_validator sentinel (the reverse order is kept from the old removal loop; setting a field does not shift indices)
+				for i := len(thinkingIndicesToSkipSignature) - 1; i >= 0; i-- {
+					idx := thinkingIndicesToSkipSignature[i]
+					rawJSON, _ = sjson.SetBytes(rawJSON, fmt.Sprintf("request.contents.%d.parts.%d.thoughtSignature", contentIdx.Int(), idx), skipSentinel)
+				}
 			}
-		}
-		return true
-	})
+			return true
+		})
+	}

 	return common.AttachDefaultSafetySettings(rawJSON, "request.safetySettings")
 }
--- a/internal/translator/gemini/openai/responses/gemini_openai-responses_request.go
+++ b/internal/translator/gemini/openai/responses/gemini_openai-responses_request.go
@@ -298,6 +298,15 @@ func ConvertOpenAIResponsesRequestToGemini(modelName string, inputRawJSON []byte
 				}
 				functionContent, _ = sjson.SetRaw(functionContent, "parts.-1", functionResponse)
 				out, _ = sjson.SetRaw(out, "contents.-1", functionContent)
+
+			case "reasoning":
+				thoughtContent := `{"role":"model","parts":[]}`
+				thought := `{"text":"","thoughtSignature":"","thought":true}`
+				thought, _ = sjson.Set(thought, "text", item.Get("summary.0.text").String())
+				thought, _ = sjson.Set(thought, "thoughtSignature", item.Get("encrypted_content").String())
+
+				thoughtContent, _ = sjson.SetRaw(thoughtContent, "parts.-1", thought)
+				out, _ = sjson.SetRaw(out, "contents.-1", thoughtContent)
 			}
 		}
 	} else if input.Exists() && input.Type == gjson.String {
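Note on the hunk above: the new `reasoning` case maps one OpenAI Responses input item to a Gemini `model` content entry with a single thought part. The sketch below shows the same field mapping with `encoding/json` instead of `sjson`; it is an illustration under that substitution, and `reasoningToContent` and the struct types are hypothetical names.

```go
package main

import "encoding/json"

// reasoningItem holds the two fields the diff reads from a reasoning item.
type reasoningItem struct {
	Summary []struct {
		Text string `json:"text"`
	} `json:"summary"`
	EncryptedContent string `json:"encrypted_content"`
}

// thoughtPart and modelContent mirror the JSON templates in the diff.
type thoughtPart struct {
	Text             string `json:"text"`
	ThoughtSignature string `json:"thoughtSignature"`
	Thought          bool   `json:"thought"`
}

type modelContent struct {
	Role  string        `json:"role"`
	Parts []thoughtPart `json:"parts"`
}

// reasoningToContent is a hypothetical helper: summary[0].text becomes the
// thought text and encrypted_content becomes the thoughtSignature.
func reasoningToContent(raw []byte) ([]byte, error) {
	var item reasoningItem
	if err := json.Unmarshal(raw, &item); err != nil {
		return nil, err
	}
	part := thoughtPart{ThoughtSignature: item.EncryptedContent, Thought: true}
	if len(item.Summary) > 0 {
		part.Text = item.Summary[0].Text
	}
	return json.Marshal(modelContent{Role: "model", Parts: []thoughtPart{part}})
}
```

As in the diff, a missing summary or missing `encrypted_content` yields empty strings rather than an error.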
--- a/internal/translator/gemini/openai/responses/gemini_openai-responses_response.go
+++ b/internal/translator/gemini/openai/responses/gemini_openai-responses_response.go
@@ -20,6 +20,7 @@ type geminiToResponsesState struct {

 	// message aggregation
 	MsgOpened    bool
+	MsgClosed    bool
 	MsgIndex     int
 	CurrentMsgID string
 	TextBuf      strings.Builder
@@ -29,6 +30,7 @@ type geminiToResponsesState struct {
 	ReasoningOpened bool
 	ReasoningIndex  int
 	ReasoningItemID string
+	ReasoningEnc    string
 	ReasoningBuf    strings.Builder
 	ReasoningClosed bool

@@ -37,6 +39,7 @@ type geminiToResponsesState struct {
 	FuncArgsBuf map[int]*strings.Builder
 	FuncNames   map[int]string
 	FuncCallIDs map[int]string
+	FuncDone    map[int]bool
 }

 // responseIDCounter provides a process-wide unique counter for synthesized response identifiers.
@@ -45,6 +48,39 @@ var responseIDCounter uint64
 // funcCallIDCounter provides a process-wide unique counter for function call identifiers.
 var funcCallIDCounter uint64

+func pickRequestJSON(originalRequestRawJSON, requestRawJSON []byte) []byte {
+	if len(originalRequestRawJSON) > 0 && gjson.ValidBytes(originalRequestRawJSON) {
+		return originalRequestRawJSON
+	}
+	if len(requestRawJSON) > 0 && gjson.ValidBytes(requestRawJSON) {
+		return requestRawJSON
+	}
+	return nil
+}
+
+func unwrapRequestRoot(root gjson.Result) gjson.Result {
+	req := root.Get("request")
+	if !req.Exists() {
+		return root
+	}
+	if req.Get("model").Exists() || req.Get("input").Exists() || req.Get("instructions").Exists() {
+		return req
+	}
+	return root
+}
+
+func unwrapGeminiResponseRoot(root gjson.Result) gjson.Result {
+	resp := root.Get("response")
+	if !resp.Exists() {
+		return root
+	}
+	// Vertex-style Gemini responses wrap the actual payload in a "response" object.
+	if resp.Get("candidates").Exists() || resp.Get("responseId").Exists() || resp.Get("usageMetadata").Exists() {
+		return resp
+	}
+	return root
+}
+
 func emitEvent(event string, payload string) string {
 	return fmt.Sprintf("event: %s\ndata: %s", event, payload)
 }
@@ -56,18 +92,37 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 			FuncArgsBuf: make(map[int]*strings.Builder),
 			FuncNames:   make(map[int]string),
 			FuncCallIDs: make(map[int]string),
+			FuncDone:    make(map[int]bool),
 		}
 	}
 	st := (*param).(*geminiToResponsesState)
+	if st.FuncArgsBuf == nil {
+		st.FuncArgsBuf = make(map[int]*strings.Builder)
+	}
+	if st.FuncNames == nil {
+		st.FuncNames = make(map[int]string)
+	}
+	if st.FuncCallIDs == nil {
+		st.FuncCallIDs = make(map[int]string)
+	}
+	if st.FuncDone == nil {
+		st.FuncDone = make(map[int]bool)
+	}

 	if bytes.HasPrefix(rawJSON, []byte("data:")) {
 		rawJSON = bytes.TrimSpace(rawJSON[5:])
 	}

+	rawJSON = bytes.TrimSpace(rawJSON)
+	if len(rawJSON) == 0 || bytes.Equal(rawJSON, []byte("[DONE]")) {
+		return []string{}
+	}
+
 	root := gjson.ParseBytes(rawJSON)
 	if !root.Exists() {
 		return []string{}
 	}
+	root = unwrapGeminiResponseRoot(root)

 	var out []string
 	nextSeq := func() int { st.Seq++; return st.Seq }
@@ -98,19 +153,54 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 		itemDone, _ = sjson.Set(itemDone, "sequence_number", nextSeq())
 		itemDone, _ = sjson.Set(itemDone, "item.id", st.ReasoningItemID)
 		itemDone, _ = sjson.Set(itemDone, "output_index", st.ReasoningIndex)
+		itemDone, _ = sjson.Set(itemDone, "item.encrypted_content", st.ReasoningEnc)
 		itemDone, _ = sjson.Set(itemDone, "item.summary.0.text", full)
 		out = append(out, emitEvent("response.output_item.done", itemDone))

 		st.ReasoningClosed = true
 	}

+	// Helper to finalize the assistant message in correct order.
+	// It emits response.output_text.done, response.content_part.done,
+	// and response.output_item.done exactly once.
+	finalizeMessage := func() {
+		if !st.MsgOpened || st.MsgClosed {
+			return
+		}
+		fullText := st.ItemTextBuf.String()
+		done := `{"type":"response.output_text.done","sequence_number":0,"item_id":"","output_index":0,"content_index":0,"text":"","logprobs":[]}`
+		done, _ = sjson.Set(done, "sequence_number", nextSeq())
+		done, _ = sjson.Set(done, "item_id", st.CurrentMsgID)
+		done, _ = sjson.Set(done, "output_index", st.MsgIndex)
+		done, _ = sjson.Set(done, "text", fullText)
+		out = append(out, emitEvent("response.output_text.done", done))
+		partDone := `{"type":"response.content_part.done","sequence_number":0,"item_id":"","output_index":0,"content_index":0,"part":{"type":"output_text","annotations":[],"logprobs":[],"text":""}}`
+		partDone, _ = sjson.Set(partDone, "sequence_number", nextSeq())
+		partDone, _ = sjson.Set(partDone, "item_id", st.CurrentMsgID)
+		partDone, _ = sjson.Set(partDone, "output_index", st.MsgIndex)
+		partDone, _ = sjson.Set(partDone, "part.text", fullText)
+		out = append(out, emitEvent("response.content_part.done", partDone))
+		final := `{"type":"response.output_item.done","sequence_number":0,"output_index":0,"item":{"id":"","type":"message","status":"completed","content":[{"type":"output_text","text":""}],"role":"assistant"}}`
+		final, _ = sjson.Set(final, "sequence_number", nextSeq())
+		final, _ = sjson.Set(final, "output_index", st.MsgIndex)
+		final, _ = sjson.Set(final, "item.id", st.CurrentMsgID)
+		final, _ = sjson.Set(final, "item.content.0.text", fullText)
+		out = append(out, emitEvent("response.output_item.done", final))
+
+		st.MsgClosed = true
+	}
+
 	// Initialize per-response fields and emit created/in_progress once
 	if !st.Started {
-		if v := root.Get("responseId"); v.Exists() {
-			st.ResponseID = v.String()
+		st.ResponseID = root.Get("responseId").String()
+		if st.ResponseID == "" {
+			st.ResponseID = fmt.Sprintf("resp_%x_%d", time.Now().UnixNano(), atomic.AddUint64(&responseIDCounter, 1))
+		}
+		if !strings.HasPrefix(st.ResponseID, "resp_") {
+			st.ResponseID = fmt.Sprintf("resp_%s", st.ResponseID)
 		}
 		if v := root.Get("createTime"); v.Exists() {
-			if t, err := time.Parse(time.RFC3339Nano, v.String()); err == nil {
+			if t, errParseCreateTime := time.Parse(time.RFC3339Nano, v.String()); errParseCreateTime == nil {
 				st.CreatedAt = t.Unix()
 			}
 		}
@@ -143,15 +233,21 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 					// Ignore any late thought chunks after reasoning is finalized.
 					return true
 				}
+				if sig := part.Get("thoughtSignature"); sig.Exists() && sig.String() != "" && sig.String() != geminiResponsesThoughtSignature {
+					st.ReasoningEnc = sig.String()
+				} else if sig = part.Get("thought_signature"); sig.Exists() && sig.String() != "" && sig.String() != geminiResponsesThoughtSignature {
+					st.ReasoningEnc = sig.String()
+				}
 				if !st.ReasoningOpened {
 					st.ReasoningOpened = true
 					st.ReasoningIndex = st.NextIndex
 					st.NextIndex++
 					st.ReasoningItemID = fmt.Sprintf("rs_%s_%d", st.ResponseID, st.ReasoningIndex)
-					item := `{"type":"response.output_item.added","sequence_number":0,"output_index":0,"item":{"id":"","type":"reasoning","status":"in_progress","summary":[]}}`
+					item := `{"type":"response.output_item.added","sequence_number":0,"output_index":0,"item":{"id":"","type":"reasoning","status":"in_progress","encrypted_content":"","summary":[]}}`
 					item, _ = sjson.Set(item, "sequence_number", nextSeq())
 					item, _ = sjson.Set(item, "output_index", st.ReasoningIndex)
 					item, _ = sjson.Set(item, "item.id", st.ReasoningItemID)
+					item, _ = sjson.Set(item, "item.encrypted_content", st.ReasoningEnc)
 					out = append(out, emitEvent("response.output_item.added", item))
 					partAdded := `{"type":"response.reasoning_summary_part.added","sequence_number":0,"item_id":"","output_index":0,"summary_index":0,"part":{"type":"summary_text","text":""}}`
 					partAdded, _ = sjson.Set(partAdded, "sequence_number", nextSeq())
@@ -191,9 +287,9 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 					partAdded, _ = sjson.Set(partAdded, "output_index", st.MsgIndex)
 					out = append(out, emitEvent("response.content_part.added", partAdded))
 					st.ItemTextBuf.Reset()
-					st.ItemTextBuf.WriteString(t.String())
 				}
 				st.TextBuf.WriteString(t.String())
+				st.ItemTextBuf.WriteString(t.String())
 				msg := `{"type":"response.output_text.delta","sequence_number":0,"item_id":"","output_index":0,"content_index":0,"delta":"","logprobs":[]}`
 				msg, _ = sjson.Set(msg, "sequence_number", nextSeq())
 				msg, _ = sjson.Set(msg, "item_id", st.CurrentMsgID)
@@ -205,8 +301,10 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,

 			// Function call
 			if fc := part.Get("functionCall"); fc.Exists() {
-				// Before emitting function-call outputs, finalize reasoning if open.
+				// Before emitting function-call outputs, finalize reasoning and the message (if open).
+				// Responses streaming requires message done events before the next output_item.added.
 				finalizeReasoning()
+				finalizeMessage()
 				name := fc.Get("name").String()
 				idx := st.NextIndex
 				st.NextIndex++
@@ -219,6 +317,14 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 				}
 				st.FuncNames[idx] = name

+				argsJSON := "{}"
+				if args := fc.Get("args"); args.Exists() {
+					argsJSON = args.Raw
+				}
+				if st.FuncArgsBuf[idx].Len() == 0 && argsJSON != "" {
+					st.FuncArgsBuf[idx].WriteString(argsJSON)
+				}
+
 				// Emit item.added for function call
 				item := `{"type":"response.output_item.added","sequence_number":0,"output_index":0,"item":{"id":"","type":"function_call","status":"in_progress","arguments":"","call_id":"","name":""}}`
 				item, _ = sjson.Set(item, "sequence_number", nextSeq())
@@ -228,10 +334,9 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 				item, _ = sjson.Set(item, "item.name", name)
 				out = append(out, emitEvent("response.output_item.added", item))

-				// Emit arguments delta (full args in one chunk)
-				if args := fc.Get("args"); args.Exists() {
-					argsJSON := args.Raw
-					st.FuncArgsBuf[idx].WriteString(argsJSON)
+				// Emit arguments delta (full args in one chunk).
+				// When Gemini omits args, emit "{}" to keep Responses streaming event order consistent.
+				if argsJSON != "" {
 					ad := `{"type":"response.function_call_arguments.delta","sequence_number":0,"item_id":"","output_index":0,"delta":""}`
 					ad, _ = sjson.Set(ad, "sequence_number", nextSeq())
 					ad, _ = sjson.Set(ad, "item_id", fmt.Sprintf("fc_%s", st.FuncCallIDs[idx]))
@@ -240,6 +345,27 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 					out = append(out, emitEvent("response.function_call_arguments.delta", ad))
 				}

+				// Gemini emits the full function call payload at once, so we can finalize it immediately.
+				if !st.FuncDone[idx] {
+					fcDone := `{"type":"response.function_call_arguments.done","sequence_number":0,"item_id":"","output_index":0,"arguments":""}`
+					fcDone, _ = sjson.Set(fcDone, "sequence_number", nextSeq())
+					fcDone, _ = sjson.Set(fcDone, "item_id", fmt.Sprintf("fc_%s", st.FuncCallIDs[idx]))
+					fcDone, _ = sjson.Set(fcDone, "output_index", idx)
+					fcDone, _ = sjson.Set(fcDone, "arguments", argsJSON)
+					out = append(out, emitEvent("response.function_call_arguments.done", fcDone))
+
+					itemDone := `{"type":"response.output_item.done","sequence_number":0,"output_index":0,"item":{"id":"","type":"function_call","status":"completed","arguments":"","call_id":"","name":""}}`
+					itemDone, _ = sjson.Set(itemDone, "sequence_number", nextSeq())
+					itemDone, _ = sjson.Set(itemDone, "output_index", idx)
+					itemDone, _ = sjson.Set(itemDone, "item.id", fmt.Sprintf("fc_%s", st.FuncCallIDs[idx]))
+					itemDone, _ = sjson.Set(itemDone, "item.arguments", argsJSON)
+					itemDone, _ = sjson.Set(itemDone, "item.call_id", st.FuncCallIDs[idx])
+					itemDone, _ = sjson.Set(itemDone, "item.name", st.FuncNames[idx])
+					out = append(out, emitEvent("response.output_item.done", itemDone))
+
+					st.FuncDone[idx] = true
+				}
+
 				return true
 			}

@@ -251,28 +377,7 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 	if fr := root.Get("candidates.0.finishReason"); fr.Exists() && fr.String() != "" {
 		// Finalize reasoning first to keep ordering tight with last delta
 		finalizeReasoning()
-		// Close message output if opened
-		if st.MsgOpened {
-			fullText := st.ItemTextBuf.String()
-			done := `{"type":"response.output_text.done","sequence_number":0,"item_id":"","output_index":0,"content_index":0,"text":"","logprobs":[]}`
-			done, _ = sjson.Set(done, "sequence_number", nextSeq())
-			done, _ = sjson.Set(done, "item_id", st.CurrentMsgID)
-			done, _ = sjson.Set(done, "output_index", st.MsgIndex)
-			done, _ = sjson.Set(done, "text", fullText)
-			out = append(out, emitEvent("response.output_text.done", done))
-			partDone := `{"type":"response.content_part.done","sequence_number":0,"item_id":"","output_index":0,"content_index":0,"part":{"type":"output_text","annotations":[],"logprobs":[],"text":""}}`
-			partDone, _ = sjson.Set(partDone, "sequence_number", nextSeq())
-			partDone, _ = sjson.Set(partDone, "item_id", st.CurrentMsgID)
-			partDone, _ = sjson.Set(partDone, "output_index", st.MsgIndex)
-			partDone, _ = sjson.Set(partDone, "part.text", fullText)
-			out = append(out, emitEvent("response.content_part.done", partDone))
-			final := `{"type":"response.output_item.done","sequence_number":0,"output_index":0,"item":{"id":"","type":"message","status":"completed","content":[{"type":"output_text","text":""}],"role":"assistant"}}`
-			final, _ = sjson.Set(final, "sequence_number", nextSeq())
-			final, _ = sjson.Set(final, "output_index", st.MsgIndex)
-			final, _ = sjson.Set(final, "item.id", st.CurrentMsgID)
-			final, _ = sjson.Set(final, "item.content.0.text", fullText)
-			out = append(out, emitEvent("response.output_item.done", final))
-		}
+		finalizeMessage()

 		// Close function calls
 		if len(st.FuncArgsBuf) > 0 {
@@ -289,6 +394,9 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 				}
 			}
 			for _, idx := range idxs {
+				if st.FuncDone[idx] {
+					continue
+				}
 				args := "{}"
 				if b := st.FuncArgsBuf[idx]; b != nil && b.Len() > 0 {
 					args = b.String()
@@ -308,6 +416,8 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 				itemDone, _ = sjson.Set(itemDone, "item.call_id", st.FuncCallIDs[idx])
 				itemDone, _ = sjson.Set(itemDone, "item.name", st.FuncNames[idx])
 				out = append(out, emitEvent("response.output_item.done", itemDone))
+
+				st.FuncDone[idx] = true
 			}
 		}

@@ -319,8 +429,8 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 		completed, _ = sjson.Set(completed, "response.id", st.ResponseID)
 		completed, _ = sjson.Set(completed, "response.created_at", st.CreatedAt)

-		if requestRawJSON != nil {
-			req := gjson.ParseBytes(requestRawJSON)
+		if reqJSON := pickRequestJSON(originalRequestRawJSON, requestRawJSON); len(reqJSON) > 0 {
+			req := unwrapRequestRoot(gjson.ParseBytes(reqJSON))
 			if v := req.Get("instructions"); v.Exists() {
 				completed, _ = sjson.Set(completed, "response.instructions", v.String())
 			}
@@ -383,41 +493,34 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 			}
 		}

-		// Compose outputs in encountered order: reasoning, message, function_calls
+		// Compose outputs in output_index order.
 		outputsWrapper := `{"arr":[]}`
-		if st.ReasoningOpened {
-			item := `{"id":"","type":"reasoning","summary":[{"type":"summary_text","text":""}]}`
-			item, _ = sjson.Set(item, "id", st.ReasoningItemID)
-			item, _ = sjson.Set(item, "summary.0.text", st.ReasoningBuf.String())
-			outputsWrapper, _ = sjson.SetRaw(outputsWrapper, "arr.-1", item)
-		}
-		if st.MsgOpened {
-			item := `{"id":"","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":""}],"role":"assistant"}`
-			item, _ = sjson.Set(item, "id", st.CurrentMsgID)
-			item, _ = sjson.Set(item, "content.0.text", st.TextBuf.String())
-			outputsWrapper, _ = sjson.SetRaw(outputsWrapper, "arr.-1", item)
-		}
-		if len(st.FuncArgsBuf) > 0 {
-			idxs := make([]int, 0, len(st.FuncArgsBuf))
-			for idx := range st.FuncArgsBuf {
-				idxs = append(idxs, idx)
+		for idx := 0; idx < st.NextIndex; idx++ {
+			if st.ReasoningOpened && idx == st.ReasoningIndex {
+				item := `{"id":"","type":"reasoning","encrypted_content":"","summary":[{"type":"summary_text","text":""}]}`
+				item, _ = sjson.Set(item, "id", st.ReasoningItemID)
+				item, _ = sjson.Set(item, "encrypted_content", st.ReasoningEnc)
+				item, _ = sjson.Set(item, "summary.0.text", st.ReasoningBuf.String())
+				outputsWrapper, _ = sjson.SetRaw(outputsWrapper, "arr.-1", item)
+				continue
 			}
-			for i := 0; i < len(idxs); i++ {
-				for j := i + 1; j < len(idxs); j++ {
-					if idxs[j] < idxs[i] {
-						idxs[i], idxs[j] = idxs[j], idxs[i]
-					}
-				}
+			if st.MsgOpened && idx == st.MsgIndex {
+				item := `{"id":"","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":""}],"role":"assistant"}`
+				item, _ = sjson.Set(item, "id", st.CurrentMsgID)
+				item, _ = sjson.Set(item, "content.0.text", st.TextBuf.String())
+				outputsWrapper, _ = sjson.SetRaw(outputsWrapper, "arr.-1", item)
+				continue
 			}
-			for _, idx := range idxs {
-				args := ""
-				if b := st.FuncArgsBuf[idx]; b != nil {
+
+			if callID, ok := st.FuncCallIDs[idx]; ok && callID != "" {
+				args := "{}"
+				if b := st.FuncArgsBuf[idx]; b != nil && b.Len() > 0 {
 					args = b.String()
 				}
 				item := `{"id":"","type":"function_call","status":"completed","arguments":"","call_id":"","name":""}`
-				item, _ = sjson.Set(item, "id", fmt.Sprintf("fc_%s", st.FuncCallIDs[idx]))
+				item, _ = sjson.Set(item, "id", fmt.Sprintf("fc_%s", callID))
 				item, _ = sjson.Set(item, "arguments", args)
-				item, _ = sjson.Set(item, "call_id", st.FuncCallIDs[idx])
+				item, _ = sjson.Set(item, "call_id", callID)
 				item, _ = sjson.Set(item, "name", st.FuncNames[idx])
 				outputsWrapper, _ = sjson.SetRaw(outputsWrapper, "arr.-1", item)
 			}
@@ -431,8 +534,8 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 			// input tokens = prompt + thoughts
 			input := um.Get("promptTokenCount").Int() + um.Get("thoughtsTokenCount").Int()
 			completed, _ = sjson.Set(completed, "response.usage.input_tokens", input)
-			// cached_tokens not provided by Gemini; default to 0 for structure compatibility
-			completed, _ = sjson.Set(completed, "response.usage.input_tokens_details.cached_tokens", 0)
+			// Map Gemini's cachedContentTokenCount to OpenAI's cached_tokens; gjson yields 0 when the field is absent.
+			completed, _ = sjson.Set(completed, "response.usage.input_tokens_details.cached_tokens", um.Get("cachedContentTokenCount").Int())
 			// output tokens
 			if v := um.Get("candidatesTokenCount"); v.Exists() {
 				completed, _ = sjson.Set(completed, "response.usage.output_tokens", v.Int())
@@ -460,6 +563,7 @@ func ConvertGeminiResponseToOpenAIResponses(_ context.Context, modelName string,
 // ConvertGeminiResponseToOpenAIResponsesNonStream aggregates Gemini response JSON into a single OpenAI Responses JSON object.
 func ConvertGeminiResponseToOpenAIResponsesNonStream(_ context.Context, _ string, originalRequestRawJSON, requestRawJSON, rawJSON []byte, _ *any) string {
 	root := gjson.ParseBytes(rawJSON)
+	root = unwrapGeminiResponseRoot(root)

 	// Base response scaffold
 	resp := `{"id":"","object":"response","created_at":0,"status":"completed","background":false,"error":null,"incomplete_details":null}`
@@ -478,15 +582,15 @@ func ConvertGeminiResponseToOpenAIResponsesNonStream(_ context.Context, _ string
 	// created_at: map from createTime if available
 	createdAt := time.Now().Unix()
 	if v := root.Get("createTime"); v.Exists() {
-		if t, err := time.Parse(time.RFC3339Nano, v.String()); err == nil {
+		if t, errParseCreateTime := time.Parse(time.RFC3339Nano, v.String()); errParseCreateTime == nil {
 			createdAt = t.Unix()
 		}
 	}
 	resp, _ = sjson.Set(resp, "created_at", createdAt)

 	// Echo request fields when present; fallback model from response modelVersion
-	if len(requestRawJSON) > 0 {
-		req := gjson.ParseBytes(requestRawJSON)
+	if reqJSON := pickRequestJSON(originalRequestRawJSON, requestRawJSON); len(reqJSON) > 0 {
+		req := unwrapRequestRoot(gjson.ParseBytes(reqJSON))
 		if v := req.Get("instructions"); v.Exists() {
 			resp, _ = sjson.Set(resp, "instructions", v.String())
 		}
@@ -636,8 +740,8 @@ func ConvertGeminiResponseToOpenAIResponsesNonStream(_ context.Context, _ string
 		// input tokens = prompt + thoughts
 		input := um.Get("promptTokenCount").Int() + um.Get("thoughtsTokenCount").Int()
 		resp, _ = sjson.Set(resp, "usage.input_tokens", input)
-		// cached_tokens not provided by Gemini; default to 0 for structure compatibility
-		resp, _ = sjson.Set(resp, "usage.input_tokens_details.cached_tokens", 0)
+		// Map Gemini's cachedContentTokenCount to OpenAI's cached_tokens; gjson yields 0 when the field is absent.
+		resp, _ = sjson.Set(resp, "usage.input_tokens_details.cached_tokens", um.Get("cachedContentTokenCount").Int())
 		// output tokens
 		if v := um.Get("candidatesTokenCount"); v.Exists() {
 			resp, _ = sjson.Set(resp, "usage.output_tokens", v.Int())
--- a/internal/translator/gemini/openai/responses/gemini_openai-responses_response_test.go
+++ b/internal/translator/gemini/openai/responses/gemini_openai-responses_response_test.go
@@ -0,0 +1,353 @@
+package responses
+
+import (
+	"context"
+	"strings"
+	"testing"
+
+	"github.com/tidwall/gjson"
+)
+
+func parseSSEEvent(t *testing.T, chunk string) (string, gjson.Result) {
+	t.Helper()
+
+	lines := strings.Split(chunk, "\n")
+	if len(lines) < 2 {
+		t.Fatalf("unexpected SSE chunk: %q", chunk)
+	}
+
+	event := strings.TrimSpace(strings.TrimPrefix(lines[0], "event:"))
+	dataLine := strings.TrimSpace(strings.TrimPrefix(lines[1], "data:"))
+	if !gjson.Valid(dataLine) {
+		t.Fatalf("invalid SSE data JSON: %q", dataLine)
+	}
+	return event, gjson.Parse(dataLine)
+}
+
+func TestConvertGeminiResponseToOpenAIResponses_UnwrapAndAggregateText(t *testing.T) {
+	// Vertex-style Gemini stream wraps the actual response payload under "response".
+	// This test ensures we unwrap and that output_text.done contains the full text.
+	in := []string{
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"text":""}]}}],"usageMetadata":{"promptTokenCount":1,"candidatesTokenCount":1,"totalTokenCount":2,"cachedContentTokenCount":0},"modelVersion":"test-model","responseId":"req_vrtx_1"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"text":"让"}]}}],"usageMetadata":{"promptTokenCount":1,"candidatesTokenCount":1,"totalTokenCount":2,"cachedContentTokenCount":0},"modelVersion":"test-model","responseId":"req_vrtx_1"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"text":"我先"}]}}],"usageMetadata":{"promptTokenCount":1,"candidatesTokenCount":1,"totalTokenCount":2,"cachedContentTokenCount":0},"modelVersion":"test-model","responseId":"req_vrtx_1"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"text":"了解"}]}}],"usageMetadata":{"promptTokenCount":1,"candidatesTokenCount":1,"totalTokenCount":2,"cachedContentTokenCount":0},"modelVersion":"test-model","responseId":"req_vrtx_1"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"functionCall":{"name":"mcp__serena__list_dir","args":{"recursive":false,"relative_path":"internal"},"id":"toolu_1"}}]}}],"usageMetadata":{"promptTokenCount":1,"candidatesTokenCount":1,"totalTokenCount":2,"cachedContentTokenCount":0},"modelVersion":"test-model","responseId":"req_vrtx_1"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"text":""}]},"finishReason":"STOP"}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":5,"totalTokenCount":15,"cachedContentTokenCount":2},"modelVersion":"test-model","responseId":"req_vrtx_1"},"traceId":"t1"}`,
+	}
+
+	originalReq := []byte(`{"instructions":"test instructions","model":"gpt-5","max_output_tokens":123}`)
+
+	var param any
+	var out []string
+	for _, line := range in {
+		out = append(out, ConvertGeminiResponseToOpenAIResponses(context.Background(), "test-model", originalReq, nil, []byte(line), &param)...)
+	}
+
+	var (
+		gotTextDone     bool
+		gotMessageDone  bool
+		gotResponseDone bool
+		gotFuncDone     bool
+
+		textDone     string
+		messageText  string
+		responseID   string
+		instructions string
+		cachedTokens int64
+
+		funcName string
+		funcArgs string
+
+		posTextDone    = -1
+		posPartDone    = -1
+		posMessageDone = -1
+		posFuncAdded   = -1
+	)
+
+	for i, chunk := range out {
+		ev, data := parseSSEEvent(t, chunk)
+		switch ev {
+		case "response.output_text.done":
+			gotTextDone = true
+			if posTextDone == -1 {
+				posTextDone = i
+			}
+			textDone = data.Get("text").String()
+		case "response.content_part.done":
+			if posPartDone == -1 {
+				posPartDone = i
+			}
+		case "response.output_item.done":
+			switch data.Get("item.type").String() {
+			case "message":
+				gotMessageDone = true
+				if posMessageDone == -1 {
+					posMessageDone = i
+				}
+				messageText = data.Get("item.content.0.text").String()
+			case "function_call":
+				gotFuncDone = true
+				funcName = data.Get("item.name").String()
+				funcArgs = data.Get("item.arguments").String()
+			}
+		case "response.output_item.added":
+			if data.Get("item.type").String() == "function_call" && posFuncAdded == -1 {
+				posFuncAdded = i
+			}
+		case "response.completed":
+			gotResponseDone = true
+			responseID = data.Get("response.id").String()
+			instructions = data.Get("response.instructions").String()
+			cachedTokens = data.Get("response.usage.input_tokens_details.cached_tokens").Int()
+		}
+	}
+
+	if !gotTextDone {
+		t.Fatalf("missing response.output_text.done event")
+	}
+	if posTextDone == -1 || posPartDone == -1 || posMessageDone == -1 || posFuncAdded == -1 {
+		t.Fatalf("missing ordering events: textDone=%d partDone=%d messageDone=%d funcAdded=%d", posTextDone, posPartDone, posMessageDone, posFuncAdded)
+	}
+	if !(posTextDone < posPartDone && posPartDone < posMessageDone && posMessageDone < posFuncAdded) {
+		t.Fatalf("unexpected message/function ordering: textDone=%d partDone=%d messageDone=%d funcAdded=%d", posTextDone, posPartDone, posMessageDone, posFuncAdded)
+	}
+	if !gotMessageDone {
+		t.Fatalf("missing message response.output_item.done event")
+	}
+	if !gotFuncDone {
+		t.Fatalf("missing function_call response.output_item.done event")
+	}
+	if !gotResponseDone {
+		t.Fatalf("missing response.completed event")
+	}
+
+	if textDone != "让我先了解" {
+		t.Fatalf("unexpected output_text.done text: got %q", textDone)
+	}
+	if messageText != "让我先了解" {
+		t.Fatalf("unexpected message done text: got %q", messageText)
+	}
+
+	if responseID != "resp_req_vrtx_1" {
+		t.Fatalf("unexpected response id: got %q", responseID)
+	}
+	if instructions != "test instructions" {
+		t.Fatalf("unexpected instructions echo: got %q", instructions)
+	}
+	if cachedTokens != 2 {
+		t.Fatalf("unexpected cached token count: got %d", cachedTokens)
+	}
+
+	if funcName != "mcp__serena__list_dir" {
+		t.Fatalf("unexpected function name: got %q", funcName)
+	}
+	if !gjson.Valid(funcArgs) {
+		t.Fatalf("invalid function arguments JSON: %q", funcArgs)
+	}
+	if gjson.Get(funcArgs, "recursive").Bool() {
+		t.Fatalf("unexpected recursive arg: %v", gjson.Get(funcArgs, "recursive").Value())
+	}
+	if gjson.Get(funcArgs, "relative_path").String() != "internal" {
+		t.Fatalf("unexpected relative_path arg: %q", gjson.Get(funcArgs, "relative_path").String())
+	}
+}
+
+func TestConvertGeminiResponseToOpenAIResponses_ReasoningEncryptedContent(t *testing.T) {
+	sig := "RXE0RENrZ0lDeEFDR0FJcVFOZDdjUzlleGFuRktRdFcvSzNyZ2MvWDNCcDQ4RmxSbGxOWUlOVU5kR1l1UHMrMGdkMVp0Vkg3ekdKU0g4YVljc2JjN3lNK0FrdGpTNUdqamI4T3Z0VVNETzdQd3pmcFhUOGl3U3hXUEJvTVFRQ09mWTFyMEtTWGZxUUlJakFqdmFGWk83RW1XRlBKckJVOVpkYzdDKw=="
+	in := []string{
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"thought":true,"thoughtSignature":"` + sig + `","text":""}]}}],"modelVersion":"test-model","responseId":"req_vrtx_sig"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"thought":true,"text":"a"}]}}],"modelVersion":"test-model","responseId":"req_vrtx_sig"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"text":"hello"}]}}],"modelVersion":"test-model","responseId":"req_vrtx_sig"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"text":""}]},"finishReason":"STOP"}],"modelVersion":"test-model","responseId":"req_vrtx_sig"},"traceId":"t1"}`,
+	}
+
+	var param any
+	var out []string
+	for _, line := range in {
+		out = append(out, ConvertGeminiResponseToOpenAIResponses(context.Background(), "test-model", nil, nil, []byte(line), &param)...)
+	}
+
+	var (
+		addedEnc string
+		doneEnc  string
+	)
+	for _, chunk := range out {
+		ev, data := parseSSEEvent(t, chunk)
+		switch ev {
+		case "response.output_item.added":
+			if data.Get("item.type").String() == "reasoning" {
+				addedEnc = data.Get("item.encrypted_content").String()
+			}
+		case "response.output_item.done":
+			if data.Get("item.type").String() == "reasoning" {
+				doneEnc = data.Get("item.encrypted_content").String()
+			}
+		}
+	}
+
+	if addedEnc != sig {
+		t.Fatalf("unexpected encrypted_content in response.output_item.added: got %q", addedEnc)
+	}
+	if doneEnc != sig {
+		t.Fatalf("unexpected encrypted_content in response.output_item.done: got %q", doneEnc)
+	}
+}
+
+func TestConvertGeminiResponseToOpenAIResponses_FunctionCallEventOrder(t *testing.T) {
+	in := []string{
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"functionCall":{"name":"tool0"}}]}}],"modelVersion":"test-model","responseId":"req_vrtx_1"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"functionCall":{"name":"tool1"}}]}}],"modelVersion":"test-model","responseId":"req_vrtx_1"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"functionCall":{"name":"tool2","args":{"a":1}}}]}}],"modelVersion":"test-model","responseId":"req_vrtx_1"},"traceId":"t1"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"text":""}]},"finishReason":"STOP"}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":5,"totalTokenCount":15,"cachedContentTokenCount":0},"modelVersion":"test-model","responseId":"req_vrtx_1"},"traceId":"t1"}`,
+	}
+
+	var param any
+	var out []string
+	for _, line := range in {
+		out = append(out, ConvertGeminiResponseToOpenAIResponses(context.Background(), "test-model", nil, nil, []byte(line), &param)...)
+	}
+
+	posAdded := []int{-1, -1, -1}
+	posArgsDelta := []int{-1, -1, -1}
+	posArgsDone := []int{-1, -1, -1}
+	posItemDone := []int{-1, -1, -1}
+	posCompleted := -1
+	deltaByIndex := map[int]string{}
+
+	for i, chunk := range out {
+		ev, data := parseSSEEvent(t, chunk)
+		switch ev {
+		case "response.output_item.added":
+			if data.Get("item.type").String() != "function_call" {
+				continue
+			}
+			idx := int(data.Get("output_index").Int())
+			if idx >= 0 && idx < len(posAdded) {
+				posAdded[idx] = i
+			}
+		case "response.function_call_arguments.delta":
+			idx := int(data.Get("output_index").Int())
+			if idx >= 0 && idx < len(posArgsDelta) {
+				posArgsDelta[idx] = i
+				deltaByIndex[idx] = data.Get("delta").String()
+			}
+		case "response.function_call_arguments.done":
+			idx := int(data.Get("output_index").Int())
+			if idx >= 0 && idx < len(posArgsDone) {
+				posArgsDone[idx] = i
+			}
+		case "response.output_item.done":
+			if data.Get("item.type").String() != "function_call" {
+				continue
+			}
+			idx := int(data.Get("output_index").Int())
+			if idx >= 0 && idx < len(posItemDone) {
+				posItemDone[idx] = i
+			}
+		case "response.completed":
+			posCompleted = i
+
+			output := data.Get("response.output")
+			if !output.Exists() || !output.IsArray() {
+				t.Fatalf("missing response.output in response.completed")
+			}
+			if len(output.Array()) != 3 {
+				t.Fatalf("unexpected response.output length: got %d", len(output.Array()))
+			}
+			if data.Get("response.output.0.name").String() != "tool0" || data.Get("response.output.0.arguments").String() != "{}" {
+				t.Fatalf("unexpected output[0]: %s", data.Get("response.output.0").Raw)
+			}
+			if data.Get("response.output.1.name").String() != "tool1" || data.Get("response.output.1.arguments").String() != "{}" {
+				t.Fatalf("unexpected output[1]: %s", data.Get("response.output.1").Raw)
+			}
+			if data.Get("response.output.2.name").String() != "tool2" {
+				t.Fatalf("unexpected output[2] name: %s", data.Get("response.output.2").Raw)
+			}
+			if !gjson.Valid(data.Get("response.output.2.arguments").String()) {
+				t.Fatalf("unexpected output[2] arguments: %q", data.Get("response.output.2.arguments").String())
+			}
+		}
+	}
+
+	if posCompleted == -1 {
+		t.Fatalf("missing response.completed event")
+	}
+	for idx := 0; idx < 3; idx++ {
+		if posAdded[idx] == -1 || posArgsDelta[idx] == -1 || posArgsDone[idx] == -1 || posItemDone[idx] == -1 {
+			t.Fatalf("missing function call events for output_index %d: added=%d argsDelta=%d argsDone=%d itemDone=%d", idx, posAdded[idx], posArgsDelta[idx], posArgsDone[idx], posItemDone[idx])
+		}
+		if !(posAdded[idx] < posArgsDelta[idx] && posArgsDelta[idx] < posArgsDone[idx] && posArgsDone[idx] < posItemDone[idx]) {
+			t.Fatalf("unexpected ordering for output_index %d: added=%d argsDelta=%d argsDone=%d itemDone=%d", idx, posAdded[idx], posArgsDelta[idx], posArgsDone[idx], posItemDone[idx])
+		}
+		if idx > 0 && !(posItemDone[idx-1] < posAdded[idx]) {
+			t.Fatalf("function call events overlap between %d and %d: prevDone=%d nextAdded=%d", idx-1, idx, posItemDone[idx-1], posAdded[idx])
+		}
+	}
+
+	if deltaByIndex[0] != "{}" {
+		t.Fatalf("unexpected delta for output_index 0: got %q", deltaByIndex[0])
+	}
+	if deltaByIndex[1] != "{}" {
+		t.Fatalf("unexpected delta for output_index 1: got %q", deltaByIndex[1])
+	}
+	if deltaByIndex[2] == "" || !gjson.Valid(deltaByIndex[2]) || gjson.Get(deltaByIndex[2], "a").Int() != 1 {
+		t.Fatalf("unexpected delta for output_index 2: got %q", deltaByIndex[2])
+	}
+	if !(posItemDone[2] < posCompleted) {
+		t.Fatalf("response.completed should be after last output_item.done: last=%d completed=%d", posItemDone[2], posCompleted)
+	}
+}
+
+func TestConvertGeminiResponseToOpenAIResponses_ResponseOutputOrdering(t *testing.T) {
+	in := []string{
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"functionCall":{"name":"tool0","args":{"x":"y"}}}]}}],"modelVersion":"test-model","responseId":"req_vrtx_2"},"traceId":"t2"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"text":"hi"}]}}],"modelVersion":"test-model","responseId":"req_vrtx_2"},"traceId":"t2"}`,
+		`data: {"response":{"candidates":[{"content":{"role":"model","parts":[{"text":""}]},"finishReason":"STOP"}],"usageMetadata":{"promptTokenCount":1,"candidatesTokenCount":1,"totalTokenCount":2,"cachedContentTokenCount":0},"modelVersion":"test-model","responseId":"req_vrtx_2"},"traceId":"t2"}`,
+	}
+
+	var param any
+	var out []string
+	for _, line := range in {
+		out = append(out, ConvertGeminiResponseToOpenAIResponses(context.Background(), "test-model", nil, nil, []byte(line), &param)...)
+	}
+
+	posFuncDone := -1
+	posMsgAdded := -1
+	posCompleted := -1
+
+	for i, chunk := range out {
+		ev, data := parseSSEEvent(t, chunk)
+		switch ev {
+		case "response.output_item.done":
+			if data.Get("item.type").String() == "function_call" && data.Get("output_index").Int() == 0 {
+				posFuncDone = i
+			}
+		case "response.output_item.added":
+			if data.Get("item.type").String() == "message" && data.Get("output_index").Int() == 1 {
+				posMsgAdded = i
+			}
+		case "response.completed":
+			posCompleted = i
+			if data.Get("response.output.0.type").String() != "function_call" {
+				t.Fatalf("expected response.output[0] to be function_call: %s", data.Get("response.output.0").Raw)
+			}
+			if data.Get("response.output.1.type").String() != "message" {
+				t.Fatalf("expected response.output[1] to be message: %s", data.Get("response.output.1").Raw)
+			}
+			if data.Get("response.output.1.content.0.text").String() != "hi" {
+				t.Fatalf("unexpected message text in response.output[1]: %s", data.Get("response.output.1").Raw)
+			}
+		}
+	}
+
+	if posFuncDone == -1 || posMsgAdded == -1 || posCompleted == -1 {
+		t.Fatalf("missing required events: funcDone=%d msgAdded=%d completed=%d", posFuncDone, posMsgAdded, posCompleted)
+	}
+	if !(posFuncDone < posMsgAdded) {
+		t.Fatalf("expected function_call to complete before message is added: funcDone=%d msgAdded=%d", posFuncDone, posMsgAdded)
+	}
+	if !(posMsgAdded < posCompleted) {
+		t.Fatalf("expected response.completed after message added: msgAdded=%d completed=%d", posMsgAdded, posCompleted)
+	}
+}
--- a/internal/watcher/events.go
+++ b/internal/watcher/events.go
@@ -170,7 +170,9 @@ func (w *Watcher) handleKiroIDETokenChange(event fsnotify.Event) {
 		}
 	}

-	tokenData, err := kiroauth.LoadKiroIDEToken()
+	// Use retry logic to handle file lock contention (e.g., Kiro IDE writing the file)
+	// This prevents "being used by another process" errors on Windows
+	tokenData, err := kiroauth.LoadKiroIDETokenWithRetry(10, 50*time.Millisecond)
 	if err != nil {
 		log.Debugf("failed to load Kiro IDE token after change: %v", err)
 		return
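The hunk above swaps a single read for `LoadKiroIDETokenWithRetry(10, 50*time.Millisecond)`, whose body is not part of this diff. As a rough sketch of the general pattern only, a bounded retry with a fixed delay could look like the following. The names `retry` and `errBusy` are hypothetical and do not come from the `kiroauth` package.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBusy stands in for a transient "file is being used by another process" failure.
var errBusy = errors.New("file is busy")

// retry calls fn up to attempts times (at least once), sleeping delay
// between failed tries. It returns nil on the first success, otherwise
// the last error wrapped with the number of attempts made.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = fn(); lastErr == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay) // no sleep after the final failed attempt
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	err := retry(10, 50*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errBusy // simulate the writer still holding the file
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

In this sketch, 10 attempts with a 50 ms delay sleep for at most 9 × 50 ms = 450 ms before giving up, which keeps the file-watcher callback from blocking for long.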
--- a/internal/watcher/watcher.go
+++ b/internal/watcher/watcher.go
@@ -145,3 +145,111 @@ func (w *Watcher) SnapshotCoreAuths() []*coreauth.Auth {
 	w.clientsMutex.RUnlock()
 	return snapshotCoreAuths(cfg, w.authDir)
 }
+
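+// NotifyTokenRefreshed handles token-update notifications from the background refresher.
+// It is called after the background refresher successfully refreshes a token, and updates the in-memory Auth objects.
+// tokenID: token file name (e.g. kiro-xxx.json)
+// accessToken: the new access token
+// refreshToken: the new refresh token
+// expiresAt: the new expiry time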
+func (w *Watcher) NotifyTokenRefreshed(tokenID, accessToken, refreshToken, expiresAt string) {
+	if w == nil {
+		return
+	}
+
+	w.clientsMutex.Lock()
+	defer w.clientsMutex.Unlock()
+
+	// Walk currentAuths and update every Auth that matches
+	updated := false
+	for id, auth := range w.currentAuths {
+		if auth == nil || auth.Metadata == nil {
+			continue
+		}
+
+		// Only consider auths of type kiro
+		authType, _ := auth.Metadata["type"].(string)
+		if authType != "kiro" {
+			continue
+		}
+
+		// Try several matching strategies, because auth objects from different sources populate different fields
+		matched := false
+
+		// 1. Match on auth.ID (the ID may contain the file name)
+		if !matched && auth.ID != "" {
+			if auth.ID == tokenID || strings.HasSuffix(auth.ID, "/"+tokenID) || strings.HasSuffix(auth.ID, "\\"+tokenID) {
+				matched = true
+			}
+			// The ID may be in "kiro-xxx" form (no extension) while tokenID is "kiro-xxx.json"
+			if !matched && strings.TrimSuffix(tokenID, ".json") == auth.ID {
+				matched = true
+			}
+		}
+
+		// 2. Match on auth.Attributes["path"]
+		if !matched && auth.Attributes != nil {
+			if authPath := auth.Attributes["path"]; authPath != "" {
+				// Compare on the file-name component only
+				pathBase := authPath
+				if idx := strings.LastIndexAny(authPath, "/\\"); idx >= 0 {
+					pathBase = authPath[idx+1:]
+				}
+				if pathBase == tokenID || strings.TrimSuffix(pathBase, ".json") == strings.TrimSuffix(tokenID, ".json") {
+					matched = true
+				}
+			}
+		}
+
+		// 3. Match on auth.FileName (the original logic)
+		if !matched && auth.FileName != "" {
+			if auth.FileName == tokenID || strings.HasSuffix(auth.FileName, "/"+tokenID) || strings.HasSuffix(auth.FileName, "\\"+tokenID) {
+				matched = true
+			}
+		}
+
+		if matched {
+			// Update the in-memory token
+			auth.Metadata["access_token"] = accessToken
+			auth.Metadata["refresh_token"] = refreshToken
+			auth.Metadata["expires_at"] = expiresAt
+			auth.Metadata["last_refresh"] = time.Now().Format(time.RFC3339)
+			auth.UpdatedAt = time.Now()
+			auth.LastRefreshedAt = time.Now()
+
+			log.Infof("watcher: updated in-memory auth for token %s (auth ID: %s)", tokenID, id)
+			updated = true
+
+			// Also update the copy in runtimeAuths, if one exists
+			if w.runtimeAuths != nil {
+				if runtimeAuth, ok := w.runtimeAuths[id]; ok && runtimeAuth != nil {
+					if runtimeAuth.Metadata == nil {
+						runtimeAuth.Metadata = make(map[string]any)
+					}
+					runtimeAuth.Metadata["access_token"] = accessToken
+					runtimeAuth.Metadata["refresh_token"] = refreshToken
+					runtimeAuth.Metadata["expires_at"] = expiresAt
+					runtimeAuth.Metadata["last_refresh"] = time.Now().Format(time.RFC3339)
+					runtimeAuth.UpdatedAt = time.Now()
+					runtimeAuth.LastRefreshedAt = time.Now()
+				}
+			}
+
+			// Send an update notification to authQueue
+			if w.authQueue != nil {
+				go func(authClone *coreauth.Auth) {
+					update := AuthUpdate{
+						Action: AuthUpdateActionModify,
+						ID:     authClone.ID,
+						Auth:   authClone,
+					}
+					w.dispatchAuthUpdates([]AuthUpdate{update})
+				}(auth.Clone())
+			}
+		}
+	}
+
+	if !updated {
+		log.Debugf("watcher: no matching auth found for token %s, will be picked up on next file scan", tokenID)
+	}
+}
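`NotifyTokenRefreshed` above matches a token file name against `auth.ID`, `auth.Attributes["path"]`, and `auth.FileName`, tolerating both `/` and `\` separators and a missing `.json` extension. The sketch below condenses that idea into one predicate. It is a simplification, not the code in the diff: `baseName` and `matchesToken` are hypothetical helpers, and unlike the original `auth.FileName` branch this version always ignores the `.json` suffix.

```go
package main

import (
	"fmt"
	"strings"
)

// baseName returns the final path component, accepting both "/" and "\" separators.
func baseName(p string) string {
	if idx := strings.LastIndexAny(p, "/\\"); idx >= 0 {
		return p[idx+1:]
	}
	return p
}

// matchesToken reports whether candidate names the same token file as tokenID,
// ignoring any directory part and an optional ".json" extension on either side.
func matchesToken(candidate, tokenID string) bool {
	if candidate == "" || tokenID == "" {
		return false
	}
	a := strings.TrimSuffix(baseName(candidate), ".json")
	b := strings.TrimSuffix(baseName(tokenID), ".json")
	return a == b
}

func main() {
	candidates := []string{
		"auths/kiro-abc.json",
		`C:\auths\kiro-abc.json`,
		"kiro-abc",
		"kiro-abcd.json",
	}
	for _, c := range candidates {
		fmt.Printf("%q -> %v\n", c, matchesToken(c, "kiro-abc.json"))
	}
}
```

Comparing whole base names rather than using `strings.HasSuffix` on the raw path avoids accidental matches such as `other-kiro-abc.json`.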
--- a/sdk/auth/filestore.go
+++ b/sdk/auth/filestore.go
@@ -216,6 +216,15 @@ func (s *FileTokenStore) readAuthFile(path, baseDir string) (*cliproxyauth.Auth,
 		return nil, fmt.Errorf("stat file: %w", err)
 	}
 	id := s.idFor(path, baseDir)
+
+	// Calculate NextRefreshAfter from expires_at (20 minutes before expiry)
+	var nextRefreshAfter time.Time
+	if expiresAtStr, ok := metadata["expires_at"].(string); ok && expiresAtStr != "" {
+		if expiresAt, err := time.Parse(time.RFC3339, expiresAtStr); err == nil {
+			nextRefreshAfter = expiresAt.Add(-20 * time.Minute)
+		}
+	}
+
 	auth := &cliproxyauth.Auth{
 		ID:               id,
 		Provider:         provider,
@@ -227,7 +236,7 @@ func (s *FileTokenStore) readAuthFile(path, baseDir string) (*cliproxyauth.Auth,
 		CreatedAt:        info.ModTime(),
 		UpdatedAt:        info.ModTime(),
 		LastRefreshedAt:  time.Time{},
-		NextRefreshAfter: time.Time{},
+		NextRefreshAfter: nextRefreshAfter,
 	}
 	if email, ok := metadata["email"].(string); ok && email != "" {
 		auth.Attributes["email"] = email
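The `readAuthFile` hunk derives `NextRefreshAfter` by parsing `expires_at` as RFC3339 and subtracting 20 minutes, leaving the zero time when the field is missing or malformed. A minimal standalone sketch of that arithmetic follows; `nextRefreshAfter` and `refreshLead` are illustrative names, not identifiers from the repository.

```go
package main

import (
	"fmt"
	"time"
)

// refreshLead mirrors the 20-minute lead used in the hunk above.
const refreshLead = 20 * time.Minute

// nextRefreshAfter parses an RFC3339 expiry and subtracts lead.
// ok is false, and the zero time is returned, when expiresAt is empty or malformed.
func nextRefreshAfter(expiresAt string, lead time.Duration) (next time.Time, ok bool) {
	if expiresAt == "" {
		return time.Time{}, false
	}
	exp, err := time.Parse(time.RFC3339, expiresAt)
	if err != nil {
		return time.Time{}, false
	}
	return exp.Add(-lead), true
}

func main() {
	next, ok := nextRefreshAfter("2026-01-21T12:00:00Z", refreshLead)
	fmt.Println(next.Format(time.RFC3339), ok)

	_, ok = nextRefreshAfter("not-a-timestamp", refreshLead)
	fmt.Println(ok)
}
```

A token expiring at 12:00:00Z therefore becomes eligible for refresh at 11:40:00Z, which matches the `RefreshLead` value set in `sdk/auth/kiro.go` in this same change.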
--- a/sdk/auth/kiro.go
+++ b/sdk/auth/kiro.go
@@ -12,9 +12,9 @@ import (
 )

 // extractKiroIdentifier extracts a meaningful identifier for file naming.
-// Returns account name if provided, otherwise profile ARN ID.
+// Returns account name if provided, otherwise profile ARN ID, then client ID.
 // All extracted values are sanitized to prevent path injection attacks.
-func extractKiroIdentifier(accountName, profileArn string) string {
+func extractKiroIdentifier(accountName, profileArn, clientID string) string {
 	// Priority 1: Use account name if provided
 	if accountName != "" {
 		return kiroauth.SanitizeEmailForFilename(accountName)
@@ -29,6 +29,11 @@ func extractKiroIdentifier(accountName, profileArn string) string {
 		}
 	}

+	// Priority 3: Use client ID (for IDC auth without email/profileArn)
+	if clientID != "" {
+		return kiroauth.SanitizeEmailForFilename(clientID)
+	}
+
 	// Fallback: timestamp
 	return fmt.Sprintf("%d", time.Now().UnixNano()%100000)
 }
@@ -47,9 +52,9 @@ func (a *KiroAuthenticator) Provider() string {
 }

 // RefreshLead indicates how soon before expiry a refresh should be attempted.
-// Set to 5 minutes to match Antigravity and avoid frequent refresh checks while still ensuring timely token refresh.
+// Set to 20 minutes for proactive refresh before token expiry.
 func (a *KiroAuthenticator) RefreshLead() *time.Duration {
-	d := 5 * time.Minute
+	d := 20 * time.Minute
 	return &d
 }

@@ -61,13 +66,19 @@ func (a *KiroAuthenticator) createAuthRecord(tokenData *kiroauth.KiroTokenData,
 		expiresAt = time.Now().Add(1 * time.Hour)
 	}

-	// Extract identifier for file naming
-	idPart := extractKiroIdentifier(tokenData.Email, tokenData.ProfileArn)
-
-	// Determine label based on auth method
-	label := fmt.Sprintf("kiro-%s", source)
+	// Determine label and identifier based on auth method
+	var label, idPart string
 	if tokenData.AuthMethod == "idc" {
 		label = "kiro-idc"
+		// For IDC auth, always use clientID as identifier
+		if tokenData.ClientID != "" {
+			idPart = kiroauth.SanitizeEmailForFilename(tokenData.ClientID)
+		} else {
+			idPart = fmt.Sprintf("%d", time.Now().UnixNano()%100000)
+		}
+	} else {
+		label = fmt.Sprintf("kiro-%s", source)
+		idPart = extractKiroIdentifier(tokenData.Email, tokenData.ProfileArn, tokenData.ClientID)
 	}

 	now := time.Now()
@@ -121,8 +132,8 @@ func (a *KiroAuthenticator) createAuthRecord(tokenData *kiroauth.KiroTokenData,
 		UpdatedAt: now,
 		Metadata:  metadata,
 		Attributes: attributes,
-		// NextRefreshAfter is aligned with RefreshLead (5min)
-		NextRefreshAfter: expiresAt.Add(-5 * time.Minute),
+		// NextRefreshAfter: 20 minutes before expiry
+		NextRefreshAfter: expiresAt.Add(-20 * time.Minute),
 	}

 	if tokenData.Email != "" {
@@ -173,7 +184,7 @@ func (a *KiroAuthenticator) LoginWithAuthCode(ctx context.Context, cfg *config.C
 	}

 	// Extract identifier for file naming
-	idPart := extractKiroIdentifier(tokenData.Email, tokenData.ProfileArn)
+	idPart := extractKiroIdentifier(tokenData.Email, tokenData.ProfileArn, tokenData.ClientID)

 	now := time.Now()
 	fileName := fmt.Sprintf("kiro-aws-%s.json", idPart)
@@ -203,8 +214,8 @@ func (a *KiroAuthenticator) LoginWithAuthCode(ctx context.Context, cfg *config.C
 			"source":      "aws-builder-id-authcode",
 			"email":       tokenData.Email,
 		},
-		// NextRefreshAfter is aligned with RefreshLead (5min)
-		NextRefreshAfter: expiresAt.Add(-5 * time.Minute),
+		// NextRefreshAfter: 20 minutes before expiry
+		NextRefreshAfter: expiresAt.Add(-20 * time.Minute),
 	}

 	if tokenData.Email != "" {
@@ -217,129 +228,17 @@ func (a *KiroAuthenticator) LoginWithAuthCode(ctx context.Context, cfg *config.C
 }

 // LoginWithGoogle performs OAuth login for Kiro with Google.
-// This uses a custom protocol handler (kiro://) to receive the callback.
+// NOTE: Google login is not available for third-party applications due to AWS Cognito restrictions.
+// Please use AWS Builder ID or import your token from Kiro IDE.
 func (a *KiroAuthenticator) LoginWithGoogle(ctx context.Context, cfg *config.Config, opts *LoginOptions) (*coreauth.Auth, error) {
-	if cfg == nil {
-		return nil, fmt.Errorf("kiro auth: configuration is required")
-	}
-
-	oauth := kiroauth.NewKiroOAuth(cfg)
-
-	// Use Google OAuth flow with protocol handler
-	tokenData, err := oauth.LoginWithGoogle(ctx)
-	if err != nil {
-		return nil, fmt.Errorf("google login failed: %w", err)
-	}
-
-	// Parse expires_at
-	expiresAt, err := time.Parse(time.RFC3339, tokenData.ExpiresAt)
-	if err != nil {
-		expiresAt = time.Now().Add(1 * time.Hour)
-	}
-
-	// Extract identifier for file naming
-	idPart := extractKiroIdentifier(tokenData.Email, tokenData.ProfileArn)
-
-	now := time.Now()
-	fileName := fmt.Sprintf("kiro-google-%s.json", idPart)
-
-	record := &coreauth.Auth{
-		ID:        fileName,
-		Provider:  "kiro",
-		FileName:  fileName,
-		Label:     "kiro-google",
-		Status:    coreauth.StatusActive,
-		CreatedAt: now,
-		UpdatedAt: now,
-		Metadata: map[string]any{
-			"type":          "kiro",
-			"access_token":  tokenData.AccessToken,
-			"refresh_token": tokenData.RefreshToken,
-			"profile_arn":   tokenData.ProfileArn,
-			"expires_at":    tokenData.ExpiresAt,
-			"auth_method":   tokenData.AuthMethod,
-			"provider":      tokenData.Provider,
-			"email":         tokenData.Email,
-		},
-		Attributes: map[string]string{
-			"profile_arn": tokenData.ProfileArn,
-			"source":      "google-oauth",
-			"email":       tokenData.Email,
-		},
-		// NextRefreshAfter is aligned with RefreshLead (5min)
-		NextRefreshAfter: expiresAt.Add(-5 * time.Minute),
-	}
-
-	if tokenData.Email != "" {
-		fmt.Printf("\n✓ Kiro Google authentication completed successfully! (Account: %s)\n", tokenData.Email)
-	} else {
-		fmt.Println("\n✓ Kiro Google authentication completed successfully!")
-	}
-
-	return record, nil
+	return nil, fmt.Errorf("Google login is not available for third-party applications due to AWS Cognito restrictions.\n\nAlternatives:\n  1. Use AWS Builder ID: cliproxy kiro --builder-id\n  2. Import token from Kiro IDE: cliproxy kiro --import\n\nTo get a token from Kiro IDE:\n  1. Open Kiro IDE and login with Google\n  2. Find: ~/.kiro/kiro-auth-token.json\n  3. Run: cliproxy kiro --import")
 }

 // LoginWithGitHub performs OAuth login for Kiro with GitHub.
-// This uses a custom protocol handler (kiro://) to receive the callback.
+// NOTE: GitHub login is not available for third-party applications due to AWS Cognito restrictions.
+// Please use AWS Builder ID or import your token from Kiro IDE.
 func (a *KiroAuthenticator) LoginWithGitHub(ctx context.Context, cfg *config.Config, opts *LoginOptions) (*coreauth.Auth, error) {
-	if cfg == nil {
-		return nil, fmt.Errorf("kiro auth: configuration is required")
-	}
-
-	oauth := kiroauth.NewKiroOAuth(cfg)
-
-	// Use GitHub OAuth flow with protocol handler
-	tokenData, err := oauth.LoginWithGitHub(ctx)
-	if err != nil {
-		return nil, fmt.Errorf("github login failed: %w", err)
-	}
-
-	// Parse expires_at
-	expiresAt, err := time.Parse(time.RFC3339, tokenData.ExpiresAt)
-	if err != nil {
-		expiresAt = time.Now().Add(1 * time.Hour)
-	}
-
-	// Extract identifier for file naming
-	idPart := extractKiroIdentifier(tokenData.Email, tokenData.ProfileArn)
-
-	now := time.Now()
-	fileName := fmt.Sprintf("kiro-github-%s.json", idPart)
-
-	record := &coreauth.Auth{
-		ID:        fileName,
-		Provider:  "kiro",
-		FileName:  fileName,
-		Label:     "kiro-github",
-		Status:    coreauth.StatusActive,
-		CreatedAt: now,
-		UpdatedAt: now,
-		Metadata: map[string]any{
-			"type":          "kiro",
-			"access_token":  tokenData.AccessToken,
-			"refresh_token": tokenData.RefreshToken,
-			"profile_arn":   tokenData.ProfileArn,
-			"expires_at":    tokenData.ExpiresAt,
-			"auth_method":   tokenData.AuthMethod,
-			"provider":      tokenData.Provider,
-			"email":         tokenData.Email,
-		},
-		Attributes: map[string]string{
-			"profile_arn": tokenData.ProfileArn,
-			"source":      "github-oauth",
-			"email":       tokenData.Email,
-		},
-		// NextRefreshAfter is aligned with RefreshLead (5min)
-		NextRefreshAfter: expiresAt.Add(-5 * time.Minute),
-	}
-
-	if tokenData.Email != "" {
-		fmt.Printf("\n✓ Kiro GitHub authentication completed successfully! (Account: %s)\n", tokenData.Email)
-	} else {
-		fmt.Println("\n✓ Kiro GitHub authentication completed successfully!")
-	}
-
-	return record, nil
+	return nil, fmt.Errorf("GitHub login is not available for third-party applications due to AWS Cognito restrictions.\n\nAlternatives:\n  1. Use AWS Builder ID: cliproxy kiro --builder-id\n  2. Import token from Kiro IDE: cliproxy kiro --import\n\nTo get a token from Kiro IDE:\n  1. Open Kiro IDE and login with GitHub\n  2. Find: ~/.kiro/kiro-auth-token.json\n  3. Run: cliproxy kiro --import")
 }

 // ImportFromKiroIDE imports token from Kiro IDE's token file.
@@ -361,7 +260,7 @@ func (a *KiroAuthenticator) ImportFromKiroIDE(ctx context.Context, cfg *config.C
 	}

 	// Extract identifier for file naming
-	idPart := extractKiroIdentifier(tokenData.Email, tokenData.ProfileArn)
+	idPart := extractKiroIdentifier(tokenData.Email, tokenData.ProfileArn, tokenData.ClientID)
 	// Sanitize provider to prevent path traversal (defense-in-depth)
 	provider := kiroauth.SanitizeEmailForFilename(strings.ToLower(strings.TrimSpace(tokenData.Provider)))
 	if provider == "" {
@@ -387,15 +286,20 @@ func (a *KiroAuthenticator) ImportFromKiroIDE(ctx context.Context, cfg *config.C
 			"expires_at":    tokenData.ExpiresAt,
 			"auth_method":   tokenData.AuthMethod,
 			"provider":      tokenData.Provider,
+			"client_id":     tokenData.ClientID,
+			"client_secret": tokenData.ClientSecret,
 			"email":         tokenData.Email,
+			"region":        tokenData.Region,
+			"start_url":     tokenData.StartURL,
 		},
 		Attributes: map[string]string{
 			"profile_arn": tokenData.ProfileArn,
 			"source":      "kiro-ide-import",
 			"email":       tokenData.Email,
+			"region":      tokenData.Region,
 		},
-		// NextRefreshAfter is aligned with RefreshLead (5min)
-		NextRefreshAfter: expiresAt.Add(-5 * time.Minute),
+		// NextRefreshAfter: 20 minutes before expiry
+		NextRefreshAfter: expiresAt.Add(-20 * time.Minute),
 	}

 	// Display the email if extracted
@@ -463,8 +367,8 @@ func (a *KiroAuthenticator) Refresh(ctx context.Context, cfg *config.Config, aut
 	updated.Metadata["refresh_token"] = tokenData.RefreshToken
 	updated.Metadata["expires_at"] = tokenData.ExpiresAt
 	updated.Metadata["last_refresh"] = now.Format(time.RFC3339) // For double-check optimization
-	// NextRefreshAfter is aligned with RefreshLead (5min)
-	updated.NextRefreshAfter = expiresAt.Add(-5 * time.Minute)
+	// NextRefreshAfter: 20 minutes before expiry
+	updated.NextRefreshAfter = expiresAt.Add(-20 * time.Minute)

 	return updated, nil
 }
--- a/sdk/cliproxy/auth/conductor.go
+++ b/sdk/cliproxy/auth/conductor.go
@@ -47,7 +47,7 @@ type RefreshEvaluator interface {
 }

 const (
-	refreshCheckInterval  = 5 * time.Second
+	refreshCheckInterval  = 30 * time.Second
 	refreshPendingBackoff = time.Minute
 	refreshFailureBackoff = 1 * time.Minute
 	quotaBackoffBase      = time.Second
--- a/sdk/cliproxy/service.go
+++ b/sdk/cliproxy/service.go
@@ -13,6 +13,7 @@ import (
 	"time"

 	"github.com/router-for-me/CLIProxyAPI/v6/internal/api"
+	kiroauth "github.com/router-for-me/CLIProxyAPI/v6/internal/auth/kiro"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/registry"
 	"github.com/router-for-me/CLIProxyAPI/v6/internal/runtime/executor"
 	_ "github.com/router-for-me/CLIProxyAPI/v6/internal/usage"
@@ -97,6 +98,16 @@ func (s *Service) RegisterUsagePlugin(plugin usage.Plugin) {
 	usage.RegisterPlugin(plugin)
 }

+// GetWatcher returns the underlying WatcherWrapper instance.
+// This allows external components (e.g., RefreshManager) to interact with the watcher.
+// Returns nil if the service or watcher is not initialized.
+func (s *Service) GetWatcher() *WatcherWrapper {
+	if s == nil {
+		return nil
+	}
+	return s.watcher
+}
+
 // newDefaultAuthManager creates a default authentication manager with all supported providers.
 func newDefaultAuthManager() *sdkAuth.Manager {
 	return sdkAuth.NewManager(
@@ -574,6 +585,18 @@ func (s *Service) Run(ctx context.Context) error {
 	}
 	watcherWrapper.SetConfig(s.cfg)

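+	// Option A: wire the Kiro background refresher callback to the Watcher.
+	// Once the background refresher successfully refreshes a token, it immediately notifies the Watcher to update the in-memory Auth objects.
+	// This closes the time gap between a background refresh and the in-memory Auth objects.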
+	kiroauth.GetRefreshManager().SetOnTokenRefreshed(func(tokenID string, tokenData *kiroauth.KiroTokenData) {
+		if tokenData == nil || watcherWrapper == nil {
+			return
+		}
+		log.Debugf("kiro refresh callback: notifying watcher for token %s", tokenID)
+		watcherWrapper.NotifyTokenRefreshed(tokenID, tokenData.AccessToken, tokenData.RefreshToken, tokenData.ExpiresAt)
+	})
+	log.Debug("kiro: connected background refresh callback to watcher")
+
 	watcherCtx, watcherCancel := context.WithCancel(context.Background())
 	s.watcherCancel = watcherCancel
 	if err = watcherWrapper.Start(watcherCtx); err != nil {
@@ -775,7 +798,7 @@ func (s *Service) registerModelsForAuth(a *coreauth.Auth) {
 		models = registry.GetGitHubCopilotModels()
 		models = applyExcludedModels(models, excluded)
 	case "kiro":
-		models = registry.GetKiroModels()
+		models = s.fetchKiroModels(a)
 		models = applyExcludedModels(models, excluded)
 	default:
 		// Handle OpenAI-compatibility providers by name using config
@@ -1338,3 +1361,201 @@ func applyOAuthModelAlias(cfg *config.Config, provider, authKind string, models
 	}
 	return out
 }
+
+// fetchKiroModels attempts to dynamically fetch Kiro models from the API.
+// If dynamic fetch fails, it falls back to static registry.GetKiroModels().
+func (s *Service) fetchKiroModels(a *coreauth.Auth) []*ModelInfo {
+	if a == nil {
+		log.Debug("kiro: auth is nil, using static models")
+		return registry.GetKiroModels()
+	}
+
+	// Extract token data from auth attributes
+	tokenData := s.extractKiroTokenData(a)
+	if tokenData == nil || tokenData.AccessToken == "" {
+		log.Debug("kiro: no valid token data in auth, using static models")
+		return registry.GetKiroModels()
+	}
+
+	// Create KiroAuth instance
+	kAuth := kiroauth.NewKiroAuth(s.cfg)
+	if kAuth == nil {
+		log.Warn("kiro: failed to create KiroAuth instance, using static models")
+		return registry.GetKiroModels()
+	}
+
+	// Use timeout context for API call
+	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
+	defer cancel()
+
+	// Attempt to fetch dynamic models
+	apiModels, err := kAuth.ListAvailableModels(ctx, tokenData)
+	if err != nil {
+		log.Warnf("kiro: failed to fetch dynamic models: %v, using static models", err)
+		return registry.GetKiroModels()
+	}
+
+	if len(apiModels) == 0 {
+		log.Debug("kiro: API returned no models, using static models")
+		return registry.GetKiroModels()
+	}
+
+	// Convert API models to ModelInfo
+	models := convertKiroAPIModels(apiModels)
+
+	// Generate agentic variants
+	models = generateKiroAgenticVariants(models)
+
+	log.Infof("kiro: successfully fetched %d models from API (including agentic variants)", len(models))
+	return models
+}
+
+// extractKiroTokenData extracts KiroTokenData from auth attributes and metadata.
+func (s *Service) extractKiroTokenData(a *coreauth.Auth) *kiroauth.KiroTokenData {
+	if a == nil || a.Attributes == nil {
+		return nil
+	}
+
+	accessToken := strings.TrimSpace(a.Attributes["access_token"])
+	if accessToken == "" {
+		return nil
+	}
+
+	tokenData := &kiroauth.KiroTokenData{
+		AccessToken: accessToken,
+		ProfileArn:  strings.TrimSpace(a.Attributes["profile_arn"]),
+	}
+
+	// Also try to get refresh token from metadata
+	if a.Metadata != nil {
+		if rt, ok := a.Metadata["refresh_token"].(string); ok {
+			tokenData.RefreshToken = rt
+		}
+	}
+
+	return tokenData
+}
+
+// convertKiroAPIModels converts Kiro API models to ModelInfo slice.
+func convertKiroAPIModels(apiModels []*kiroauth.KiroModel) []*ModelInfo {
+	if len(apiModels) == 0 {
+		return nil
+	}
+
+	now := time.Now().Unix()
+	models := make([]*ModelInfo, 0, len(apiModels))
+
+	for _, m := range apiModels {
+		if m == nil || m.ModelID == "" {
+			continue
+		}
+
+		// Create model ID with kiro- prefix
+		modelID := "kiro-" + normalizeKiroModelID(m.ModelID)
+
+		info := &ModelInfo{
+			ID:                  modelID,
+			Object:              "model",
+			Created:             now,
+			OwnedBy:             "aws",
+			Type:                "kiro",
+			DisplayName:         formatKiroDisplayName(m.ModelName, m.RateMultiplier),
+			Description:         m.Description,
+			ContextLength:       200000,
+			MaxCompletionTokens: 64000,
+			Thinking:            &registry.ThinkingSupport{Min: 1024, Max: 32000, ZeroAllowed: true, DynamicAllowed: true},
+		}
+
+		if m.MaxInputTokens > 0 {
+			info.ContextLength = m.MaxInputTokens
+		}
+
+		models = append(models, info)
+	}
+
+	return models
+}
+
+// normalizeKiroModelID normalizes a Kiro model ID by converting dots to dashes
+// and removing common prefixes.
+func normalizeKiroModelID(modelID string) string {
+	// Remove common prefixes
+	modelID = strings.TrimPrefix(modelID, "anthropic.")
+	modelID = strings.TrimPrefix(modelID, "amazon.")
+
+	// Replace dots with dashes for consistency
+	modelID = strings.ReplaceAll(modelID, ".", "-")
+
+	// Replace underscores with dashes
+	modelID = strings.ReplaceAll(modelID, "_", "-")
+
+	return strings.ToLower(modelID)
+}
+
+// formatKiroDisplayName formats the display name with rate multiplier info.
+func formatKiroDisplayName(modelName string, rateMultiplier float64) string {
+	if modelName == "" {
+		return ""
+	}
+
+	displayName := "Kiro " + modelName
+	if rateMultiplier > 0 && rateMultiplier != 1.0 {
+		displayName += fmt.Sprintf(" (%.1fx credit)", rateMultiplier)
+	}
+
+	return displayName
+}
+
+// generateKiroAgenticVariants generates agentic variants for Kiro models.
+// Agentic variants have optimized system prompts for coding agents.
+func generateKiroAgenticVariants(models []*ModelInfo) []*ModelInfo {
+	if len(models) == 0 {
+		return models
+	}
+
+	result := make([]*ModelInfo, 0, len(models)*2)
+	result = append(result, models...)
+
+	for _, m := range models {
+		if m == nil {
+			continue
+		}
+
+		// Skip if already an agentic variant
+		if strings.HasSuffix(m.ID, "-agentic") {
+			continue
+		}
+
+		// Skip auto models from agentic variant generation
+		if strings.Contains(m.ID, "-auto") {
+			continue
+		}
+
+		// Create agentic variant
+		agentic := &ModelInfo{
+			ID:                  m.ID + "-agentic",
+			Object:              m.Object,
+			Created:             m.Created,
+			OwnedBy:             m.OwnedBy,
+			Type:                m.Type,
+			DisplayName:         m.DisplayName + " (Agentic)",
+			Description:         m.Description + " - Optimized for coding agents (chunked writes)",
+			ContextLength:       m.ContextLength,
+			MaxCompletionTokens: m.MaxCompletionTokens,
+		}
+
+		// Copy thinking support if present
+		if m.Thinking != nil {
+			agentic.Thinking = &registry.ThinkingSupport{
+				Min:            m.Thinking.Min,
+				Max:            m.Thinking.Max,
+				ZeroAllowed:    m.Thinking.ZeroAllowed,
+				DynamicAllowed: m.Thinking.DynamicAllowed,
+			}
+		}
+
+		result = append(result, agentic)
+	}
+
+	return result
+}
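To make the naming rules in `normalizeKiroModelID` and `formatKiroDisplayName` concrete, the sketch below copies the two helpers from the hunk above into a standalone program and runs them on sample inputs. The model ID `anthropic.Claude_Sonnet.4.5`, the model name, and the multipliers are invented for illustration and are not taken from the Kiro API.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeKiroModelID strips known vendor prefixes, replaces "." and "_"
// with "-", and lowercases the result (same steps as in the diff).
func normalizeKiroModelID(modelID string) string {
	modelID = strings.TrimPrefix(modelID, "anthropic.")
	modelID = strings.TrimPrefix(modelID, "amazon.")
	modelID = strings.ReplaceAll(modelID, ".", "-")
	modelID = strings.ReplaceAll(modelID, "_", "-")
	return strings.ToLower(modelID)
}

// formatKiroDisplayName prefixes the name with "Kiro" and appends the
// credit multiplier when it is positive and different from 1.0.
func formatKiroDisplayName(modelName string, rateMultiplier float64) string {
	if modelName == "" {
		return ""
	}
	displayName := "Kiro " + modelName
	if rateMultiplier > 0 && rateMultiplier != 1.0 {
		displayName += fmt.Sprintf(" (%.1fx credit)", rateMultiplier)
	}
	return displayName
}

func main() {
	// Hypothetical inputs, chosen only to exercise every normalization step.
	fmt.Println("kiro-" + normalizeKiroModelID("anthropic.Claude_Sonnet.4.5"))
	fmt.Println(formatKiroDisplayName("Claude Sonnet 4.5", 1.3))
	fmt.Println(formatKiroDisplayName("Claude Sonnet 4.5", 1.0))
}
```

Note that the prefix stripping runs before lowercasing, so a prefix written as `Anthropic.` would not be removed; only its dot would be turned into a dash.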
--- a/sdk/cliproxy/types.go
+++ b/sdk/cliproxy/types.go
@@ -89,6 +89,7 @@ type WatcherWrapper struct {
 	snapshotAuths         func() []*coreauth.Auth
 	setUpdateQueue        func(queue chan<- watcher.AuthUpdate)
 	dispatchRuntimeUpdate func(update watcher.AuthUpdate) bool
+	notifyTokenRefreshed  func(tokenID, accessToken, refreshToken, expiresAt string) // Option A: background refresh notification
 }

 // Start proxies to the underlying watcher Start implementation.
@@ -146,3 +147,16 @@ func (w *WatcherWrapper) SetAuthUpdateQueue(queue chan<- watcher.AuthUpdate) {
 	}
 	w.setUpdateQueue(queue)
 }
+
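+// NotifyTokenRefreshed tells the Watcher that the background refresher has updated a token.
+// It is the core method of Option A, closing the time gap between a background refresh and the in-memory Auth objects.
+// tokenID: token file name (e.g. kiro-xxx.json)
+// accessToken: the new access token
+// refreshToken: the new refresh token
+// expiresAt: the new expiry time (RFC3339 format)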
+func (w *WatcherWrapper) NotifyTokenRefreshed(tokenID, accessToken, refreshToken, expiresAt string) {
+	if w == nil || w.notifyTokenRefreshed == nil {
+		return
+	}
+	w.notifyTokenRefreshed(tokenID, accessToken, refreshToken, expiresAt)
+}
--- a/sdk/cliproxy/watcher.go
+++ b/sdk/cliproxy/watcher.go
@@ -31,5 +31,8 @@ func defaultWatcherFactory(configPath, authDir string, reload func(*config.Confi
 		dispatchRuntimeUpdate: func(update watcher.AuthUpdate) bool {
 			return w.DispatchRuntimeAuthUpdate(update)
 		},
+		notifyTokenRefreshed: func(tokenID, accessToken, refreshToken, expiresAt string) {
+			w.NotifyTokenRefreshed(tokenID, accessToken, refreshToken, expiresAt)
+		},
 	}, nil
 }
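The last two files expose the watcher's `NotifyTokenRefreshed` through a function-valued field on `WatcherWrapper`, guarded so that a nil wrapper or an unset field is a no-op. The sketch below shows that pattern in isolation. `tokenNotifier` is a hypothetical, reduced stand-in with a two-argument callback and is not the real `WatcherWrapper` type.

```go
package main

import "fmt"

// tokenNotifier is a simplified stand-in for a wrapper that forwards
// notifications to an underlying implementation through a function field.
type tokenNotifier struct {
	notify func(tokenID, expiresAt string)
}

// Notify forwards to the wrapped function. It is safe to call on a nil
// receiver or when the function field has not been set.
func (n *tokenNotifier) Notify(tokenID, expiresAt string) {
	if n == nil || n.notify == nil {
		return
	}
	n.notify(tokenID, expiresAt)
}

func main() {
	var unset *tokenNotifier
	unset.Notify("kiro-xxx.json", "2026-01-21T12:00:00Z") // no-op, does not panic

	seen := []string{}
	wired := &tokenNotifier{notify: func(tokenID, expiresAt string) {
		seen = append(seen, tokenID)
	}}
	wired.Notify("kiro-xxx.json", "2026-01-21T12:00:00Z")
	fmt.Println(seen)
}
```

Keeping the dependency behind a function field lets the SDK package call into the watcher without exporting the watcher's concrete type, and the nil guards let callers such as the refresh callback in `Service.Run` skip their own checks.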
Author	SHA1	Message	Date
Luis Pater	51611c25d7	Merge branch 'router-for-me:main' into main	2026-01-21 22:12:28 +08:00
Luis Pater	eb1bbaa63b	Merge pull request #119 from linlang781/main Support Kiro SSO IDC	2026-01-21 22:11:58 +08:00
yuechenglong.5	4c8026ac3d	chore(build): update the .gitignore file - add the *.bak file extension to the ignore list	2026-01-21 21:38:47 +08:00
gogoing1024	8aeb4b7d54	Merge pull request #1 from gogoing1024/main Merge pull request #1 from linlang781/main	2026-01-21 21:09:34 +08:00
gogoing1024	b2172cb047	Merge pull request #1 from linlang781/main 1	2026-01-21 21:07:24 +08:00
Luis Pater	ef4508dbc8	refactor(cache, translator): remove session ID from signature caching and clean up logic	2026-01-21 13:37:10 +08:00
Luis Pater	f775e46fe2	refactor(translator): remove session ID logic from signature caching and associated tests	2026-01-21 12:45:07 +08:00
Luis Pater	65ad5c0c9d	refactor(cache): simplify signature caching by removing sessionID parameter	2026-01-21 12:38:05 +08:00
Luis Pater	88bf4e77ec	fix(translator): update `HasValidSignature` to require `modelName` parameter for improved validation	2026-01-21 11:31:37 +08:00
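yuechenglong.5	194f66ca9c	feat(kiro): add a background token refresh notification mechanism - add an onTokenRefreshed callback and a concurrency-safe lock to BackgroundRefresher - implement the WithOnTokenRefreshed option function for setting the refresh-success callback - add a SetOnTokenRefreshed method to RefreshManager to support updating the callback at runtime - add a reloadAuthFromFile method to KiroExecutor as a file-reload fallback - implement NotifyTokenRefreshed in Watcher to handle refresh notifications and update the in-memory Auth objects - connect the refresher callback to the Watcher notification chain via Service.GetWatcher - add Option A and Option B as dual safeguards against the time gap between background refresh and in-memory objects	2026-01-21 11:03:07 +08:00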
Luis Pater	a4f8015caa	test(logging): add unit tests for `GinLogrusRecovery` middleware panic handling	2026-01-21 10:57:27 +08:00
Luis Pater	ffd129909e	Merge pull request #1130 from router-for-me/agty fix(executor): only strip maxOutputTokens for non-claude models	2026-01-21 10:50:39 +08:00
hkfires	9332316383	fix(translator): preserve thinking blocks by skipping signature	2026-01-21 10:49:20 +08:00
hkfires	6dcbbf64c3	fix(executor): only strip maxOutputTokens for non-claude models	2026-01-21 10:49:20 +08:00
yuechenglong.5	c9aa1ff99d	Merge remote-tracking branch 'origin/main' # Conflicts: # internal/auth/kiro/oauth_web.go	2026-01-21 10:31:55 +08:00
Luis Pater	2ce3553612	feat(cache): handle gemini family in signature cache with fallback validator logic	2026-01-21 10:11:21 +08:00
Luis Pater	2e14f787d4	feat(translator): enhance `ConvertGeminiRequestToAntigravity` with model name and refine reasoning block handling	2026-01-21 08:31:23 +08:00
Luis Pater	523b41ccd2	test(responses): add comprehensive tests for SSE event ordering and response transformations	2026-01-21 07:08:59 +08:00
Luis Pater	c6fa1d0e67	Merge pull request #1117 from router-for-me/cache fix(translator): enhance signature cache clearing logic and update test cases with model name	2026-01-20 23:18:48 +08:00
Luis Pater	ac56e1e88b	Merge pull request #1116 from bexcodex/fix/antigravity Fix antigravity malformed_function_call	2026-01-20 22:40:00 +08:00
781456868@qq.com	a9ee971e1c	fix(kiro): improve auto-refresh and IDC auth file handling Amp-Thread-ID: https://ampcode.com/threads/T-019bdb94-80e3-7302-be0f-a69937826d13 Co-authored-by: Amp <amp@ampcode.com>	2026-01-20 21:57:45 +08:00
781456868@qq.com	73cef3a25a	Merge remote-tracking branch 'upstream/main'	2026-01-20 21:57:16 +08:00
hkfires	9b72ea9efa	fix(translator): enhance signature cache clearing logic and update test cases with model name	2026-01-20 20:02:29 +08:00
bexcodex	9f364441e8	Fix antigravity malformed_function_call	2026-01-20 19:54:54 +08:00
Luis Pater	e49a1c07bf	chore(translator): update cache functions to include model name parameter in tests	2026-01-20 18:36:51 +08:00
yuechenglong.5	0f63d973be	Merge remote-tracking branch 'origin/main'	2026-01-20 10:20:03 +08:00
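yuechenglong.5	fa2abd560a	chore: cherry-pick documentation updates and test-file removal - docs: add a description of the Kiro OAuth web authentication endpoint (`ace7c0c`) - chore: remove test files containing sensitive data (`8f06f6a`) - keep local changes: refresh_manager, token_repository, etc.	2026-01-20 10:17:39 +08:00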
yuechenglong.5	564c2d763e	Merge upstream/main (`08779cc`) - sync with original repo updates	2026-01-20 09:52:11 +08:00
781456868@qq.com	92fb6b012a	feat(kiro): add manual token refresh button to OAuth web UI Amp-Thread-ID: https://ampcode.com/threads/T-019bd642-9806-75d8-9101-27812e0eb6ab Co-authored-by: Amp <amp@ampcode.com>	2026-01-19 20:55:51 +08:00
781456868@qq.com	8f06f6a9ed	chore: remove test files containing sensitive data Amp-Thread-ID: https://ampcode.com/threads/T-019bd618-7e42-715a-960d-dd45425851e3 Co-authored-by: Amp <amp@ampcode.com>	2026-01-19 20:31:33 +08:00
781456868@qq.com	ace7c0ccb4	docs: add Kiro OAuth web authentication endpoint /v0/oauth/kiro	2026-01-19 20:28:40 +08:00
781456868@qq.com	f87fe0a0e8	feat: proactive token refresh 10 minutes before expiry Amp-Thread-ID: https://ampcode.com/threads/T-019bd618-7e42-715a-960d-dd45425851e3 Co-authored-by: Amp <amp@ampcode.com>	2026-01-19 20:09:38 +08:00
781456868@qq.com	87edc6f35e	Merge remote-tracking branch 'upstream/main'	2026-01-19 20:09:17 +08:00
781456868@qq.com	c9301a6d18	docs: update README with new features and Docker deployment guide	2026-01-18 15:07:29 +08:00
781456868@qq.com	0e77e93e5d	feat: add Kiro OAuth web, rate limiter, metrics, fingerprint, background refresh and model converter	2026-01-18 15:04:29 +08:00