feat: parallelize character eval runs

Peter Steinberger
2026-04-08 20:05:24 +01:00
parent f1e75d3259
commit 21ef1bf8de
8 changed files with 219 additions and 56 deletions

@@ -98,7 +98,9 @@ pnpm openclaw qa character-eval \
 --model xiaomi/mimo-v2-pro,thinking=high \
 --model google/gemini-3.1-pro-preview,thinking=high \
 --judge-model openai/gpt-5.4,thinking=xhigh,fast \
---judge-model anthropic/claude-opus-4-6,thinking=high
+--judge-model anthropic/claude-opus-4-6,thinking=high \
+--concurrency 8 \
+--judge-concurrency 8
 ```
 The command runs local QA gateway child processes, not Docker. Character eval
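The new `--concurrency` and `--judge-concurrency` flags imply a bounded worker pool over candidate and judge runs. A minimal sketch of such a pool in TypeScript follows; `mapWithConcurrency` is a hypothetical helper for illustration, not the actual openclaw implementation:

```typescript
// Run `fn` over `items`, keeping at most `limit` calls in flight at once.
// Results come back in input order regardless of completion order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unclaimed index. Claiming is
  // safe without locks: JS is single-threaded, and `next++` happens
  // synchronously between awaits.
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++;
        results[i] = await fn(items[i]);
      }
    },
  );
  await Promise.all(workers);
  return results;
}
```

With `limit` set from `--concurrency` (default 8 per the doc text below), lowering the value trades wall-clock time for less pressure on provider rate limits and the local gateway.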
@@ -118,6 +120,9 @@ single candidate or judge needs an override. Pass `--fast` only when you want to
 force fast mode on for every candidate model. Candidate and judge durations are
 recorded in the report for benchmark analysis, but judge prompts explicitly say
 not to rank by speed.
+Candidate and judge model runs both default to concurrency 8. Lower
+`--concurrency` or `--judge-concurrency` when provider limits or local gateway
+pressure make a run too noisy.
 When no candidate `--model` is passed, the character eval defaults to
 `openai/gpt-5.4`, `openai/gpt-5.2`, `anthropic/claude-opus-4-6`,
 `anthropic/claude-sonnet-4-6`, `minimax/MiniMax-M2.7`, `zai/glm-5.1`,