feat: add QA character eval reports

Peter Steinberger
2026-04-08 15:52:49 +01:00
parent aa3b1357cb
commit 3101d81053
7 changed files with 734 additions and 2 deletions


@@ -82,6 +82,23 @@ The report should answer:
- What stayed blocked
- What follow-up scenarios are worth adding
For character and style checks, run the same scenario across multiple live model
refs and write a judged Markdown report:
```bash
pnpm openclaw qa character-eval \
--model openai/gpt-5.4 \
--model anthropic/claude-opus-4-6 \
--model minimax/MiniMax-M2.7 \
--judge-model openai/gpt-5.4
```
The command runs local QA gateway child processes rather than Docker. It preserves each
full transcript, records basic run stats, and then asks the judge model (in fast mode
with `xhigh` reasoning) to rank the runs by naturalness, vibe, and humor.
When no candidate `--model` is passed, the character eval defaults to
`openai/gpt-5.4` and `anthropic/claude-opus-4-6`.
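If only the judge needs to be chosen, the invocation can lean on those defaults. A minimal
sketch, assuming `--judge-model` still has to be passed explicitly (no default judge is
documented here):
```bash
# Minimal sketch: candidate models fall back to the documented defaults
# (openai/gpt-5.4 and anthropic/claude-opus-4-6); the judge model is
# passed explicitly since no default judge is documented.
pnpm openclaw qa character-eval \
  --judge-model openai/gpt-5.4
```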
## Related docs
- [Testing](/help/testing)