mirror of
https://github.com/moltbot/moltbot.git
synced 2026-04-24 07:01:49 +00:00
feat(qa): recreate qa lab docker stack
@@ -1,865 +1,66 @@
---
title: "QA E2E Automation"
summary: "Design note for a full end-to-end QA system built on a synthetic message-channel plugin, Dockerized OpenClaw, and subagent-driven scenario execution"
summary: "Private QA automation shape for qa-lab, qa-channel, seeded scenarios, and protocol reports"
read_when:
  - You are designing a true end-to-end QA harness for OpenClaw
  - You want a synthetic message channel for automated feature verification
  - You want subagents to discover features, run scenarios, and propose fixes
  - Extending qa-lab or qa-channel
  - Adding repo-backed QA scenarios
  - Building higher-realism QA automation around the Gateway dashboard
title: "QA E2E Automation"
---

# QA E2E Automation

This note proposes a true end-to-end QA system for OpenClaw built around a
real channel plugin dedicated to testing.
The private QA stack is meant to exercise OpenClaw in a more realistic,
channel-shaped way than a single unit test can.

The core idea:
Current pieces:

- run OpenClaw inside Docker in a realistic gateway configuration
- expose a synthetic but full-featured message channel as a normal plugin
- let a QA harness inject inbound traffic and inspect outbound state
- let OpenClaw agents and subagents explore, verify, and report on behavior
- optionally escalate failing scenarios into host-side fix workflows that open PRs
- `extensions/qa-channel`: synthetic message channel with DM, channel, thread,
  reaction, edit, and delete surfaces.
- `extensions/qa-lab`: debugger UI and QA bus for observing the transcript,
  injecting inbound messages, and exporting a Markdown report.
- `qa/`: repo-backed seed assets for the kickoff task and baseline QA
  scenarios.

This is not a unit-test replacement. It is a product-level system test layer.
The long-term goal is a two-pane QA site:

## Chosen direction
- Left: Gateway dashboard (Control UI) with the agent.
- Right: QA Lab, showing the Slack-ish transcript and scenario plan.

The initial direction for this project is:
That lets an operator or automation loop give the agent a QA mission, observe
real channel behavior, and record what worked, failed, or stayed blocked.

- build the full system inside this repo
- test against a matrix, not a single model/provider pair
- use Markdown reports as the first output artifact
- defer auto-PR and auto-fix work until later
- treat Slack-class semantics as the MVP transport target
- keep orchestration simple in v1, with a host-side controller that exercises
  the moving parts directly
- evolve toward OpenClaw becoming the orchestration layer later, once the
  transport, scenario, and reporting model are proven
## Repo-backed seeds

## Goals
Seed assets live in `qa/`:

- Test OpenClaw through a real messaging-channel boundary, not only `chat.send`
  or embedded mocks.
- Verify channel semantics that matter for real use:
  - DMs
  - channels/groups
  - threads
  - edits
  - deletes
  - reactions
  - polls
  - attachments
- Verify agent behavior across realistic user flows:
  - memory
  - thread binding
  - model switching
  - cron jobs
  - subagents
  - approvals
  - routing
  - channel-specific `message` actions
- Make the QA runner capable of feature discovery:
  - read docs
  - inspect plugin capability discovery
  - inspect code and config
  - generate a scenario protocol
- Support deterministic protocol tests and best-effort real-model tests as
  separate lanes.
- Allow automated bug triage artifacts that can feed a host-side fix worker.
- `qa/QA_KICKOFF_TASK.md`
- `qa/seed-scenarios.json`

## Non-goals
These are intentionally in git so the QA plan is visible to both humans and the
agent. The baseline list should stay broad enough to cover:

- Not a replacement for existing unit, contract, or live tests.
- Not a production channel.
- Not a requirement that all bug fixing happen from inside the Dockerized
  OpenClaw runtime.
- Not a reason to add test-only core branches for one channel.
- DM and channel chat
- thread behavior
- message action lifecycle
- cron callbacks
- memory recall
- model switching
- subagent handoff
- repo-reading and docs-reading
- one small build task such as Lobster Invaders

## Why a channel plugin
## Reporting

OpenClaw already has the right boundary:
`qa-lab` exports a Markdown protocol report from the observed bus timeline.
The report should answer:

- core owns the shared `message` tool, prompt wiring, outer session
  bookkeeping, and dispatch
- channel plugins own:
  - config
  - pairing
  - security
  - session grammar
  - threading
  - outbound delivery
  - channel-owned actions and capability discovery
- What worked
- What failed
- What stayed blocked
- What follow-up scenarios are worth adding

That means the cleanest design is:
## Related docs

- a real channel plugin for QA transport semantics
- a separate QA control plane for injection and inspection

This keeps the test transport inside the same architecture used by Slack,
Discord, Teams, and similar channels.

## System overview

The system has six pieces.

1. `qa-channel` plugin

   - Bundled extension under `extensions/qa-channel`
   - Normal `ChannelPlugin`
   - Behaves like a Slack/Discord/Teams-class channel
   - Registers channel-owned message actions through the shared `message` tool

2. `qa-bus` sidecar

   - Small HTTP and/or WS service
   - Canonical state store for synthetic conversations, messages, threads,
     reactions, edits, and event history
   - Accepts inbound events from the harness
   - Exposes inspection and wait APIs for assertions

3. Dockerized OpenClaw gateway

   - Runs as close to real deployment as practical
   - Loads `qa-channel`
   - Uses normal config, routing, session, cron, and plugin loading

4. QA orchestrator

   - Host-side runner or dedicated OpenClaw-driven controller
   - Provisions scenario environments
   - Seeds config
   - Resets state
   - Executes test matrix
   - Collects structured outcomes

5. Auto-fix worker

   - Host-side workflow
   - Creates a worktree
   - launches a coding agent
   - runs scoped verification
   - opens a PR

   The auto-fix worker should start outside the container. It needs direct repo
   and GitHub access, clean worktree control, and better isolation from the
   runtime under test.

6. `qa-lab` extension

   - Bundled extension under `extensions/qa-lab`
   - Owns the QA harness, Markdown report flow, and private debugger UI
   - Registers hidden CLI entrypoints such as `openclaw qa run` and
     `openclaw qa ui`
   - Stays separate from the shipped Control UI bundle

## High-level flow

1. Start `qa-bus`.
2. Start OpenClaw in Docker with `qa-channel` enabled.
3. QA orchestrator injects inbound messages into `qa-bus`.
4. `qa-channel` receives them as normal inbound traffic.
5. OpenClaw runs the agent loop normally.
6. Outbound replies and channel actions flow back through `qa-channel` into
   `qa-bus`.
7. QA orchestrator inspects state or waits on events.
8. Orchestrator records pass/fail/flaky/unknown plus artifacts.
9. Severe failures optionally emit a bug packet for the host-side fix worker.

## Lanes

The system should have two distinct lanes.

### Lane A: deterministic protocol lane

Use a deterministic or tightly controlled model setup.

Preferred options:

- a canned provider fixture
- the bundled `synthetic` provider when useful
- fixed prompts with exact assertions

Purpose:

- verify transport and product semantics
- keep flakiness low
- catch regressions in routing, memory plumbing, thread binding, cron, and tool
  invocation

### Lane B: quality lane

Use real providers and real models in a matrix.

Purpose:

- verify that the agent can still do good work end to end
- evaluate feature discoverability and instruction following
- surface model-specific breakage or degraded behavior

Expected result type:

- best-effort
- rubric-based
- more tolerant of wording variation

Matrix guidance for v1:

- start with a small curated matrix, not "everything configured"
- keep deterministic protocol runs separate from quality runs
- report matrix cells independently so one provider/model failure does not hide
  transport correctness

Do not mix these lanes. Protocol correctness and model quality should fail
independently.
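
The lane separation above can be made concrete with a small per-lane summary. This is a minimal sketch, not the project's reporting code: the `CellResult` shape and the `summarizeByLane` name are assumptions introduced for illustration. Only the two lanes and the pass/fail/flaky/unknown outcomes come from this note.

```typescript
// Hypothetical result shape; the note names the lanes and outcomes but no schema.
type Lane = "deterministic" | "quality";
type Outcome = "pass" | "fail" | "flaky" | "unknown";

interface CellResult {
  lane: Lane;
  provider: string; // one provider/model pair is one matrix cell
  model: string;
  scenarioId: string;
  outcome: Outcome;
}

interface LaneSummary {
  total: number;
  failed: number;
  ok: boolean;
}

// Summarize each lane on its own, so a quality-lane failure never
// changes the verdict of the deterministic protocol lane.
function summarizeByLane(results: CellResult[]): Record<Lane, LaneSummary> {
  const summary: Record<Lane, LaneSummary> = {
    deterministic: { total: 0, failed: 0, ok: true },
    quality: { total: 0, failed: 0, ok: true },
  };
  for (const result of results) {
    const lane = summary[result.lane];
    lane.total += 1;
    if (result.outcome === "fail") {
      lane.failed += 1;
      lane.ok = false;
    }
  }
  return summary;
}
```

Keeping one verdict per lane means a model regression in the quality matrix cannot mask, or be masked by, a transport regression.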

## Use existing bootstrap seam first

Before the custom channel exists, OpenClaw already has a useful bootstrap path:

- admin-scoped synthetic originating-route fields on `chat.send`
- synthetic message-channel headers for HTTP flows

That is enough to build a first QA controller for:

- thread/session routing
- ACP bind flows
- subagent delivery
- cron wake paths
- memory persistence checks

This should be Phase 0 because it de-risks the scenario protocol before the
full channel lands.

## `qa-lab` extension design

`qa-lab` is the private operator-facing half of this system.

Suggested package:

- `extensions/qa-lab/`

Suggested responsibilities:

- host the synthetic bus state machine
- host the scenario runner
- write Markdown reports
- serve a private debugger UI on a separate local server
- keep that UI entirely outside the shipped Control UI bundle

Suggested UI shape:

- left rail for conversations and threads
- center transcript pane
- right rail for event stream and report inspection
- bottom inject-composer for inbound QA traffic

## `qa-channel` plugin design

## Package layout

Suggested package:

- `extensions/qa-channel/`

Suggested file layout:

- `package.json`
- `openclaw.plugin.json`
- `index.ts`
- `setup-entry.ts`
- `api.ts`
- `runtime-api.ts`
- `src/channel.ts`
- `src/channel-api.ts`
- `src/config-schema.ts`
- `src/setup-core.ts`
- `src/setup-surface.ts`
- `src/runtime.ts`
- `src/channel.runtime.ts`
- `src/inbound.ts`
- `src/outbound.ts`
- `src/state-client.ts`
- `src/targets.ts`
- `src/threading.ts`
- `src/message-actions.ts`
- `src/probe.ts`
- `src/doctor.ts`
- `src/*.test.ts`

Model it after Slack, Discord, Teams, or Google Chat packaging, not as a one-off
test helper.

## Capabilities

MVP capabilities:

- one account
- DMs
- channels
- threads
- send text
- reply in thread
- read
- edit
- delete
- react
- search
- upload-file
- download-file

Phase 2 capabilities:

- polls
- member-info
- channel-info
- channel-list
- pin and unpin
- permissions
- topic create and edit

These map naturally onto the shared `message` tool action model already used by
channel plugins.

## Conversation model

Use a stable synthetic grammar that supports both simplicity and realistic
coverage.

Suggested ids:

- DM conversation: `dm:<user-id>`
- channel: `chan:<space-id>`
- thread: `thread:<space-id>:<thread-id>`
- message id: `msg:<ulid>`

Suggested target forms:

- `qa:dm:<user-id>`
- `qa:chan:<space-id>`
- `qa:thread:<space-id>:<thread-id>`

The plugin should own translation between external target strings and canonical
conversation ids.
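
As an illustration of that translation, here is a minimal sketch. It assumes ids never contain a colon, which this note does not state, and the names `ConversationRef`, `parseTarget`, and `toConversationId` are hypothetical rather than taken from the plugin.

```typescript
// Hypothetical parsed form of the suggested `qa:` target strings.
type ConversationRef =
  | { kind: "dm"; userId: string }
  | { kind: "chan"; spaceId: string }
  | { kind: "thread"; spaceId: string; threadId: string };

// Parse an external target such as "qa:thread:s1:t9".
// Returns null for anything outside the suggested grammar.
function parseTarget(target: string): ConversationRef | null {
  const parts = target.split(":");
  if (parts[0] !== "qa") return null;
  const [, kind, first, second] = parts;
  if (kind === "dm" && parts.length === 3 && first) {
    return { kind: "dm", userId: first };
  }
  if (kind === "chan" && parts.length === 3 && first) {
    return { kind: "chan", spaceId: first };
  }
  if (kind === "thread" && parts.length === 4 && first && second) {
    return { kind: "thread", spaceId: first, threadId: second };
  }
  return null;
}

// Produce the canonical conversation id for a parsed target.
function toConversationId(ref: ConversationRef): string {
  switch (ref.kind) {
    case "dm":
      return `dm:${ref.userId}`;
    case "chan":
      return `chan:${ref.spaceId}`;
    case "thread":
      return `thread:${ref.spaceId}:${ref.threadId}`;
  }
}
```

Returning `null` for malformed targets gives the failure-handling scenarios (malformed target) something concrete to assert on.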

## Pairing and security

Even though this is a QA channel, it should still implement real policy
surfaces:

- DM allowlist / pairing flow
- group policy
- mention gating where relevant
- trusted sender ids

Reason:

- these are product features and should be testable through the QA transport
- the QA lane should be able to verify policy failures, not only happy paths
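
To show what a testable policy failure could look like, the sketch below returns an explicit denial reason instead of dropping the message silently. Everything here is an assumption for illustration: the `QaPolicy` fields, the reason strings, and `evaluatePolicy` are not OpenClaw's actual policy API.

```typescript
// Hypothetical denial reasons a scenario could assert on.
type DenyReason = "dm-not-allowlisted" | "group-denied" | "mention-required";

type PolicyDecision =
  | { allowed: true }
  | { allowed: false; reason: DenyReason };

// Hypothetical policy config covering the surfaces listed above.
interface QaPolicy {
  dmAllowlist: string[]; // trusted sender ids allowed to DM
  groupPolicy: "open" | "disabled";
  requireMention: boolean; // mention gating for group traffic
}

interface InboundMeta {
  senderId: string;
  isDirect: boolean;
  mentionsAgent: boolean;
}

// Decide whether an inbound message should reach the agent.
function evaluatePolicy(policy: QaPolicy, msg: InboundMeta): PolicyDecision {
  if (msg.isDirect) {
    return policy.dmAllowlist.includes(msg.senderId)
      ? { allowed: true }
      : { allowed: false, reason: "dm-not-allowlisted" };
  }
  if (policy.groupPolicy === "disabled") {
    return { allowed: false, reason: "group-denied" };
  }
  if (policy.requireMention && !msg.mentionsAgent) {
    return { allowed: false, reason: "mention-required" };
  }
  return { allowed: true };
}
```

A scenario can then inject a DM from an unlisted sender and assert on the denial, which covers the policy-failure path as well as the happy path.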

## Threading model

Threading is one of the main reasons to build this channel.

Required semantics:

- create thread from a top-level message
- reply inside an existing thread
- list thread messages
- preserve parent message linkage
- let OpenClaw thread binding attach a session to a thread

The QA bus must preserve:

- conversation id
- thread id
- parent message id
- sender id
- timestamps

## Channel-owned message actions

The plugin should implement `actions.describeMessageTool(...)` and
`actions.handleAction(...)`.

MVP action list:

- `send`
- `read`
- `reply`
- `react`
- `edit`
- `delete`
- `thread-create`
- `thread-reply`
- `search`
- `upload-file`
- `download-file`

This is enough to test the shared `message` tool end to end with real channel
semantics.

## `qa-bus` design

`qa-bus` is the transport simulator and assertion backend.

It should not know OpenClaw internals. It should know channel state.

For v1, keep `qa-bus` in this repo so:

- fixtures and scenarios evolve with product code
- the transport contract can change in lock-step with the plugin
- CI and local dev do not need another repo checkout

## Responsibilities

- accept inbound user/platform events
- persist canonical conversation state
- persist append-only event log
- expose inspection APIs
- expose blocking wait APIs
- support reset per scenario or per suite

## Transport

HTTP is enough for MVP.

Suggested endpoints:

- `POST /reset`
- `POST /inbound/message`
- `POST /inbound/edit`
- `POST /inbound/delete`
- `POST /inbound/reaction`
- `POST /inbound/thread/create`
- `GET /state/conversations`
- `GET /state/messages`
- `GET /state/threads`
- `GET /events`
- `POST /wait`

Optional WS stream:

- `/stream`

Useful for live event taps and debugging.

## State model

Persist three layers.

1. Conversation snapshot

   - participants
   - type
   - thread topology
   - latest message pointers

2. Message snapshot

   - sender
   - content
   - attachments
   - edit history
   - reactions
   - parent and thread linkage

3. Append-only event log

   - canonical timestamp
   - causal ordering
   - source: inbound, outbound, action, system
   - payload

The append-only log matters because many QA assertions are event-oriented, not
just state-oriented.
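
A minimal in-memory sketch of the third layer follows. The `seq` counter stands in for causal ordering within a single bus instance, and the `kind` field plus the class and method names are assumptions made for illustration, not the `qa-bus` implementation.

```typescript
type EventSource = "inbound" | "outbound" | "action" | "system";

// Hypothetical event record; only timestamp, ordering, source, and payload
// are named in the state model above.
interface BusEvent {
  readonly seq: number; // monotonically increasing, used for ordering
  readonly timestamp: number; // canonical timestamp, ms since epoch
  readonly source: EventSource;
  readonly kind: string; // assumed label, for example "message"
  readonly payload: Record<string, unknown>;
}

class EventLog {
  private events: BusEvent[] = [];
  private nextSeq = 1;
  private readonly now: () => number;

  // The clock is injectable so deterministic-lane tests get stable timestamps.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Append only: existing entries are never edited or removed.
  append(source: EventSource, kind: string, payload: Record<string, unknown>): BusEvent {
    const event: BusEvent = {
      seq: this.nextSeq++,
      timestamp: this.now(),
      source,
      kind,
      payload,
    };
    this.events.push(event);
    return event;
  }

  // Events strictly after a known sequence number, for incremental reads.
  since(seq: number): readonly BusEvent[] {
    return this.events.filter((event) => event.seq > seq);
  }

  // Per-scenario or per-suite reset starts a fresh log.
  reset(): void {
    this.events = [];
    this.nextSeq = 1;
  }
}
```

An edit is recorded as a new event rather than a mutation, so an assertion such as "the reply was edited after the reaction arrived" can be answered from the log alone.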

## Assertion API

The harness needs waiters, not just snapshots.

Suggested `POST /wait` contract:

- `kind`
- `match`
- `timeoutMs`

Examples:

- wait for outbound message matching text regex
- wait for thread creation
- wait for reaction added
- wait for message edit
- wait for no event of type X within Y ms

This gives stable tests without custom polling code in every scenario.
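
The waiter idea can be sketched in memory as follows. The `kind`, `match`, and `timeoutMs` fields come from the contract above; the inner `match` fields (`textRegex`, `conversationId`), the `ObservedEvent` shape, and the `Waiters` class are assumptions for illustration only.

```typescript
interface WaitRequest {
  kind: string;
  match: { textRegex?: string; conversationId?: string }; // assumed match fields
  timeoutMs: number;
}

interface ObservedEvent {
  kind: string;
  conversationId: string;
  text?: string;
}

// Pure matching rule, kept separate so it can be tested without timers.
function matchesWait(req: WaitRequest, event: ObservedEvent): boolean {
  if (event.kind !== req.kind) return false;
  if (req.match.conversationId !== undefined && event.conversationId !== req.match.conversationId) {
    return false;
  }
  if (req.match.textRegex !== undefined) {
    return new RegExp(req.match.textRegex).test(event.text ?? "");
  }
  return true;
}

interface PendingWait {
  req: WaitRequest;
  resolve: (event: ObservedEvent | null) => void;
  timer?: ReturnType<typeof setTimeout>;
}

class Waiters {
  private pending: PendingWait[] = [];

  // Resolves with the first matching event, or null once timeoutMs elapses.
  wait(req: WaitRequest): Promise<ObservedEvent | null> {
    return new Promise((resolve) => {
      const entry: PendingWait = { req, resolve };
      entry.timer = setTimeout(() => {
        this.pending = this.pending.filter((p) => p !== entry);
        resolve(null);
      }, req.timeoutMs);
      this.pending.push(entry);
    });
  }

  // Called for every new event; returns how many waiters it satisfied.
  publish(event: ObservedEvent): number {
    const hits = this.pending.filter((p) => matchesWait(p.req, event));
    this.pending = this.pending.filter((p) => !hits.includes(p));
    for (const hit of hits) {
      if (hit.timer !== undefined) clearTimeout(hit.timer);
      hit.resolve(event);
    }
    return hits.length;
  }
}
```

Under this sketch, the "no event of type X within Y ms" case is the same call read the other way: the assertion passes when the promise resolves to `null`.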

## QA orchestrator design

The orchestrator should own scenario planning and artifact collection.

Start host-side. Later, OpenClaw can orchestrate parts of it.

This is the chosen v1 direction.

Why:

- simpler to iterate while the transport and scenario protocol are still moving
- easier access to the repo, logs, Docker, and test fixtures
- easier artifact collection and report generation
- avoids over-coupling the first version to subagent behavior before the QA
  protocol itself is stable

## Inputs

- docs pages
- channel capability discovery
- configured provider/model lane
- scenario catalog
- repo/test metadata

## Outputs

- structured protocol report
- scenario transcript
- captured channel state
- gateway logs
- failure packets

For v1, the primary output is a Markdown report.

Suggested report sections:

- suite summary
- environment
- provider/model matrix
- scenarios passed
- scenarios failed
- flaky or inconclusive scenarios
- captured evidence links or inline excerpts
- suspected ownership or file hints
- follow-up recommendations
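
A report generator for a subset of those sections could look like the sketch below. The `SuiteReport` shape and the `renderReport` function are hypothetical; the section titles follow the list above, and sections such as the provider/model matrix and evidence links are left out to keep the example short.

```typescript
type Outcome = "pass" | "fail" | "flaky" | "unknown";

// Hypothetical input shapes for the renderer.
interface ScenarioResult {
  id: string;
  outcome: Outcome;
  note?: string;
}

interface SuiteReport {
  suite: string;
  environment: string;
  results: ScenarioResult[];
  followUps: string[];
}

// Render a Markdown report; empty sections say "none" instead of disappearing,
// so readers can tell "nothing failed" from "section missing".
function renderReport(report: SuiteReport): string {
  const section = (title: string, rows: string[]): string[] => [
    `## ${title}`,
    "",
    ...(rows.length > 0 ? rows : ["- none"]),
    "",
  ];
  const rowsFor = (outcome: Outcome): string[] =>
    report.results
      .filter((result) => result.outcome === outcome)
      .map((result) => (result.note ? `- ${result.id}: ${result.note}` : `- ${result.id}`));

  return [
    `# QA report: ${report.suite}`,
    "",
    ...section("Environment", [`- ${report.environment}`]),
    ...section("Scenarios passed", rowsFor("pass")),
    ...section("Scenarios failed", rowsFor("fail")),
    ...section("Flaky or inconclusive scenarios", [...rowsFor("flaky"), ...rowsFor("unknown")]),
    ...section("Follow-up recommendations", report.followUps.map((item) => `- ${item}`)),
  ].join("\n");
}
```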

## Scenario format

Use a data-driven scenario spec.

Suggested shape:

```json
{
  "id": "thread-memory-recall",
  "lane": "deterministic",
  "preconditions": ["qa-channel", "memory-enabled"],
  "steps": [
    {
      "type": "injectMessage",
      "to": "qa:dm:user-a",
      "text": "Remember that the deploy key is kiwi."
    },
    { "type": "waitForOutbound", "match": { "textIncludes": "kiwi" } },
    { "type": "injectMessage", "to": "qa:dm:user-a", "text": "What was the deploy key?" },
    { "type": "waitForOutbound", "match": { "textIncludes": "kiwi" } }
  ],
  "assertions": [{ "type": "outboundTextIncludes", "value": "kiwi" }]
}
```

Keep the execution engine generic and the scenario catalog declarative.
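
One way to keep the engine generic is to type the spec and run it against a small transport seam. The sketch below mirrors the JSON shape above, but the `Transport` interface and `runScenario` are assumptions, and the wait step is simplified to a synchronous check where a real runner would block on the bus with a timeout.

```typescript
type Step =
  | { type: "injectMessage"; to: string; text: string }
  | { type: "waitForOutbound"; match: { textIncludes: string } };

interface Scenario {
  id: string;
  lane: "deterministic" | "quality";
  preconditions: string[];
  steps: Step[];
  assertions: Array<{ type: "outboundTextIncludes"; value: string }>;
}

// Hypothetical seam between the engine and the QA bus.
interface Transport {
  inject(to: string, text: string): void;
  outboundTexts(): string[];
}

interface ScenarioOutcome {
  id: string;
  passed: boolean;
  failedAtStep?: number; // index of the step that failed, if any
}

// The engine only interprets step types; scenario content stays in data.
function runScenario(scenario: Scenario, transport: Transport): ScenarioOutcome {
  let index = 0;
  for (const step of scenario.steps) {
    if (step.type === "injectMessage") {
      transport.inject(step.to, step.text);
    } else {
      const needle = step.match.textIncludes;
      const seen = transport.outboundTexts().some((text) => text.includes(needle));
      if (!seen) return { id: scenario.id, passed: false, failedAtStep: index };
    }
    index += 1;
  }
  const assertionsHold = scenario.assertions.every((assertion) =>
    transport.outboundTexts().some((text) => text.includes(assertion.value)),
  );
  return { id: scenario.id, passed: assertionsHold };
}
```

Adding a step type then means extending the `Step` union and the engine once, without touching any scenario file.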

## Feature discovery

The orchestrator can discover candidate scenarios from three sources.

1. Docs

   - channel docs
   - testing docs
   - gateway docs
   - subagents docs
   - cron docs

2. Runtime capability discovery

   - channel `message` action discovery
   - plugin status and channel capabilities
   - configured providers/models

3. Code hints

   - known action names
   - channel-specific feature flags
   - config schema

This should produce a proposed protocol with:

- must-test
- can-test
- blocked
- unsupported

## Scenario classes

Recommended catalog:

- transport basics
  - DM send and reply
  - channel send
  - thread create and reply
  - reaction add and read
  - edit and delete
- policy
  - allowlist
  - pairing
  - group mention gating
- shared `message` tool
  - read
  - search
  - reply
  - react
  - upload and download
- agent quality
  - follows channel context
  - obeys thread semantics
  - uses memory across turns
  - switches model when instructed
- automation
  - cron add and run
  - cron delivery into channel
  - scheduled reminders
- subagents
  - spawn
  - announce
  - threaded follow-up
  - nested orchestration when enabled
- failure handling
  - unsupported action
  - timeout
  - malformed target
  - policy denial

## OpenClaw as orchestrator

Longer-term, OpenClaw itself can coordinate the QA run.

Suggested architecture:

- one controller session
- N worker subagents
- each worker owns one scenario or scenario shard
- workers report structured results back to controller

Good fits for existing OpenClaw primitives:

- `sessions_spawn`
- `subagents`
- cron-based wakeups for long-running suites
- thread-bound sessions for scenario-local follow-up

Best near-term use:

- controller generates the plan
- workers execute scenarios in parallel
- controller synthesizes report

Avoid making the controller also own host Git operations in the first version.

Chosen direction:

- v1: host-side controller
- v2+: OpenClaw-native orchestration once the scenario protocol and transport
  model are stable

## Auto-fix workflow

The system should emit a structured bug packet when a scenario fails.

Suggested bug packet:

- scenario id
- lane
- failure kind
- minimal repro steps
- channel event transcript
- gateway transcript
- logs
- suspected files
- confidence
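
The packet fields above could be carried in a typed record with a small completeness check before anything is handed to a fix worker. This is a sketch under assumptions: the field names, the 0 to 1 confidence scale, and `validateBugPacket` are illustrative and not defined by this note.

```typescript
// Hypothetical typed form of the suggested bug packet.
interface BugPacket {
  scenarioId: string;
  lane: "deterministic" | "quality";
  failureKind: string;
  reproSteps: string[];
  channelEvents: string[]; // channel event transcript
  gatewayTranscript: string[];
  logs: string[];
  suspectedFiles: string[];
  confidence: number; // assumed scale: 0 (guess) to 1 (certain)
}

// Returns a list of problems; an empty list means the packet is usable.
function validateBugPacket(packet: BugPacket): string[] {
  const problems: string[] = [];
  if (packet.scenarioId.trim() === "") problems.push("scenarioId is empty");
  if (packet.failureKind.trim() === "") problems.push("failureKind is empty");
  if (packet.reproSteps.length === 0) problems.push("reproSteps is empty");
  if (packet.channelEvents.length === 0) problems.push("channelEvents is empty");
  if (!(packet.confidence >= 0 && packet.confidence <= 1)) {
    problems.push("confidence must be between 0 and 1");
  }
  return problems;
}
```

Rejecting packets without repro steps or a channel transcript keeps a later fix worker from starting on failures it cannot reproduce.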

Host-side fix worker flow:

1. receive bug packet
2. create detached worktree
3. launch coding agent in worktree
4. write failing regression first when practical
5. implement fix
6. run scoped verification
7. open PR

This should remain host-side at first because it needs:

- repo write access
- worktree hygiene
- git credentials
- GitHub auth

Chosen direction:

- do not auto-open PRs in v1
- emit Markdown reports and structured failure packets first
- add host-side worktree + PR automation later

## Rollout plan

## Phase 0: bootstrap on existing synthetic ingress

Build a first QA runner without a new channel:

- use `chat.send` with admin-scoped synthetic originating-route fields
- run deterministic scenarios against routing, memory, cron, subagents, and ACP
- validate protocol format and artifact collection

Exit criteria:

- scenario runner exists
- structured protocol report exists
- failure artifacts exist

## Phase 1: MVP `qa-channel`

Build the plugin and bus with:

- DM
- channels
- threads
- read
- reply
- react
- edit
- delete
- search

Target semantics:

- Slack-class transport behavior
- not full Teams-class parity yet

Exit criteria:

- OpenClaw in Docker can talk to `qa-bus`
- harness can inject + inspect
- one green end-to-end suite across message transport and agent behavior

## Phase 2: protocol expansion

Add:

- attachments
- polls
- pins
- richer policy tests
- quality lane with real provider/model matrix

Exit criteria:

- scenario matrix covers major built-in features
- deterministic and quality lanes are separated

## Phase 3: subagent-driven QA

Add:

- controller agent
- worker subagents
- scenario discovery from docs + capability discovery
- parallel execution

Exit criteria:

- one controller can fan out and synthesize a suite report

## Phase 4: auto-fix loop

Add:

- bug packet emission
- host-side worktree runner
- PR creation

Exit criteria:

- selected failures can auto-produce draft PRs

## Risks

## Risk: too much magic in one layer

If the QA channel, bus, and orchestrator all become smart at once, debugging
will be painful.

Mitigation:

- keep `qa-channel` transport-focused
- keep `qa-bus` state-focused
- keep orchestrator separate

## Risk: flaky assertions from model variance

Mitigation:

- deterministic lane
- quality lane
- different pass criteria

## Risk: test-only branches leaking into core

Mitigation:

- no core special cases for `qa-channel`
- use normal plugin seams
- use admin synthetic ingress only as bootstrap

## Risk: auto-fix overreach

Mitigation:

- keep fix worker host-side
- require explicit policy for when PRs can open automatically
- gate with scoped tests

## Risk: building a fake platform nobody uses

Mitigation:

- emulate Slack/Discord/Teams semantics, not an abstract transport
- prioritize features that stress shared OpenClaw boundaries

## MVP recommendation

If building this now, start with this exact order.

1. Host-side scenario runner using existing synthetic originating-route support.
2. `qa-bus` sidecar with state, events, reset, and wait APIs.
3. `extensions/qa-channel` MVP with DMs, channels, threads, reply, read, react,
   edit, delete, and search.
4. Markdown report generator for suite + matrix output.
5. One deterministic end-to-end suite:
   - inject inbound DM
   - verify reply
   - create thread
   - verify follow-up in thread
   - verify memory recall on later turn
6. Add curated real-model matrix quality lane.
7. Add controller subagent orchestration.
8. Add host-side auto-fix worktree runner.

This order gets real value quickly without requiring the full grand design to
land before the first useful signal appears.

## Current product decisions

- `qa-bus` lives inside this repo
- the first controller is host-side
- Slack-class behavior is the MVP target
- the quality lane uses a curated matrix
- first version produces Markdown reports, not PRs
- OpenClaw-native orchestration is a later phase, not a v1 requirement
- [Testing](/help/testing)
- [QA Channel](/channels/qa-channel)
- [Dashboard](/web/dashboard)