mirror of
https://github.com/moltbot/moltbot.git
synced 2026-03-07 22:44:16 +00:00
docs: canonicalize docs paths and align zh navigation (#11428)
* docs(navigation): canonicalize paths and align zh nav
* chore(docs): remove stray .DS_Store
* docs(scripts): add non-mint docs link audit
* docs(nav): fix zh source paths and preserve legacy redirects (#11428) (thanks @sebslight)
* chore(docs): satisfy lint for docs link audit script (#11428) (thanks @sebslight)
162
docs/help/debugging.md
Normal file
@@ -0,0 +1,162 @@
---
summary: "Debugging tools: watch mode, raw model streams, and tracing reasoning leakage"
read_when:
  - You need to inspect raw model output for reasoning leakage
  - You want to run the Gateway in watch mode while iterating
  - You need a repeatable debugging workflow
title: "Debugging"
---

# Debugging

This page covers debugging helpers for streaming output, especially when a
provider mixes reasoning into normal text.

## Runtime debug overrides

Use `/debug` in chat to set **runtime-only** config overrides (memory, not disk).
`/debug` is disabled by default; enable with `commands.debug: true`.
This is handy when you need to toggle obscure settings without editing `openclaw.json`.

Examples:

```
/debug show
/debug set messages.responsePrefix="[openclaw]"
/debug unset messages.responsePrefix
/debug reset
```

`/debug reset` clears all overrides and returns to the on-disk config.

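To make the "memory, not disk" behavior concrete, here is a minimal sketch of an override layer that sits on top of a loaded config. It is an illustration only: the class name `RuntimeOverrides`, its methods, and the dotted-path handling are assumptions made for this example, not OpenClaw's actual implementation.

```typescript
// Hypothetical sketch: names and behavior are illustrative, not OpenClaw's API.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

class RuntimeOverrides {
  // Overrides live only in this object, so they vanish when the process exits.
  private overrides: JsonObject = {};

  set(path: string, value: Json): void {
    this.overrides[path] = value;
  }

  unset(path: string): void {
    delete this.overrides[path];
  }

  reset(): void {
    this.overrides = {};
  }

  // Returns a deep copy of the on-disk config with overrides applied.
  // The on-disk config object itself is never mutated.
  apply(diskConfig: JsonObject): JsonObject {
    const merged = JSON.parse(JSON.stringify(diskConfig)) as JsonObject;
    for (const path of Object.keys(this.overrides)) {
      const keys = path.split(".");
      let node = merged;
      for (const key of keys.slice(0, -1)) {
        const child = node[key];
        if (typeof child !== "object" || child === null || Array.isArray(child)) {
          node[key] = {}; // create (or replace) intermediate objects as needed
        }
        node = node[key] as JsonObject;
      }
      node[keys[keys.length - 1]] = this.overrides[path];
    }
    return merged;
  }
}

const onDisk: JsonObject = { messages: { responsePrefix: "" } };
const overrides = new RuntimeOverrides();
overrides.set("messages.responsePrefix", "[openclaw]"); // like `/debug set`
const effective = overrides.apply(onDisk);
overrides.reset(); // like `/debug reset`
const afterReset = overrides.apply(onDisk);
```

In this sketch `effective` carries the override while `onDisk` is untouched, and `afterReset` matches the on-disk values again, which mirrors what `/debug reset` is described as doing.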
## Gateway watch mode

For fast iteration, run the gateway under the file watcher:

```bash
pnpm gateway:watch --force
```

This maps to:

```bash
tsx watch src/entry.ts gateway --force
```

Add any gateway CLI flags after `gateway:watch` and they will be passed through
on each restart.

## Dev profile + dev gateway (--dev)

Use the dev profile to isolate state and spin up a safe, disposable setup for
debugging. There are **two** `--dev` flags:

- **Global `--dev` (profile):** isolates state under `~/.openclaw-dev` and
  defaults the gateway port to `19001` (derived ports shift with it).
- **`gateway --dev`:** tells the Gateway to auto-create a default config +
  workspace when missing (and skip BOOTSTRAP.md).

Recommended flow (dev profile + dev bootstrap):

```bash
pnpm gateway:dev
OPENCLAW_PROFILE=dev openclaw tui
```

If you don’t have a global install yet, run the CLI via `pnpm openclaw ...`.

What this does:

1. **Profile isolation** (global `--dev`)
   - `OPENCLAW_PROFILE=dev`
   - `OPENCLAW_STATE_DIR=~/.openclaw-dev`
   - `OPENCLAW_CONFIG_PATH=~/.openclaw-dev/openclaw.json`
   - `OPENCLAW_GATEWAY_PORT=19001` (browser/canvas shift accordingly)

2. **Dev bootstrap** (`gateway --dev`)
   - Writes a minimal config if missing (`gateway.mode=local`, bind loopback).
   - Sets `agent.workspace` to the dev workspace.
   - Sets `agent.skipBootstrap=true` (no BOOTSTRAP.md).
   - Seeds the workspace files if missing:
     `AGENTS.md`, `SOUL.md`, `TOOLS.md`, `IDENTITY.md`, `USER.md`, `HEARTBEAT.md`.
   - Default identity: **C3‑PO** (protocol droid).
   - Skips channel providers in dev mode (`OPENCLAW_SKIP_CHANNELS=1`).

Reset flow (fresh start):

```bash
pnpm gateway:dev:reset
```

Note: `--dev` is a **global** profile flag, and some script runners consume it
before it reaches the CLI. If you need to spell it out, use the env var form:

```bash
OPENCLAW_PROFILE=dev openclaw gateway --dev --reset
```

`--reset` wipes config, credentials, sessions, and the dev workspace (using
`trash`, not `rm`), then recreates the default dev setup.

Tip: if a non‑dev gateway is already running (launchd/systemd), stop it first:

```bash
openclaw gateway stop
```

## Raw stream logging (OpenClaw)

OpenClaw can log the **raw assistant stream** before any filtering/formatting.
This is the best way to see whether reasoning is arriving as plain text deltas
(or as separate thinking blocks).

Enable it via CLI:

```bash
pnpm gateway:watch --force --raw-stream
```

Optional path override:

```bash
pnpm gateway:watch --force --raw-stream --raw-stream-path ~/.openclaw/logs/raw-stream.jsonl
```

Equivalent env vars:

```bash
OPENCLAW_RAW_STREAM=1
OPENCLAW_RAW_STREAM_PATH=~/.openclaw/logs/raw-stream.jsonl
```

Default file:

`~/.openclaw/logs/raw-stream.jsonl`

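The log is a `.jsonl` file, so each non-empty line should be one JSON value. A minimal sketch for loading such a file's text into records is shown here; it deliberately assumes nothing about the fields inside each record, since the record schema is not described on this page. The function name `parseJsonl` is made up for this example.

```typescript
// Hypothetical helper: parses JSON Lines text without assuming a record schema.
interface JsonlParseResult {
  records: unknown[];
  malformedLines: number[]; // 1-based line numbers that failed to parse
}

function parseJsonl(text: string): JsonlParseResult {
  const records: unknown[] = [];
  const malformedLines: number[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // skip blank lines (e.g. the trailing newline)
    try {
      records.push(JSON.parse(line));
    } catch {
      // A partially written last line is plausible while the gateway is still running.
      malformedLines.push(index + 1);
    }
  });
  return { records, malformedLines };
}

const sample = '{"a":1}\n\nnot json\n{"b":2}\n';
const parsed = parseJsonl(sample);
```

Reading the file contents (for example with Node's `fs.readFileSync`) and passing the text to a helper like this is usually enough to start inspecting the raw stream by hand.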
## Raw chunk logging (pi-mono)

To capture **raw OpenAI-compat chunks** before they are parsed into blocks,
pi-mono exposes a separate logger:

```bash
PI_RAW_STREAM=1
```

Optional path:

```bash
PI_RAW_STREAM_PATH=~/.pi-mono/logs/raw-openai-completions.jsonl
```

Default file:

`~/.pi-mono/logs/raw-openai-completions.jsonl`

> Note: this is only emitted by processes using pi-mono’s
> `openai-completions` provider.

## Safety notes

- Raw stream logs can include full prompts, tool output, and user data.
- Keep logs local and delete them after debugging.
- If you share logs, scrub secrets and PII first.
81
docs/help/environment.md
Normal file
@@ -0,0 +1,81 @@
---
summary: "Where OpenClaw loads environment variables and the precedence order"
read_when:
  - You need to know which env vars are loaded, and in what order
  - You are debugging missing API keys in the Gateway
  - You are documenting provider auth or deployment environments
title: "Environment Variables"
---

# Environment variables

OpenClaw pulls environment variables from multiple sources. The rule is **never override existing values**.

## Precedence (highest → lowest)

1. **Process environment** (what the Gateway process already has from the parent shell/daemon).
2. **`.env` in the current working directory** (dotenv default; does not override).
3. **Global `.env`** at `~/.openclaw/.env` (aka `$OPENCLAW_STATE_DIR/.env`; does not override).
4. **Config `env` block** in `~/.openclaw/openclaw.json` (applied only if missing).
5. **Optional login-shell import** (`env.shellEnv.enabled` or `OPENCLAW_LOAD_SHELL_ENV=1`), applied only for missing expected keys.

If the config file is missing entirely, step 4 is skipped; shell import still runs if enabled.

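The "never override existing values" rule amounts to a first-writer-wins merge over the sources in precedence order. The sketch below illustrates that rule only; `mergeEnvSources` is a hypothetical name, the "expected keys" filter of the shell import is not modeled, and treating an empty string as an existing value is an assumption.

```typescript
// Hypothetical sketch of a non-overriding merge; not OpenClaw's loader.
type EnvSource = { [name: string]: string | undefined };

// `sources` must be ordered from highest to lowest precedence.
function mergeEnvSources(sources: EnvSource[]): { [name: string]: string } {
  const result: { [name: string]: string } = {};
  for (const source of sources) {
    for (const name of Object.keys(source)) {
      const value = source[name];
      const alreadySet = Object.prototype.hasOwnProperty.call(result, name);
      // Assumption: any defined value (even "") counts as "existing".
      if (value !== undefined && !alreadySet) {
        result[name] = value;
      }
    }
  }
  return result;
}

const resolved = mergeEnvSources([
  { OPENROUTER_API_KEY: "from-process" }, // 1. process environment
  { OPENROUTER_API_KEY: "from-cwd-dotenv", GROQ_API_KEY: "from-cwd-dotenv" }, // 2. ./.env
  { GROQ_API_KEY: "from-global-dotenv" }, // 3. ~/.openclaw/.env
  { DEEPGRAM_API_KEY: "from-config-env" }, // 4. config `env` block
]);
// resolved.OPENROUTER_API_KEY === "from-process"
// resolved.GROQ_API_KEY === "from-cwd-dotenv"
// resolved.DEEPGRAM_API_KEY === "from-config-env"
```

When a key seems to be "ignored", the usual explanation under this rule is that a higher-precedence source already defined it.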
## Config `env` block

Two equivalent ways to set inline env vars (both are non-overriding):

```json5
{
  env: {
    OPENROUTER_API_KEY: "sk-or-...",
    vars: {
      GROQ_API_KEY: "gsk-...",
    },
  },
}
```

## Shell env import

`env.shellEnv` runs your login shell and imports only **missing** expected keys:

```json5
{
  env: {
    shellEnv: {
      enabled: true,
      timeoutMs: 15000,
    },
  },
}
```

Env var equivalents:

- `OPENCLAW_LOAD_SHELL_ENV=1`
- `OPENCLAW_SHELL_ENV_TIMEOUT_MS=15000`

## Env var substitution in config

You can reference env vars directly in config string values using `${VAR_NAME}` syntax:

```json5
{
  models: {
    providers: {
      "vercel-gateway": {
        apiKey: "${VERCEL_GATEWAY_API_KEY}",
      },
    },
  },
}
```

See [Configuration: Env var substitution](/gateway/configuration#env-var-substitution-in-config) for full details.

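As a rough illustration of what `${VAR_NAME}` substitution does to a single string value, consider the sketch below. The function name, the accepted variable-name pattern, and the choice to throw on a missing variable are all assumptions for this example; the linked Configuration page is the authority on the real rules.

```typescript
// Hypothetical sketch of `${VAR_NAME}` substitution in one config string.
function substituteEnvVars(
  value: string,
  env: { [name: string]: string | undefined },
): string {
  // Assumption: names look like shell identifiers (letters, digits, underscore).
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      // Assumption: a missing variable is an error rather than an empty string.
      throw new Error(`Missing environment variable: ${name}`);
    }
    return resolved;
  });
}

const apiKey = substituteEnvVars("${VERCEL_GATEWAY_API_KEY}", {
  VERCEL_GATEWAY_API_KEY: "example-key",
});
```

Failing loudly on a missing variable is chosen here because a silently empty API key tends to surface later as a confusing auth error.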
## Related

- [Gateway configuration](/gateway/configuration)
- [FAQ: env vars and .env loading](/help/faq#env-vars-and-env-loading)
- [Models overview](/concepts/models)
@@ -707,7 +707,7 @@ See [Models](/cli/models) and [OAuth](/concepts/oauth).

### Is AWS Bedrock supported

Yes - via pi-ai's **Amazon Bedrock (Converse)** provider with **manual config**. You must supply AWS credentials/region on the gateway host and add a Bedrock provider entry in your models config. See [Amazon Bedrock](/bedrock) and [Model providers](/providers/models). If you prefer a managed key flow, an OpenAI-compatible proxy in front of Bedrock is still a valid option.
Yes - via pi-ai's **Amazon Bedrock (Converse)** provider with **manual config**. You must supply AWS credentials/region on the gateway host and add a Bedrock provider entry in your models config. See [Amazon Bedrock](/providers/bedrock) and [Model providers](/providers/models). If you prefer a managed key flow, an OpenAI-compatible proxy in front of Bedrock is still a valid option.

### How does Codex auth work

@@ -1177,7 +1177,7 @@ Yes - if your private traffic is **DMs** and your public traffic is **groups**.

Use `agents.defaults.sandbox.mode: "non-main"` so group/channel sessions (non-main keys) run in Docker, while the main DM session stays on-host. Then restrict what tools are available in sandboxed sessions via `tools.sandbox.tools`.

Setup walkthrough + example config: [Groups: personal DMs + public groups](/concepts/groups#pattern-personal-dms-public-groups-single-agent)
Setup walkthrough + example config: [Groups: personal DMs + public groups](/channels/groups#pattern-personal-dms-public-groups-single-agent)

Key config reference: [Gateway configuration](/gateway/configuration#agentsdefaultssandbox)

@@ -1427,7 +1427,7 @@ The common pattern is **one Gateway** (e.g. Raspberry Pi) plus **nodes** and **a
- **Sub-agents:** spawn background work from a main agent when you want parallelism.
- **TUI:** connect to the Gateway and switch agents/sessions.

Docs: [Nodes](/nodes), [Remote access](/gateway/remote), [Multi-Agent Routing](/concepts/multi-agent), [Sub-agents](/tools/subagents), [TUI](/tui).
Docs: [Nodes](/nodes), [Remote access](/gateway/remote), [Multi-Agent Routing](/concepts/multi-agent), [Sub-agents](/tools/subagents), [TUI](/web/tui).

### Can the OpenClaw browser run headless

@@ -1681,7 +1681,7 @@ You can also define inline env vars in config (applied only if missing from the
}
```

See [/environment](/environment) for full precedence and sources.
See [/environment](/help/environment) for full precedence and sources.

### I started the Gateway via the service and my env vars disappeared What now

@@ -1729,7 +1729,7 @@ openclaw models status
```

Copilot tokens are read from `COPILOT_GITHUB_TOKEN` (also `GH_TOKEN` / `GITHUB_TOKEN`).
See [/concepts/model-providers](/concepts/model-providers) and [/environment](/environment).
See [/concepts/model-providers](/concepts/model-providers) and [/environment](/help/environment).

## Sessions and multiple chats

@@ -1902,11 +1902,11 @@ Two common causes:
- Mention gating is on (default). You must @mention the bot (or match `mentionPatterns`).
- You configured `channels.whatsapp.groups` without `"*"` and the group isn't allowlisted.

See [Groups](/concepts/groups) and [Group messages](/concepts/group-messages).
See [Groups](/channels/groups) and [Group messages](/channels/group-messages).

### Do groupsthreads share context with DMs

Direct chats collapse to the main session by default. Groups/channels have their own session keys, and Telegram topics / Discord threads are separate sessions. See [Groups](/concepts/groups) and [Group messages](/concepts/group-messages).
Direct chats collapse to the main session by default. Groups/channels have their own session keys, and Telegram topics / Discord threads are separate sessions. See [Groups](/channels/groups) and [Group messages](/channels/group-messages).

### How many workspaces and agents can I create

@@ -2609,7 +2609,7 @@ openclaw logs --follow
In the TUI, use `/status` to see the current state. If you expect replies in a chat
channel, make sure delivery is enabled (`/deliver on`).

Docs: [TUI](/tui), [Slash commands](/tools/slash-commands).
Docs: [TUI](/web/tui), [Slash commands](/tools/slash-commands).

### How do I completely stop then start the Gateway

@@ -2701,7 +2701,7 @@ credentials or revoke access without impacting your personal accounts.
Start small. Give access only to the tools and accounts you actually need, and expand
later if required.

Docs: [Security](/gateway/security), [Pairing](/start/pairing).
Docs: [Security](/gateway/security), [Pairing](/channels/pairing).

### Can I give it autonomy over my text messages and is that safe

28
docs/help/scripts.md
Normal file
@@ -0,0 +1,28 @@
---
summary: "Repository scripts: purpose, scope, and safety notes"
read_when:
  - Running scripts from the repo
  - Adding or changing scripts under ./scripts
title: "Scripts"
---

# Scripts

The `scripts/` directory contains helper scripts for local workflows and ops tasks.
Use these when a task is clearly tied to a script; otherwise prefer the CLI.

## Conventions

- Scripts are **optional** unless referenced in docs or release checklists.
- Prefer CLI surfaces when they exist (example: auth monitoring uses `openclaw models status --check`).
- Assume scripts are host‑specific; read them before running on a new machine.

## Auth monitoring scripts

Auth monitoring scripts are documented here:
[/automation/auth-monitoring](/automation/auth-monitoring)

## When adding scripts

- Keep scripts focused and documented.
- Add a short entry in the relevant doc (or create one if missing).
368
docs/help/testing.md
Normal file
@@ -0,0 +1,368 @@
---
summary: "Testing kit: unit/e2e/live suites, Docker runners, and what each test covers"
read_when:
  - Running tests locally or in CI
  - Adding regressions for model/provider bugs
  - Debugging gateway + agent behavior
title: "Testing"
---

# Testing

OpenClaw has three Vitest suites (unit/integration, e2e, live) and a small set of Docker runners.

This doc is a “how we test” guide:

- What each suite covers (and what it deliberately does _not_ cover)
- Which commands to run for common workflows (local, pre-push, debugging)
- How live tests discover credentials and select models/providers
- How to add regressions for real-world model/provider issues

## Quick start

Most days:

- Full gate (expected before push): `pnpm build && pnpm check && pnpm test`

When you touch tests or want extra confidence:

- Coverage gate: `pnpm test:coverage`
- E2E suite: `pnpm test:e2e`

When debugging real providers/models (requires real creds):

- Live suite (models + gateway tool/image probes): `pnpm test:live`

Tip: when you only need one failing case, prefer narrowing live tests via the allowlist env vars described below.

## Test suites (what runs where)

Think of the suites as “increasing realism” (and increasing flakiness/cost):

### Unit / integration (default)

- Command: `pnpm test`
- Config: `vitest.config.ts`
- Files: `src/**/*.test.ts`
- Scope:
  - Pure unit tests
  - In-process integration tests (gateway auth, routing, tooling, parsing, config)
  - Deterministic regressions for known bugs
- Expectations:
  - Runs in CI
  - No real keys required
  - Should be fast and stable

### E2E (gateway smoke)

- Command: `pnpm test:e2e`
- Config: `vitest.e2e.config.ts`
- Files: `src/**/*.e2e.test.ts`
- Scope:
  - Multi-instance gateway end-to-end behavior
  - WebSocket/HTTP surfaces, node pairing, and heavier networking
- Expectations:
  - Runs in CI (when enabled in the pipeline)
  - No real keys required
  - More moving parts than unit tests (can be slower)

### Live (real providers + real models)

- Command: `pnpm test:live`
- Config: `vitest.live.config.ts`
- Files: `src/**/*.live.test.ts`
- Default: **enabled** by `pnpm test:live` (sets `OPENCLAW_LIVE_TEST=1`)
- Scope:
  - “Does this provider/model actually work _today_ with real creds?”
  - Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior
- Expectations:
  - Not CI-stable by design (real networks, real provider policies, quotas, outages)
  - Costs money / uses rate limits
  - Prefer running narrowed subsets instead of “everything”
  - Live runs will source `~/.profile` to pick up missing API keys
  - Anthropic key rotation: set `OPENCLAW_LIVE_ANTHROPIC_KEYS="sk-...,sk-..."` (or `OPENCLAW_LIVE_ANTHROPIC_KEY=sk-...`) or multiple `ANTHROPIC_API_KEY*` vars; tests will retry on rate limits

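The Anthropic key rotation note names three places keys can come from. A minimal sketch of gathering them into one deduplicated list, and of picking the next key after a rate-limit retry, might look like this. The function names, the order in which sources are consulted, and the round-robin choice are assumptions for illustration; the live tests' real logic may differ.

```typescript
// Hypothetical sketch: collect candidate Anthropic keys from an env-like object.
type LiveEnv = { [name: string]: string | undefined };

function collectAnthropicKeys(env: LiveEnv): string[] {
  const keys: string[] = [];
  const add = (value: string | undefined): void => {
    const trimmed = (value ?? "").trim();
    if (trimmed !== "" && keys.indexOf(trimmed) === -1) keys.push(trimmed);
  };

  // Assumed order: comma list first, then the single-key var, then ANTHROPIC_API_KEY*.
  (env.OPENCLAW_LIVE_ANTHROPIC_KEYS ?? "").split(",").forEach((part) => add(part));
  add(env.OPENCLAW_LIVE_ANTHROPIC_KEY);
  Object.keys(env)
    .filter((name) => name.indexOf("ANTHROPIC_API_KEY") === 0)
    .sort()
    .forEach((name) => add(env[name]));
  return keys;
}

// Round-robin pick: attempt 0 uses the first key, attempt 1 the second, and so on.
function keyForAttempt(keys: string[], attempt: number): string | undefined {
  return keys.length === 0 ? undefined : keys[attempt % keys.length];
}

const liveKeys = collectAnthropicKeys({
  OPENCLAW_LIVE_ANTHROPIC_KEYS: "key-a, key-b",
  ANTHROPIC_API_KEY: "key-b",
  ANTHROPIC_API_KEY_OLD: "key-c",
});
// liveKeys is ["key-a", "key-b", "key-c"]
```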
## Which suite should I run?

Use this decision guide:

- Editing logic/tests: run `pnpm test` (and `pnpm test:coverage` if you changed a lot)
- Touching gateway networking / WS protocol / pairing: add `pnpm test:e2e`
- Debugging “my bot is down” / provider-specific failures / tool calling: run a narrowed `pnpm test:live`

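The three suites are distinguished by file suffix, so a test file's suite (and the command that runs it) can be derived from its path. The sketch below shows that mapping. One assumption is worth calling out: the glob `src/**/*.test.ts` on its own would also match `*.e2e.test.ts` and `*.live.test.ts`, so this sketch assumes the default config excludes those suffixes and checks the most specific suffix first.

```typescript
// Hypothetical sketch: map a test file path to the suite that would run it.
type Suite = "unit" | "e2e" | "live";

const SUITE_COMMANDS: { [S in Suite]: string } = {
  unit: "pnpm test",
  e2e: "pnpm test:e2e",
  live: "pnpm test:live",
};

function suiteForTestFile(path: string): Suite | undefined {
  if (path.indexOf("src/") !== 0) return undefined; // all three globs start at src/
  // Most specific suffix first (assumes the default suite excludes e2e/live files).
  if (/\.live\.test\.ts$/.test(path)) return "live";
  if (/\.e2e\.test\.ts$/.test(path)) return "e2e";
  if (/\.test\.ts$/.test(path)) return "unit";
  return undefined;
}

const suite = suiteForTestFile("src/gateway/gateway.wizard.e2e.test.ts");
const command = suite ? SUITE_COMMANDS[suite] : undefined;
// suite === "e2e", command === "pnpm test:e2e"
```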
## Live: model smoke (profile keys)

Live tests are split into two layers so we can isolate failures:

- “Direct model” tells us the provider/model can answer at all with the given key.
- “Gateway smoke” tells us the full gateway+agent pipeline works for that model (sessions, history, tools, sandbox policy, etc.).

### Layer 1: Direct model completion (no gateway)

- Test: `src/agents/models.profiles.live.test.ts`
- Goal:
  - Enumerate discovered models
  - Use `getApiKeyForModel` to select models you have creds for
  - Run a small completion per model (and targeted regressions where needed)
- How to enable:
  - `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)
  - Set `OPENCLAW_LIVE_MODELS=modern` (or `all`, alias for modern) to actually run this suite; otherwise it skips to keep `pnpm test:live` focused on gateway smoke
- How to select models:
  - `OPENCLAW_LIVE_MODELS=modern` to run the modern allowlist (Opus/Sonnet/Haiku 4.5, GPT-5.x + Codex, Gemini 3, GLM 4.7, MiniMax M2.1, Grok 4)
  - `OPENCLAW_LIVE_MODELS=all` is an alias for the modern allowlist
  - or `OPENCLAW_LIVE_MODELS="openai/gpt-5.2,anthropic/claude-opus-4-6,..."` (comma allowlist)
- How to select providers:
  - `OPENCLAW_LIVE_PROVIDERS="google,google-antigravity,google-gemini-cli"` (comma allowlist)
- Where keys come from:
  - By default: profile store and env fallbacks
  - Set `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to enforce **profile store** only
- Why this exists:
  - Separates “provider API is broken / key is invalid” from “gateway agent pipeline is broken”
  - Contains small, isolated regressions (example: OpenAI Responses/Codex Responses reasoning replay + tool-call flows)

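The three accepted shapes of `OPENCLAW_LIVE_MODELS` (unset, `modern`/`all`, or a comma allowlist) can be summarized as a small parser. This is a sketch of the rules as described above, with hypothetical names; trimming whitespace and dropping empty entries are assumptions.

```typescript
// Hypothetical sketch: interpret an OPENCLAW_LIVE_MODELS-style value.
type ModelSelection =
  | { kind: "skip" } // unset: the direct-model layer does not run
  | { kind: "modern" } // "modern" or its alias "all"
  | { kind: "allowlist"; models: string[] }; // explicit provider/model ids

function parseLiveModels(raw: string | undefined): ModelSelection {
  const value = (raw ?? "").trim();
  if (value === "") return { kind: "skip" };
  if (value === "modern" || value === "all") return { kind: "modern" };
  const models = value
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry !== ""); // assumption: stray commas are ignored
  return models.length === 0 ? { kind: "skip" } : { kind: "allowlist", models };
}

const selection = parseLiveModels("openai/gpt-5.2, anthropic/claude-opus-4-6");
// selection is { kind: "allowlist", models: ["openai/gpt-5.2", "anthropic/claude-opus-4-6"] }
```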
### Layer 2: Gateway + dev agent smoke (what “@openclaw” actually does)

- Test: `src/gateway/gateway-models.profiles.live.test.ts`
- Goal:
  - Spin up an in-process gateway
  - Create/patch an `agent:dev:*` session (model override per run)
  - Iterate models-with-keys and assert:
    - “meaningful” response (no tools)
    - a real tool invocation works (read probe)
    - optional extra tool probes (exec+read probe)
    - OpenAI regression paths (tool-call-only → follow-up) keep working
- Probe details (so you can explain failures quickly):
  - `read` probe: the test writes a nonce file in the workspace and asks the agent to `read` it and echo the nonce back.
  - `exec+read` probe: the test asks the agent to `exec`-write a nonce into a temp file, then `read` it back.
  - image probe: the test attaches a generated PNG (cat + randomized code) and expects the model to return `cat <CODE>`.
  - Implementation reference: `src/gateway/gateway-models.profiles.live.test.ts` and `src/gateway/live-image-probe.ts`.
- How to enable:
  - `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)
- How to select models:
  - Default: modern allowlist (Opus/Sonnet/Haiku 4.5, GPT-5.x + Codex, Gemini 3, GLM 4.7, MiniMax M2.1, Grok 4)
  - `OPENCLAW_LIVE_GATEWAY_MODELS=all` is an alias for the modern allowlist
  - Or set `OPENCLAW_LIVE_GATEWAY_MODELS="provider/model"` (or comma list) to narrow
- How to select providers (avoid “OpenRouter everything”):
  - `OPENCLAW_LIVE_GATEWAY_PROVIDERS="google,google-antigravity,google-gemini-cli,openai,anthropic,zai,minimax"` (comma allowlist)
- Tool + image probes are always on in this live test:
  - `read` probe + `exec+read` probe (tool stress)
  - image probe runs when the model advertises image input support
  - Flow (high level):
    - Test generates a tiny PNG with “CAT” + random code (`src/gateway/live-image-probe.ts`)
    - Sends it via `agent` `attachments: [{ mimeType: "image/png", content: "<base64>" }]`
    - Gateway parses attachments into `images[]` (`src/gateway/server-methods/agent.ts` + `src/gateway/chat-attachments.ts`)
    - Embedded agent forwards a multimodal user message to the model
    - Assertion: reply contains `cat` + the code (OCR tolerance: minor mistakes allowed)

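The image probe's assertion ("reply contains `cat` + the code, minor mistakes allowed") can be pictured as a fuzzy match. The sketch below uses Levenshtein edit distance with a threshold of two mistakes. Both the technique and the threshold are assumptions chosen for illustration; the actual tolerance lives in the test sources referenced above and may work differently.

```typescript
// Hypothetical sketch of an OCR-tolerant check; not the real probe assertion.

// Classic Levenshtein distance using a single rolling row.
function editDistance(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row[j] = j;
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // distance for (i-1, j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

function replyMatchesImageProbe(reply: string, code: string, maxMistakes = 2): boolean {
  const normalized = reply.toUpperCase();
  if (normalized.indexOf("CAT") === -1) return false; // the word itself must be present
  const target = code.toUpperCase();
  const tokens = normalized
    .split(/[^A-Z0-9]+/)
    .filter((token) => token !== "" && token !== "CAT");
  // Accept the reply if any remaining token is close enough to the expected code.
  return tokens.some((token) => editDistance(token, target) <= maxMistakes);
}

const exact = replyMatchesImageProbe("cat X7K9Q2", "X7K9Q2");
const oneMistake = replyMatchesImageProbe("Cat x7k9o2", "X7K9Q2");
const wrongAnimal = replyMatchesImageProbe("dog X7K9Q2", "X7K9Q2");
```

Here `exact` and `oneMistake` are true and `wrongAnimal` is false, which is the kind of behavior "minor mistakes allowed" suggests.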
Tip: to see what you can test on your machine (and the exact `provider/model` ids), run:

```bash
openclaw models list
openclaw models list --json
```

## Live: Anthropic setup-token smoke

- Test: `src/agents/anthropic.setup-token.live.test.ts`
- Goal: verify Claude Code CLI setup-token (or a pasted setup-token profile) can complete an Anthropic prompt.
- Enable:
  - `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)
  - `OPENCLAW_LIVE_SETUP_TOKEN=1`
- Token sources (pick one):
  - Profile: `OPENCLAW_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test`
  - Raw token: `OPENCLAW_LIVE_SETUP_TOKEN_VALUE=sk-ant-oat01-...`
- Model override (optional):
  - `OPENCLAW_LIVE_SETUP_TOKEN_MODEL=anthropic/claude-opus-4-6`

Setup example:

```bash
openclaw models auth paste-token --provider anthropic --profile-id anthropic:setup-token-test
OPENCLAW_LIVE_SETUP_TOKEN=1 OPENCLAW_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test pnpm test:live src/agents/anthropic.setup-token.live.test.ts
```

## Live: CLI backend smoke (Claude Code CLI or other local CLIs)

- Test: `src/gateway/gateway-cli-backend.live.test.ts`
- Goal: validate the Gateway + agent pipeline using a local CLI backend, without touching your default config.
- Enable:
  - `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)
  - `OPENCLAW_LIVE_CLI_BACKEND=1`
- Defaults:
  - Model: `claude-cli/claude-sonnet-4-5`
  - Command: `claude`
  - Args: `["-p","--output-format","json","--dangerously-skip-permissions"]`
- Overrides (optional):
  - `OPENCLAW_LIVE_CLI_BACKEND_MODEL="claude-cli/claude-opus-4-6"`
  - `OPENCLAW_LIVE_CLI_BACKEND_MODEL="codex-cli/gpt-5.3-codex"`
  - `OPENCLAW_LIVE_CLI_BACKEND_COMMAND="/full/path/to/claude"`
  - `OPENCLAW_LIVE_CLI_BACKEND_ARGS='["-p","--output-format","json","--permission-mode","bypassPermissions"]'`
  - `OPENCLAW_LIVE_CLI_BACKEND_CLEAR_ENV='["ANTHROPIC_API_KEY","ANTHROPIC_API_KEY_OLD"]'`
  - `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_PROBE=1` to send a real image attachment (paths are injected into the prompt).
  - `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_ARG="--image"` to pass image file paths as CLI args instead of prompt injection.
  - `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_MODE="repeat"` (or `"list"`) to control how image args are passed when `IMAGE_ARG` is set.
  - `OPENCLAW_LIVE_CLI_BACKEND_RESUME_PROBE=1` to send a second turn and validate resume flow.
  - `OPENCLAW_LIVE_CLI_BACKEND_DISABLE_MCP_CONFIG=0` to keep Claude Code CLI MCP config enabled (default disables MCP config with a temporary empty file).

Example:

```bash
OPENCLAW_LIVE_CLI_BACKEND=1 \
OPENCLAW_LIVE_CLI_BACKEND_MODEL="claude-cli/claude-sonnet-4-5" \
pnpm test:live src/gateway/gateway-cli-backend.live.test.ts
```

### Recommended live recipes

Narrow, explicit allowlists are fastest and least flaky:

- Single model, direct (no gateway):
  - `OPENCLAW_LIVE_MODELS="openai/gpt-5.2" pnpm test:live src/agents/models.profiles.live.test.ts`

- Single model, gateway smoke:
  - `OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.2" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`

- Tool calling across several providers:
  - `OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.2,anthropic/claude-opus-4-6,google/gemini-3-flash-preview,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`

- Google focus (Gemini API key + Antigravity):
  - Gemini (API key): `OPENCLAW_LIVE_GATEWAY_MODELS="google/gemini-3-flash-preview" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
  - Antigravity (OAuth): `OPENCLAW_LIVE_GATEWAY_MODELS="google-antigravity/claude-opus-4-6-thinking,google-antigravity/gemini-3-pro-high" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`

Notes:

- `google/...` uses the Gemini API (API key).
- `google-antigravity/...` uses the Antigravity OAuth bridge (Cloud Code Assist-style agent endpoint).
- `google-gemini-cli/...` uses the local Gemini CLI on your machine (separate auth + tooling quirks).
- Gemini API vs Gemini CLI:
  - API: OpenClaw calls Google’s hosted Gemini API over HTTP (API key / profile auth); this is what most users mean by “Gemini”.
  - CLI: OpenClaw shells out to a local `gemini` binary; it has its own auth and can behave differently (streaming/tool support/version skew).

## Live: model matrix (what we cover)

There is no fixed “CI model list” (live is opt-in), but these are the **recommended** models to cover regularly on a dev machine with keys.

### Modern smoke set (tool calling + image)

This is the “common models” run we expect to keep working:

- OpenAI (non-Codex): `openai/gpt-5.2` (optional: `openai/gpt-5.1`)
- OpenAI Codex: `openai-codex/gpt-5.3-codex` (optional: `openai-codex/gpt-5.3-codex-codex`)
- Anthropic: `anthropic/claude-opus-4-6` (or `anthropic/claude-sonnet-4-5`)
- Google (Gemini API): `google/gemini-3-pro-preview` and `google/gemini-3-flash-preview` (avoid older Gemini 2.x models)
- Google (Antigravity): `google-antigravity/claude-opus-4-6-thinking` and `google-antigravity/gemini-3-flash`
- Z.AI (GLM): `zai/glm-4.7`
- MiniMax: `minimax/minimax-m2.1`

Run gateway smoke with tools + image:
`OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.2,openai-codex/gpt-5.3-codex,anthropic/claude-opus-4-6,google/gemini-3-pro-preview,google/gemini-3-flash-preview,google-antigravity/claude-opus-4-6-thinking,google-antigravity/gemini-3-flash,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`

### Baseline: tool calling (Read + optional Exec)

Pick at least one per provider family:

- OpenAI: `openai/gpt-5.2` (or `openai/gpt-5-mini`)
- Anthropic: `anthropic/claude-opus-4-6` (or `anthropic/claude-sonnet-4-5`)
- Google: `google/gemini-3-flash-preview` (or `google/gemini-3-pro-preview`)
- Z.AI (GLM): `zai/glm-4.7`
- MiniMax: `minimax/minimax-m2.1`

Optional additional coverage (nice to have):

- xAI: `xai/grok-4` (or latest available)
- Mistral: `mistral/`… (pick one “tools” capable model you have enabled)
- Cerebras: `cerebras/`… (if you have access)
- LM Studio: `lmstudio/`… (local; tool calling depends on API mode)

### Vision: image send (attachment → multimodal message)

Include at least one image-capable model in `OPENCLAW_LIVE_GATEWAY_MODELS` (Claude/Gemini/OpenAI vision-capable variants, etc.) to exercise the image probe.

### Aggregators / alternate gateways

If you have keys enabled, we also support testing via:

- OpenRouter: `openrouter/...` (hundreds of models; use `openclaw models scan` to find tool+image capable candidates)
- OpenCode Zen: `opencode/...` (auth via `OPENCODE_API_KEY` / `OPENCODE_ZEN_API_KEY`)

More providers you can include in the live matrix (if you have creds/config):

- Built-in: `openai`, `openai-codex`, `anthropic`, `google`, `google-vertex`, `google-antigravity`, `google-gemini-cli`, `zai`, `openrouter`, `opencode`, `xai`, `groq`, `cerebras`, `mistral`, `github-copilot`
- Via `models.providers` (custom endpoints): `minimax` (cloud/API), plus any OpenAI/Anthropic-compatible proxy (LM Studio, vLLM, LiteLLM, etc.)

Tip: don’t try to hardcode “all models” in docs. The authoritative list is whatever `discoverModels(...)` returns on your machine + whatever keys are available.

## Credentials (never commit)

Live tests discover credentials the same way the CLI does. Practical implications:

- If the CLI works, live tests should find the same keys.
- If a live test says “no creds”, debug the same way you’d debug `openclaw models list` / model selection.

- Profile store: `~/.openclaw/credentials/` (preferred; what “profile keys” means in the tests)
- Config: `~/.openclaw/openclaw.json` (or `OPENCLAW_CONFIG_PATH`)

If you want to rely on env keys (e.g. exported in your `~/.profile`), run local tests after `source ~/.profile`, or use the Docker runners below (they can mount `~/.profile` into the container).

## Deepgram live (audio transcription)

- Test: `src/media-understanding/providers/deepgram/audio.live.test.ts`
- Enable: `DEEPGRAM_API_KEY=... DEEPGRAM_LIVE_TEST=1 pnpm test:live src/media-understanding/providers/deepgram/audio.live.test.ts`

## Docker runners (optional “works in Linux” checks)

These run `pnpm test:live` inside the repo Docker image, mounting your local config dir and workspace (and sourcing `~/.profile` if mounted):

- Direct models: `pnpm test:docker:live-models` (script: `scripts/test-live-models-docker.sh`)
- Gateway + dev agent: `pnpm test:docker:live-gateway` (script: `scripts/test-live-gateway-models-docker.sh`)
- Onboarding wizard (TTY, full scaffolding): `pnpm test:docker:onboard` (script: `scripts/e2e/onboard-docker.sh`)
- Gateway networking (two containers, WS auth + health): `pnpm test:docker:gateway-network` (script: `scripts/e2e/gateway-network-docker.sh`)
- Plugins (custom extension load + registry smoke): `pnpm test:docker:plugins` (script: `scripts/e2e/plugins-docker.sh`)

Useful env vars:

- `OPENCLAW_CONFIG_DIR=...` (default: `~/.openclaw`) mounted to `/home/node/.openclaw`
- `OPENCLAW_WORKSPACE_DIR=...` (default: `~/.openclaw/workspace`) mounted to `/home/node/.openclaw/workspace`
- `OPENCLAW_PROFILE_FILE=...` (default: `~/.profile`) mounted to `/home/node/.profile` and sourced before running tests
- `OPENCLAW_LIVE_GATEWAY_MODELS=...` / `OPENCLAW_LIVE_MODELS=...` to narrow the run
- `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to ensure creds come from the profile store (not env)

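Before starting a Docker runner, it can be useful to see which host paths the mount variables above resolve to. The sketch below only restates the documented defaults and container targets; the function name `openclaw_docker_mounts` is hypothetical and does not exist in the repo scripts, which may resolve these values differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo scripts): print the
# host -> container mounts implied by the env vars documented above.
openclaw_docker_mounts() {
  # Each variable falls back to the default listed in this section.
  local config_dir="${OPENCLAW_CONFIG_DIR:-$HOME/.openclaw}"
  local workspace_dir="${OPENCLAW_WORKSPACE_DIR:-$HOME/.openclaw/workspace}"
  local profile_file="${OPENCLAW_PROFILE_FILE:-$HOME/.profile}"

  echo "$config_dir -> /home/node/.openclaw"
  echo "$workspace_dir -> /home/node/.openclaw/workspace"
  echo "$profile_file -> /home/node/.profile"
}
```

Once the mounts look right, the variables can be combined with a runner on one command line, for example `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1 pnpm test:docker:live-models`, which should restrict the direct-models run to credentials from the mounted profile store.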
## Docs sanity

Run docs checks after doc edits: `pnpm docs:list`.

## Offline regression (CI-safe)

These are “real pipeline” regressions without real providers:

- Gateway tool calling (mock OpenAI, real gateway + agent loop): `src/gateway/gateway.tool-calling.mock-openai.test.ts`
- Gateway wizard (WS `wizard.start`/`wizard.next`, writes config + auth enforced): `src/gateway/gateway.wizard.e2e.test.ts`

## Agent reliability evals (skills)

We already have a few CI-safe tests that behave like “agent reliability evals”:

- Mock tool-calling through the real gateway + agent loop (`src/gateway/gateway.tool-calling.mock-openai.test.ts`).
- End-to-end wizard flows that validate session wiring and config effects (`src/gateway/gateway.wizard.e2e.test.ts`).

What’s still missing for skills (see [Skills](/tools/skills)):

- **Decisioning:** when skills are listed in the prompt, does the agent pick the right skill (or avoid irrelevant ones)?
- **Compliance:** does the agent read `SKILL.md` before use and follow required steps/args?
- **Workflow contracts:** multi-turn scenarios that assert tool order, session history carryover, and sandbox boundaries.

Future evals should stay deterministic first:

- A scenario runner using mock providers to assert tool calls + order, skill file reads, and session wiring.
- A small suite of skill-focused scenarios (use vs avoid, gating, prompt injection).
- Optional live evals (opt-in, env-gated) only after the CI-safe suite is in place.

## Adding regressions (guidance)

When you fix a provider/model issue discovered in live:

- Add a CI-safe regression if possible (mock/stub provider, or capture the exact request-shape transformation)
- If it’s inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars
- Prefer targeting the smallest layer that catches the bug:
  - provider request conversion/replay bug → direct models test
  - gateway session/history/tool pipeline bug → gateway live smoke or CI-safe gateway mock test

@@ -83,7 +83,7 @@ flowchart TD
- [/gateway/troubleshooting#no-replies](/gateway/troubleshooting#no-replies)
- [/channels/troubleshooting](/channels/troubleshooting)
- [/start/pairing](/start/pairing)
- [/channels/pairing](/channels/pairing)

</Accordion>