mirror of
https://github.com/moltbot/moltbot.git
synced 2026-04-23 14:45:46 +00:00
docs: document music generation async flow
@@ -66,6 +66,7 @@ These tools ship with OpenClaw and are available without installing any plugins:
 | `nodes` | Discover and target paired devices | |
 | `cron` / `gateway` | Manage scheduled jobs; inspect, patch, restart, or update the gateway | |
 | `image` / `image_generate` | Analyze or generate images | [Image Generation](/tools/image-generation) |
+| `music_generate` | Generate music tracks | [Music Generation](/tools/music-generation) |
 | `video_generate` | Generate videos | [Video Generation](/tools/video-generation) |
 | `tts` | One-shot text-to-speech conversion | [TTS](/tools/tts) |
 | `sessions_*` / `subagents` / `agents_list` | Session management, status, and sub-agent orchestration | [Sub-agents](/tools/subagents) |
@@ -73,6 +74,8 @@ These tools ship with OpenClaw and are available without installing any plugins:

 For image work, use `image` for analysis and `image_generate` for generation or editing. If you target `openai/*`, `google/*`, `fal/*`, or another non-default image provider, configure that provider's auth/API key first.

+For music work, use `music_generate`. If you target `google/*`, `minimax/*`, or another non-default music provider, configure that provider's auth/API key first.
+
 For video work, use `video_generate`. If you target `qwen/*` or another non-default video provider, configure that provider's auth/API key first.

 For workflow-driven audio generation, use `music_generate` when a plugin such as
@@ -128,12 +131,12 @@ config. Deny always wins over allow.
 `tools.profile` sets a base allowlist before `allow`/`deny` is applied.
 Per-agent override: `agents.list[].tools.profile`.

-| Profile | What it includes |
-| ----------- | ------------------------------------------------------------------------------------------------------------------------------- |
-| `full` | No restriction (same as unset) |
-| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `video_generate` |
-| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
-| `minimal` | `session_status` only |
+| Profile | What it includes |
+| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `full` | No restriction (same as unset) |
+| `coding` | `group:fs`, `group:runtime`, `group:web`, `group:sessions`, `group:memory`, `cron`, `image`, `image_generate`, `music_generate`, `video_generate` |
+| `messaging` | `group:messaging`, `sessions_list`, `sessions_history`, `sessions_send`, `session_status` |
+| `minimal` | `session_status` only |

 ### Tool groups

@@ -151,7 +154,7 @@ Use `group:*` shorthands in allow/deny lists:
 | `group:messaging` | message |
 | `group:nodes` | nodes |
 | `group:agents` | agents_list |
-| `group:media` | image, image_generate, video_generate, tts |
+| `group:media` | image, image_generate, music_generate, video_generate, tts |
 | `group:openclaw` | All built-in OpenClaw tools (excludes plugin tools) |

 `sessions_history` returns a bounded, safety-filtered recall view. It strips
@@ -1,23 +1,68 @@
 ---
-summary: "Generate music or audio with plugin-provided tools such as ComfyUI workflows"
+summary: "Generate music with shared providers or plugin-provided workflows"
 read_when:
 - Generating music or audio via the agent
-- Configuring plugin-provided music generation tools
+- Configuring music generation providers and models
 - Understanding the music_generate tool parameters
 title: "Music Generation"
 ---

 # Music Generation

-The `music_generate` tool lets the agent create audio files when a plugin
-registers music generation support.
+The `music_generate` tool lets the agent create music or audio through either:

-The bundled `comfy` plugin currently provides `music_generate` using a
-workflow-configured ComfyUI graph.
+- the shared music-generation capability with configured providers such as Google and MiniMax
+- plugin-provided tool surfaces such as a workflow-configured ComfyUI graph
+
+For shared provider-backed agent sessions, OpenClaw starts music generation as a
+background task, tracks it in the task ledger, then wakes the agent again when
+the track is ready so the agent can post the finished audio back into the
+original channel.
+
+<Note>
+The built-in shared tool only appears when at least one music-generation provider is available. If you don't see `music_generate` in your agent's tools, configure `agents.defaults.musicGenerationModel` or set up a provider API key.
+</Note>
+
+<Note>
+Plugin-provided `music_generate` implementations can expose different parameters or runtime behavior. The async task/status flow below applies to the built-in shared provider-backed path.
+</Note>

 ## Quick start

-1. Configure `models.providers.comfy.music` with a workflow JSON and prompt/output nodes.
+### Shared provider-backed generation
+
+1. Set an API key for at least one provider, for example `GEMINI_API_KEY` or
+`MINIMAX_API_KEY`.
+2. Optionally set your preferred model:
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "google/lyria-3-clip-preview",
+      },
+    },
+  },
+}
+```
+
+3. Ask the agent: _"Generate an upbeat synthpop track about a night drive
+through a neon city."_
+
+The agent calls `music_generate` automatically. No tool allow-listing needed.
+
+For direct synchronous contexts without a session-backed agent run, the built-in
+tool still falls back to inline generation and returns the final media path in
+the tool result.
+
+### Workflow-driven plugin generation
+
+The bundled `comfy` plugin can also provide `music_generate` using a
+workflow-configured ComfyUI graph.
+
+1. Configure `models.providers.comfy.music` with a workflow JSON and
+prompt/output nodes.
 2. If you use Comfy Cloud, set `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY`.
 3. Ask the agent for music or call the tool directly.

@@ -27,22 +72,102 @@ Example:
 /tool music_generate prompt="Warm ambient synth loop with soft tape texture"
 ```

-## Tool parameters
+## Shared bundled provider support

-| Parameter | Type | Description |
-| ---------- | ------ | --------------------------------------------------- |
-| `prompt` | string | Music or audio generation prompt |
-| `action` | string | `"generate"` (default) or `"list"` |
-| `model` | string | Provider/model override. Currently `comfy/workflow` |
-| `filename` | string | Output filename hint for the saved audio file |
+| Provider | Default model | Reference inputs | Supported controls | API key |
+| -------- | ---------------------- | ---------------- | --------------------------------------------------------- | ---------------------------------- |
+| Google | `lyria-3-clip-preview` | Up to 10 images | `lyrics`, `instrumental`, `format` | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
+| MiniMax | `music-2.5+` | None | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY` |

-## Current provider support
+## Plugin-provided support

 | Provider | Model | Notes |
 | -------- | ---------- | ------------------------------- |
 | ComfyUI | `workflow` | Workflow-defined music or audio |

-## Live test
+Use `action: "list"` to inspect available shared providers and models at
+runtime:
+
+```text
+/tool music_generate action=list
+```
+
+## Built-in tool parameters
+
+| Parameter | Type | Description |
+| ----------------- | -------- | ------------------------------------------------------------------------------------------------- |
+| `prompt` | string | Music generation prompt (required for `action: "generate"`) |
+| `action` | string | `"generate"` (default), `"status"` for the current session task, or `"list"` to inspect providers |
+| `model` | string | Provider/model override, e.g. `google/lyria-3-pro-preview` or `comfy/workflow` |
+| `lyrics` | string | Optional lyrics when the provider supports explicit lyric input |
+| `instrumental` | boolean | Request instrumental-only output when the provider supports it |
+| `image` | string | Single reference image path or URL |
+| `images` | string[] | Multiple reference images (up to 10) |
+| `durationSeconds` | number | Target duration in seconds when the provider supports duration hints |
+| `format` | string | Output format hint (`mp3` or `wav`) when the provider supports it |
+| `filename` | string | Output filename hint |
+
+Not all providers or plugins support all parameters. The shared built-in tool
+validates provider capability limits before it submits the request.
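As a rough illustration of the capability check described above, the TypeScript sketch below encodes the two rows of the shared provider table and reports any parameter a provider does not list. `MusicCapabilities`, `CAPABILITIES`, and `checkMusicRequest` are hypothetical names invented for this example; they are not OpenClaw's actual internals, and the real validation may differ.

```typescript
// Hypothetical capability table transcribed from the shared provider table.
// Names and shapes here are illustrative assumptions, not OpenClaw's API.
type MusicControl = "lyrics" | "instrumental" | "durationSeconds" | "format";

interface MusicCapabilities {
  maxReferenceImages: number;
  controls: ReadonlySet<MusicControl>;
  formats?: readonly string[]; // undefined: the table names no specific format
}

interface MusicRequest {
  prompt: string;
  lyrics?: string;
  instrumental?: boolean;
  images?: string[];
  durationSeconds?: number;
  format?: string;
}

const CAPABILITIES: Record<string, MusicCapabilities> = {
  google: {
    maxReferenceImages: 10,
    controls: new Set<MusicControl>(["lyrics", "instrumental", "format"]),
  },
  minimax: {
    maxReferenceImages: 0,
    controls: new Set<MusicControl>(["lyrics", "instrumental", "durationSeconds", "format"]),
    formats: ["mp3"],
  },
};

// Returns human-readable problems; an empty list means the request fits.
function checkMusicRequest(provider: string, req: MusicRequest): string[] {
  const caps = CAPABILITIES[provider];
  if (!caps) return [`unknown provider: ${provider}`];

  const problems: string[] = [];
  const controls: MusicControl[] = ["lyrics", "instrumental", "durationSeconds", "format"];
  for (const control of controls) {
    if (req[control] !== undefined && !caps.controls.has(control)) {
      problems.push(`${provider} does not support ${control}`);
    }
  }
  if (req.format !== undefined && caps.formats && !caps.formats.includes(req.format)) {
    problems.push(`${provider} does not support format=${req.format}`);
  }
  const imageCount = req.images?.length ?? 0;
  if (imageCount > caps.maxReferenceImages) {
    problems.push(`${provider} accepts at most ${caps.maxReferenceImages} reference images`);
  }
  return problems;
}
```

Under these assumptions, a MiniMax request with `format: "wav"` and one reference image yields two problems, while a Google request with `durationSeconds` yields one.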
+
+## Async behavior for the shared provider-backed path
+
+- Session-backed agent runs: `music_generate` creates a background task, returns a started/task response immediately, and posts the finished track later in a follow-up agent message.
+- Duplicate prevention: while that background task is still `queued` or `running`, later `music_generate` calls in the same session return task status instead of starting another generation.
+- Status lookup: use `action: "status"` to inspect the active session-backed music task without starting a new one.
+- Task tracking: use `openclaw tasks list` or `openclaw tasks show <taskId>` to inspect queued, running, and terminal status for the generation.
+- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
+- Prompt hint: later user/manual turns in the same session get a small runtime hint when a music task is already in flight so the model does not blindly call `music_generate` again.
+- No-session fallback: direct/local contexts without a real agent session still run inline and return the final audio result in the same turn.
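The duplicate-prevention and status-lookup bullets above can be pictured with a small in-memory ledger. This is only a sketch: `MusicTaskLedger` and its methods are hypothetical, the terminal status names `succeeded` and `failed` are assumptions (the text names only `queued` and `running`), and the real task ledger is presumably persistent and richer.

```typescript
// Hypothetical in-memory stand-in for the task ledger; not OpenClaw's implementation.
// "succeeded" and "failed" are assumed names for terminal states.
type MusicTaskStatus = "queued" | "running" | "succeeded" | "failed";

interface MusicTask {
  taskId: string;
  sessionId: string;
  status: MusicTaskStatus;
}

class MusicTaskLedger {
  private latestBySession = new Map<string, MusicTask>();
  private nextId = 1;

  // Starts a new task unless the session already has one queued or running,
  // in which case the existing task is returned instead.
  startOrStatus(sessionId: string): { started: boolean; task: MusicTask } {
    const existing = this.latestBySession.get(sessionId);
    if (existing && (existing.status === "queued" || existing.status === "running")) {
      return { started: false, task: existing };
    }
    const task: MusicTask = { taskId: `task-${this.nextId++}`, sessionId, status: "queued" };
    this.latestBySession.set(sessionId, task);
    return { started: true, task };
  }

  // Mirrors `action: "status"`: inspect without starting anything.
  status(sessionId: string): MusicTask | undefined {
    return this.latestBySession.get(sessionId);
  }

  // Marks the session's task as terminal so a later call may start a new one.
  finish(sessionId: string, status: "succeeded" | "failed"): void {
    const task = this.latestBySession.get(sessionId);
    if (task) task.status = status;
  }
}
```

In this sketch a second `startOrStatus` call for the same session returns the first task with `started: false` until `finish` records a terminal state.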
+
+## Configuration
+
+### Model selection
+
+```json5
+{
+  agents: {
+    defaults: {
+      musicGenerationModel: {
+        primary: "google/lyria-3-clip-preview",
+        fallbacks: ["minimax/music-2.5+"],
+      },
+    },
+  },
+}
+```
+
+### Provider selection order
+
+When generating music, OpenClaw tries providers in this order:
+
+1. `model` parameter from the tool call, if the agent specifies one
+2. `musicGenerationModel.primary` from config
+3. `musicGenerationModel.fallbacks` in order
+4. Auto-detection using auth-backed provider defaults only:
+   - current default provider first
+   - remaining registered music-generation providers in provider-id order
+
+If a provider fails, the next candidate is tried automatically. If all fail, the
+error includes details from each attempt.
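The selection order above amounts to building a candidate list and trying each entry until one succeeds. The sketch below shows that shape under stated assumptions: `orderCandidates` and `generateWithFallback` are hypothetical helpers, the auto-detected list is passed in already sorted, and skipping duplicate entries is this example's choice rather than documented behavior.

```typescript
// Hypothetical helpers illustrating the documented order; not OpenClaw's code.
interface MusicModelConfig {
  primary?: string;
  fallbacks?: string[];
}

// Order: tool-call model, config primary, config fallbacks, then auto-detected
// defaults. Dropping repeats is an assumption made for this sketch.
function orderCandidates(
  toolModel: string | undefined,
  config: MusicModelConfig,
  autoDetected: string[],
): string[] {
  const ordered = [toolModel, config.primary, ...(config.fallbacks ?? []), ...autoDetected];
  const seen = new Set<string>();
  const result: string[] = [];
  for (const model of ordered) {
    if (model !== undefined && !seen.has(model)) {
      seen.add(model);
      result.push(model);
    }
  }
  return result;
}

// Tries each candidate in turn and collects per-attempt failures so the final
// error can describe every attempt.
async function generateWithFallback<T>(
  candidates: string[],
  generate: (model: string) => Promise<T>,
): Promise<T> {
  const failures: string[] = [];
  for (const model of candidates) {
    try {
      return await generate(model);
    } catch (err) {
      failures.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all music providers failed: ${failures.join("; ")}`);
}
```

With the configuration shown earlier and no `model` override, `orderCandidates` would place `google/lyria-3-clip-preview` before `minimax/music-2.5+`.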
+
+## Provider notes
+
+- Google uses Lyria 3 batch generation. The current bundled flow supports
+  prompt, optional lyrics text, and optional reference images.
+- MiniMax uses the batch `music_generation` endpoint. The current bundled flow
+  supports prompt, optional lyrics, instrumental mode, duration steering, and
+  mp3 output.
+- ComfyUI support is workflow-driven and depends on the configured graph plus
+  node mapping for prompt/output fields.
+
+## Live tests
+
+Opt-in live coverage for the shared bundled providers:
+
+```bash
+OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
+```

 Opt-in live coverage for the bundled ComfyUI music path:

@@ -50,10 +175,15 @@ Opt-in live coverage for the bundled ComfyUI music path:
 OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
 ```

-The live file also covers comfy image and video workflows when those sections
-are configured.
+The Comfy live file also covers comfy image and video workflows when those
+sections are configured.

 ## Related

+- [Background Tasks](/automation/tasks) - task tracking for detached `music_generate` runs
+- [Configuration Reference](/gateway/configuration-reference#agent-defaults) - `musicGenerationModel` config
 - [ComfyUI](/providers/comfy)
+- [Google (Gemini)](/providers/google)
+- [MiniMax](/providers/minimax)
+- [Models](/concepts/models) - model configuration and failover
 - [Tools Overview](/tools)
@@ -319,6 +319,7 @@ Common registration methods:
 | `registerRealtimeVoiceProvider` | Duplex realtime voice |
 | `registerMediaUnderstandingProvider` | Image/audio analysis |
 | `registerImageGenerationProvider` | Image generation |
+| `registerMusicGenerationProvider` | Music generation |
 | `registerVideoGenerationProvider` | Video generation |
 | `registerWebFetchProvider` | Web fetch / scrape provider |
 | `registerWebSearchProvider` | Web search |