mirror of
https://github.com/moltbot/moltbot.git
synced 2026-04-24 07:01:49 +00:00
feat: declare explicit media provider capabilities
This commit is contained in:
@@ -85,6 +85,17 @@ Example:
|
||||
| Google | `lyria-3-clip-preview` | Up to 10 images | `lyrics`, `instrumental`, `format` | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
|
||||
| MiniMax | `music-2.5+` | None | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY` |
|
||||
|
||||
### Declared capability matrix
|
||||
|
||||
This is the explicit mode contract used by `music_generate`, contract tests,
|
||||
and the shared live sweep.
|
||||
|
||||
| Provider | `generate` | `edit` | Edit limit | Shared live lanes |
|
||||
| -------- | ---------- | ------ | ---------- | ------------------------------------------------------------------------- |
|
||||
| ComfyUI | Yes | Yes | 1 image | Not in the shared sweep; covered by `extensions/comfy/comfy.live.test.ts` |
|
||||
| Google | Yes | Yes | 10 images | `generate`, `edit` |
|
||||
| MiniMax | Yes | No | None | `generate` |
|
||||
|
||||
Use `action: "list"` to inspect available shared providers and models at
|
||||
runtime:
|
||||
|
||||
@@ -174,6 +185,36 @@ error includes details from each attempt.
|
||||
- ComfyUI support is workflow-driven and depends on the configured graph plus
|
||||
node mapping for prompt/output fields.
|
||||
|
||||
## Provider capability modes
|
||||
|
||||
The shared music-generation contract now supports explicit mode declarations:
|
||||
|
||||
- `generate` for prompt-only generation
|
||||
- `edit` when the request includes one or more reference images
|
||||
|
||||
New provider implementations should prefer explicit mode blocks:
|
||||
|
||||
```typescript
|
||||
capabilities: {
|
||||
generate: {
|
||||
maxTracks: 1,
|
||||
supportsLyrics: true,
|
||||
supportsFormat: true,
|
||||
},
|
||||
edit: {
|
||||
enabled: true,
|
||||
maxTracks: 1,
|
||||
maxInputImages: 1,
|
||||
supportsFormat: true,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Legacy flat fields such as `maxInputImages`, `supportsLyrics`, and
|
||||
`supportsFormat` are not enough to advertise edit support. Providers should
|
||||
declare `generate` and `edit` explicitly so live tests, contract tests, and
|
||||
the shared `music_generate` tool can validate mode support deterministically.
|
||||
|
||||
## Choosing the right path
|
||||
|
||||
- Use the shared provider-backed path when you want model selection, provider failover, and the built-in async task/status flow.
|
||||
@@ -188,6 +229,16 @@ Opt-in live coverage for the shared bundled providers:
|
||||
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
|
||||
```
|
||||
|
||||
This live file loads missing provider env vars from `~/.profile`, prefers
|
||||
live/env API keys ahead of stored auth profiles by default, and runs both
|
||||
`generate` and declared `edit` coverage when the provider enables edit mode.
|
||||
|
||||
Today that means:
|
||||
|
||||
- `google`: `generate` plus `edit`
|
||||
- `minimax`: `generate` only
|
||||
- `comfy`: separate Comfy live coverage, not the shared provider sweep
|
||||
|
||||
Opt-in live coverage for the bundled ComfyUI music path:
|
||||
|
||||
```bash
|
||||
|
||||
@@ -79,6 +79,26 @@ Some providers accept additional or alternate API key env vars. See individual [
|
||||
Run `video_generate action=list` to inspect available providers, models, and
|
||||
runtime modes at runtime.
|
||||
|
||||
### Declared capability matrix
|
||||
|
||||
This is the explicit mode contract used by `video_generate`, contract tests,
|
||||
and the shared live sweep.
|
||||
|
||||
| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
|
||||
| -------- | ---------- | -------------- | -------------- | ---------------------------------------------------------------------------------------------------------- |
|
||||
| Alibaba | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
|
||||
| BytePlus | Yes | Yes | No | `generate`, `imageToVideo` |
|
||||
| ComfyUI | Yes | Yes | No | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
|
||||
| fal | Yes | Yes | No | `generate`, `imageToVideo` |
|
||||
| Google | Yes | Yes | Yes | `generate`, `imageToVideo`, `videoToVideo` |
|
||||
| MiniMax | Yes | Yes | No | `generate`, `imageToVideo` |
|
||||
| OpenAI | Yes | Yes | Yes | `generate`, `imageToVideo`, `videoToVideo` |
|
||||
| Qwen | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
|
||||
| Runway | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
|
||||
| Together | Yes | Yes | No | `generate`, `imageToVideo` |
|
||||
| Vydra | Yes | Yes | No | `generate`, `imageToVideo` |
|
||||
| xAI | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |
|
||||
|
||||
## Tool parameters
|
||||
|
||||
### Required
|
||||
@@ -201,9 +221,34 @@ capabilities: {
|
||||
}
|
||||
```
|
||||
|
||||
Legacy flat fields such as `maxInputImages` and `maxInputVideos` still work as
|
||||
backward-compatible aggregate caps, but they cannot express per-mode limits as
|
||||
precisely.
|
||||
Flat aggregate fields such as `maxInputImages` and `maxInputVideos` are not
|
||||
enough to advertise transform-mode support. Providers should declare
|
||||
`generate`, `imageToVideo`, and `videoToVideo` explicitly so live tests,
|
||||
contract tests, and the shared `video_generate` tool can validate mode support
|
||||
deterministically.
|
||||
|
||||
## Live tests
|
||||
|
||||
Opt-in live coverage for the shared bundled providers:
|
||||
|
||||
```bash
|
||||
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts
|
||||
```
|
||||
|
||||
This live file loads missing provider env vars from `~/.profile`, prefers
|
||||
live/env API keys ahead of stored auth profiles by default, and runs the
|
||||
declared modes it can exercise safely with local media:
|
||||
|
||||
- `generate` for every provider in the sweep
|
||||
- `imageToVideo` when `capabilities.imageToVideo.enabled`
|
||||
- `videoToVideo` when `capabilities.videoToVideo.enabled` and the provider/model
|
||||
accepts buffer-backed local video input in the shared sweep
|
||||
|
||||
Today the shared `videoToVideo` live lane covers:
|
||||
|
||||
- `google`
|
||||
- `openai`
|
||||
- `runway` only when you select `runway/gen4_aleph`
|
||||
|
||||
## Configuration
|
||||
|
||||
|
||||
Reference in New Issue
Block a user