Mirror of https://github.com/moltbot/moltbot.git (synced 2026-04-23 14:45:46 +00:00)
feat: declare explicit media provider capabilities
@@ -475,10 +475,45 @@ If you want to rely on env keys (e.g. exported in your `~/.profile`), run local
  - Exercises the shared bundled music-generation provider path
  - Currently covers Google and MiniMax
  - Loads provider env vars from your login shell (`~/.profile`) before probing
  - Uses live/env API keys ahead of stored auth profiles by default, so stale test keys in `auth-profiles.json` do not mask real shell credentials
  - Skips providers with no usable auth/profile/model
  - Runs both declared runtime modes when available:
    - `generate` with prompt-only input
    - `edit` when the provider declares `capabilities.edit.enabled`
  - Current shared-lane coverage:
    - `google`: `generate`, `edit`
    - `minimax`: `generate`
    - `comfy`: separate Comfy live file, not this shared sweep
- Optional narrowing:
  - `OPENCLAW_LIVE_MUSIC_GENERATION_PROVIDERS="google,minimax"`
  - `OPENCLAW_LIVE_MUSIC_GENERATION_MODELS="google/lyria-3-clip-preview,minimax/music-2.5+"`
- Optional auth behavior:
  - `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides
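
The auth precedence described in the list above can be sketched as a small pure function. This is an illustration only: `resolveLiveAuth`, `AuthSources`, and the `requireProfileKeys` flag are hypothetical names, not the helpers the live test actually uses. The flag stands in for `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1`.

```typescript
// Hypothetical shapes for illustration; the real live-test helpers may differ.
interface AuthSources {
  envKey?: string; // key exported in the shell, e.g. loaded from ~/.profile
  profileKey?: string; // key stored in auth-profiles.json
}

type ResolvedAuth = { key: string; source: "env" | "profile" } | null;

function resolveLiveAuth(
  sources: AuthSources,
  requireProfileKeys: boolean,
): ResolvedAuth {
  const env = sources.envKey?.trim();
  const profile = sources.profileKey?.trim();

  // Forced profile-store auth: env-only overrides are ignored entirely.
  if (requireProfileKeys) {
    return profile ? { key: profile, source: "profile" } : null;
  }

  // Default: a live/env key wins, so a stale stored key cannot mask it.
  if (env) return { key: env, source: "env" };
  if (profile) return { key: profile, source: "profile" };

  // No usable auth: the caller would treat the provider as skippable.
  return null;
}
```

Under these assumptions, a provider with only an env key resolves normally by default but yields `null` once profile keys are required, which matches the "skips providers with no usable auth" behavior listed above.
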

## Video generation live

- Test: `extensions/video-generation-providers.live.test.ts`
- Enable: `OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts`
- Scope:
  - Exercises the shared bundled video-generation provider path
  - Loads provider env vars from your login shell (`~/.profile`) before probing
  - Uses live/env API keys ahead of stored auth profiles by default, so stale test keys in `auth-profiles.json` do not mask real shell credentials
  - Skips providers with no usable auth/profile/model
  - Runs both declared runtime modes when available:
    - `generate` with prompt-only input
    - `imageToVideo` when the provider declares `capabilities.imageToVideo.enabled`
    - `videoToVideo` when the provider declares `capabilities.videoToVideo.enabled` and the selected provider/model accepts buffer-backed local video input in the shared sweep
  - Current `videoToVideo` live coverage:
    - `google`
    - `openai`
    - `runway` only when the selected model is `runway/gen4_aleph`
  - Current declared-but-skipped `videoToVideo` providers in the shared sweep:
    - `alibaba`, `qwen`, `xai` because those paths currently require remote `http(s)` / MP4 reference URLs
- Optional narrowing:
  - `OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="google,openai,runway"`
  - `OPENCLAW_LIVE_VIDEO_GENERATION_MODELS="google/veo-3.1-fast-generate-preview,openai/sora-2,runway/gen4_aleph"`
- Optional auth behavior:
  - `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides
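
The lane selection described in the Scope list can be sketched as follows. This is a minimal illustration under stated assumptions: `planVideoLiveModes` and the `acceptsLocalVideoBuffer` flag are hypothetical, and only the `capabilities.imageToVideo.enabled` and `capabilities.videoToVideo.enabled` field paths come from the text above.

```typescript
type VideoMode = "generate" | "imageToVideo" | "videoToVideo";

// Only the `enabled` flags are taken from the docs; the rest is illustrative.
interface VideoCapabilities {
  imageToVideo?: { enabled?: boolean };
  videoToVideo?: { enabled?: boolean };
}

function planVideoLiveModes(
  caps: VideoCapabilities,
  // Hypothetical flag: true when the selected provider/model can take a
  // buffer-backed local video instead of a remote http(s) URL.
  acceptsLocalVideoBuffer: boolean,
): VideoMode[] {
  // Prompt-only generation is always exercised.
  const modes: VideoMode[] = ["generate"];

  if (caps.imageToVideo?.enabled) {
    modes.push("imageToVideo");
  }

  // Declared-but-skipped case: enabled, but needs remote reference URLs.
  if (caps.videoToVideo?.enabled && acceptsLocalVideoBuffer) {
    modes.push("videoToVideo");
  }

  return modes;
}
```

In this sketch a provider that declares `videoToVideo` but needs remote reference URLs would still run `generate` and `imageToVideo`, which mirrors the declared-but-skipped entries for `alibaba`, `qwen`, and `xai`.
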

## Docker runners (optional "works in Linux" checks)

@@ -643,10 +643,15 @@ API key auth, and dynamic model resolution.
[Internals: Capability Ownership](/plugins/architecture#capability-ownership-model).

For video generation, prefer the mode-aware capability shape shown above:
`generate`, `imageToVideo`, and `videoToVideo`. The older flat fields such
as `maxInputImages`, `maxInputVideos`, and `maxDurationSeconds` still work
as aggregate fallback caps, but they cannot describe per-mode limits or
disabled transform modes as cleanly.
`generate`, `imageToVideo`, and `videoToVideo`. Flat aggregate fields such
as `maxInputImages`, `maxInputVideos`, and `maxDurationSeconds` are not
enough to advertise transform-mode support or disabled modes cleanly.

Music-generation providers should follow the same pattern:
`generate` for prompt-only generation and `edit` for reference-image-based
generation. Flat aggregate fields such as `maxInputImages`,
`supportsLyrics`, and `supportsFormat` are not enough to advertise edit
support; explicit `generate` / `edit` blocks are the expected contract.

</Step>

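
To illustrate why flat aggregate fields fall short, the sketch below contrasts a flat shape with a mode-aware one. Everything here is an assumption for illustration: the interface names, the `isTransformEnabled` helper, and the example limits are hypothetical and do not describe any real provider. Only the mode names and the flat field names come from the text above.

```typescript
// Flat aggregate caps: `maxInputVideos: 1` says nothing about whether
// videoToVideo is actually a supported mode.
interface FlatVideoCaps {
  maxInputImages?: number;
  maxInputVideos?: number;
  maxDurationSeconds?: number;
}

// Mode-aware caps: each mode carries its own switch and limits.
interface ModeBlock extends FlatVideoCaps {
  enabled?: boolean;
}

interface ModeAwareVideoCaps {
  generate?: ModeBlock;
  imageToVideo?: ModeBlock;
  videoToVideo?: ModeBlock;
}

// Hypothetical provider declaration; the numbers are made up.
const exampleCaps: ModeAwareVideoCaps = {
  generate: { maxDurationSeconds: 8 },
  imageToVideo: { enabled: true, maxInputImages: 1, maxDurationSeconds: 8 },
  videoToVideo: { enabled: false },
};

function isTransformEnabled(
  caps: ModeAwareVideoCaps,
  mode: "imageToVideo" | "videoToVideo",
): boolean {
  // A transform mode counts as supported only when explicitly enabled.
  return caps[mode]?.enabled === true;
}
```

With explicit blocks, a disabled transform mode is a plain `enabled: false` rather than something to infer from a zero or missing aggregate limit.
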
@@ -85,6 +85,17 @@ Example:
| Google | `lyria-3-clip-preview` | Up to 10 images | `lyrics`, `instrumental`, `format` | `GEMINI_API_KEY`, `GOOGLE_API_KEY` |
| MiniMax | `music-2.5+` | None | `lyrics`, `instrumental`, `durationSeconds`, `format=mp3` | `MINIMAX_API_KEY` |

### Declared capability matrix

This is the explicit mode contract used by `music_generate`, contract tests,
and the shared live sweep.

| Provider | `generate` | `edit` | Edit limit | Shared live lanes |
| -------- | ---------- | ------ | ---------- | ------------------------------------------------------------------------- |
| ComfyUI | Yes | Yes | 1 image | Not in the shared sweep; covered by `extensions/comfy/comfy.live.test.ts` |
| Google | Yes | Yes | 10 images | `generate`, `edit` |
| MiniMax | Yes | No | None | `generate` |

Use `action: "list"` to inspect available shared providers and models at
runtime:

@@ -174,6 +185,36 @@ error includes details from each attempt.
- ComfyUI support is workflow-driven and depends on the configured graph plus
  node mapping for prompt/output fields.

## Provider capability modes

The shared music-generation contract now supports explicit mode declarations:

- `generate` for prompt-only generation
- `edit` when the request includes one or more reference images

New provider implementations should prefer explicit mode blocks:

```typescript
capabilities: {
  generate: {
    maxTracks: 1,
    supportsLyrics: true,
    supportsFormat: true,
  },
  edit: {
    enabled: true,
    maxTracks: 1,
    maxInputImages: 1,
    supportsFormat: true,
  },
}
```

Legacy flat fields such as `maxInputImages`, `supportsLyrics`, and
`supportsFormat` are not enough to advertise edit support. Providers should
declare `generate` and `edit` explicitly so live tests, contract tests, and
the shared `music_generate` tool can validate mode support deterministically.

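As a rough sketch of what validating mode support deterministically could look like, the function below picks a mode from the number of reference images and checks it against the declared blocks. The `checkMusicRequest` helper and its result type are hypothetical; only the `generate` / `edit` field names are taken from the example above.

```typescript
// Field names follow the capability example above; the types are assumed.
interface MusicCapabilities {
  generate?: { maxTracks?: number; supportsLyrics?: boolean; supportsFormat?: boolean };
  edit?: {
    enabled?: boolean;
    maxTracks?: number;
    maxInputImages?: number;
    supportsFormat?: boolean;
  };
}

type ModeCheck =
  | { ok: true; mode: "generate" | "edit" }
  | { ok: false; reason: string };

function checkMusicRequest(
  caps: MusicCapabilities,
  referenceImageCount: number,
): ModeCheck {
  // No reference images: prompt-only generation.
  if (referenceImageCount === 0) {
    return { ok: true, mode: "generate" };
  }

  // One or more reference images: the provider must declare edit mode.
  const edit = caps.edit;
  if (!edit || !edit.enabled) {
    return { ok: false, reason: "provider does not declare edit mode" };
  }

  // Enforce the per-mode image limit when one is declared.
  const limit = edit.maxInputImages;
  if (limit !== undefined && referenceImageCount > limit) {
    return { ok: false, reason: `too many reference images (max ${limit})` };
  }

  return { ok: true, mode: "edit" };
}
```

Because the decision depends only on the declared blocks and the request shape, the same check could in principle back a contract test and a live sweep without provider-specific branching.
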
## Choosing the right path

- Use the shared provider-backed path when you want model selection, provider failover, and the built-in async task/status flow.
@@ -188,6 +229,16 @@ Opt-in live coverage for the shared bundled providers:
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
```

This live file loads missing provider env vars from `~/.profile`, prefers
live/env API keys ahead of stored auth profiles by default, and runs both
`generate` and declared `edit` coverage when the provider enables edit mode.

Today that means:

- `google`: `generate` plus `edit`
- `minimax`: `generate` only
- `comfy`: separate Comfy live coverage, not the shared provider sweep

Opt-in live coverage for the bundled ComfyUI music path:

```bash
@@ -79,6 +79,26 @@ Some providers accept additional or alternate API key env vars. See individual [
Run `video_generate action=list` to inspect available providers, models, and
runtime modes at runtime.

### Declared capability matrix

This is the explicit mode contract used by `video_generate`, contract tests,
and the shared live sweep.

| Provider | `generate` | `imageToVideo` | `videoToVideo` | Shared live lanes today |
| -------- | ---------- | -------------- | -------------- | ---------------------------------------------------------------------------------------------------------- |
| Alibaba | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| BytePlus | Yes | Yes | No | `generate`, `imageToVideo` |
| ComfyUI | Yes | Yes | No | Not in the shared sweep; workflow-specific coverage lives with Comfy tests |
| fal | Yes | Yes | No | `generate`, `imageToVideo` |
| Google | Yes | Yes | Yes | `generate`, `imageToVideo`, `videoToVideo` |
| MiniMax | Yes | Yes | No | `generate`, `imageToVideo` |
| OpenAI | Yes | Yes | Yes | `generate`, `imageToVideo`, `videoToVideo` |
| Qwen | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider needs remote `http(s)` video URLs |
| Runway | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` runs only when the selected model is `runway/gen4_aleph` |
| Together | Yes | Yes | No | `generate`, `imageToVideo` |
| Vydra | Yes | Yes | No | `generate`, `imageToVideo` |
| xAI | Yes | Yes | Yes | `generate`, `imageToVideo`; `videoToVideo` skipped because this provider currently needs a remote MP4 URL |

## Tool parameters

### Required
@@ -201,9 +221,34 @@ capabilities: {
}
```

Legacy flat fields such as `maxInputImages` and `maxInputVideos` still work as
backward-compatible aggregate caps, but they cannot express per-mode limits as
precisely.
Flat aggregate fields such as `maxInputImages` and `maxInputVideos` are not
enough to advertise transform-mode support. Providers should declare
`generate`, `imageToVideo`, and `videoToVideo` explicitly so live tests,
contract tests, and the shared `video_generate` tool can validate mode support
deterministically.

## Live tests

Opt-in live coverage for the shared bundled providers:

```bash
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts
```

This live file loads missing provider env vars from `~/.profile`, prefers
live/env API keys ahead of stored auth profiles by default, and runs the
declared modes it can exercise safely with local media:

- `generate` for every provider in the sweep
- `imageToVideo` when `capabilities.imageToVideo.enabled`
- `videoToVideo` when `capabilities.videoToVideo.enabled` and the provider/model
  accepts buffer-backed local video input in the shared sweep

Today the shared `videoToVideo` live lane covers:

- `google`
- `openai`
- `runway` only when you select `runway/gen4_aleph`

## Configuration
