feat: add vydra media provider

2026-04-24 23:21:30 +00:00 · 2026-04-06 02:15:51 +01:00
parent 7d2dc7a9fb
commit 9b2b22f350
21 changed files with 1358 additions and 11 deletions
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -1272,6 +1272,7 @@
                  "providers/together",
                  "providers/venice",
                  "providers/vercel-ai-gateway",
+                  "providers/vydra",
                  "providers/vllm",
                  "providers/volcengine",
                  "providers/xai",
--- a/docs/providers/index.md
+++ b/docs/providers/index.md
@@ -62,6 +62,7 @@ Looking for chat channel docs (WhatsApp/Telegram/Discord/Slack/Mattermost (plugi
 - [Together AI](/providers/together)
 - [Venice (Venice AI, privacy-focused)](/providers/venice)
 - [Vercel AI Gateway](/providers/vercel-ai-gateway)
+- [Vydra](/providers/vydra)
 - [vLLM (local models)](/providers/vllm)
 - [Volcengine (Doubao)](/providers/volcengine)
 - [xAI](/providers/xai)
--- a/docs/providers/vydra.md
+++ b/docs/providers/vydra.md
@@ -0,0 +1,123 @@
+---
+summary: "Use Vydra image, video, and speech in OpenClaw"
+read_when:
+  - You want Vydra media generation in OpenClaw
+  - You need Vydra API key setup guidance
+title: "Vydra"
+---
+
+# Vydra
+
+The bundled Vydra plugin adds:
+
+- image generation via `vydra/grok-imagine`
+- video generation via `vydra/veo3` and `vydra/kling`
+- speech synthesis via Vydra's ElevenLabs-backed TTS route
+
+OpenClaw uses the same `VYDRA_API_KEY` for all three capabilities.
+
+## Important base URL
+
+Use `https://www.vydra.ai/api/v1`.
+
+Vydra's apex host (`https://vydra.ai/api/v1`) currently redirects to `www`. Some HTTP clients drop `Authorization` on that cross-host redirect, which turns a valid API key into a misleading auth failure. The bundled plugin uses the `www` base URL directly to avoid that.
+
+## Setup
+
+Interactive onboarding:
+
+```bash
+openclaw onboard --auth-choice vydra-api-key
+```
+
+Or set the env var directly:
+
+```bash
+export VYDRA_API_KEY="vydra_live_..."
+```
+
+## Image generation
+
+Default image model:
+
+- `vydra/grok-imagine`
+
+Set it as the default image provider:
+
+```json5
+{
+  agents: {
+    defaults: {
+      imageGenerationModel: {
+        primary: "vydra/grok-imagine",
+      },
+    },
+  },
+}
+```
+
+Current bundled support is text-to-image only. Vydra's hosted edit routes expect remote image URLs, and OpenClaw does not add a Vydra-specific upload bridge in the bundled plugin yet.
+
+See [Image Generation](/tools/image-generation) for shared tool behavior.
+
+## Video generation
+
+Registered video models:
+
+- `vydra/veo3` for text-to-video
+- `vydra/kling` for image-to-video
+
+Set Vydra as the default video provider:
+
+```json5
+{
+  agents: {
+    defaults: {
+      videoGenerationModel: {
+        primary: "vydra/veo3",
+      },
+    },
+  },
+}
+```
+
+Notes:
+
+- `vydra/veo3` is bundled as text-to-video only.
+- `vydra/kling` currently requires a remote image URL reference. Local file uploads are rejected up front.
+- The bundled plugin stays conservative and does not forward undocumented style knobs such as aspect ratio, resolution, watermark, or generated audio.
+
+See [Video Generation](/tools/video-generation) for shared tool behavior.
+
+## Speech synthesis
+
+Set Vydra as the speech provider:
+
+```json5
+{
+  messages: {
+    tts: {
+      provider: "vydra",
+      providers: {
+        vydra: {
+          apiKey: "${VYDRA_API_KEY}",
+          voiceId: "21m00Tcm4TlvDq8ikWAM",
+        },
+      },
+    },
+  },
+}
+```
+
+Defaults:
+
+- model: `elevenlabs/tts`
+- voice id: `21m00Tcm4TlvDq8ikWAM`
+
+The bundled plugin currently exposes one known-good default voice and returns MP3 audio files.
+
+## Related
+
+- [Provider Directory](/providers/index)
+- [Image Generation](/tools/image-generation)
+- [Video Generation](/tools/video-generation)
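Reviewer sketch for the "Important base URL" note in `docs/providers/vydra.md`: one way to keep a client off the redirecting apex host is to rewrite the base URL before any request is made. This is a minimal illustration, not the plugin's actual code. The helper name `normalizeVydraBaseUrl` and the constant `VYDRA_BASE_URL` are assumptions introduced here. Only the two host names and the `/api/v1` path come from the docs.

```typescript
// Hypothetical helper, not taken from the plugin source.
const VYDRA_BASE_URL = "https://www.vydra.ai/api/v1";

function normalizeVydraBaseUrl(input?: string): string {
  // Fall back to the documented www base URL when nothing is configured.
  if (!input) return VYDRA_BASE_URL;
  const url = new URL(input);
  // Rewrite only the apex host, which the docs say redirects to www.
  // Any other host (for example a proxy) is left untouched.
  if (url.hostname === "vydra.ai") {
    url.hostname = "www.vydra.ai";
  }
  // Drop trailing slashes so later path joins stay predictable.
  return url.toString().replace(/\/+$/, "");
}
```

Rewriting up front means the `Authorization` header is never sent through a cross-host redirect, so it does not matter whether the HTTP client strips it.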
--- a/docs/tools/image-generation.md
+++ b/docs/tools/image-generation.md
@@ -1,5 +1,5 @@
 ---
-summary: "Generate and edit images using configured providers (OpenAI, Google Gemini, fal, MiniMax, ComfyUI)"
+summary: "Generate and edit images using configured providers (OpenAI, Google Gemini, fal, MiniMax, ComfyUI, Vydra)"
 read_when:
  - Generating images via the agent
  - Configuring image generation providers and models
@@ -45,6 +45,7 @@ The agent calls `image_generate` automatically. No tool allow-listing needed —
 | fal      | `fal-ai/flux/dev`                | Yes                                | `FAL_KEY`                                             |
 | MiniMax  | `image-01`                       | Yes (subject reference)            | `MINIMAX_API_KEY` or MiniMax OAuth (`minimax-portal`) |
 | ComfyUI  | `workflow`                       | Yes (1 image, workflow-configured) | `COMFY_API_KEY` or `COMFY_CLOUD_API_KEY` for cloud    |
+| Vydra    | `grok-imagine`                   | No                                 | `VYDRA_API_KEY`                                       |

 Use `action: "list"` to inspect available providers and models at runtime:

@@ -123,13 +124,13 @@ MiniMax image generation is available through both bundled MiniMax auth paths:

 ## Provider capabilities

-| Capability            | OpenAI               | Google               | fal                 | MiniMax                    | ComfyUI                            |
-| --------------------- | -------------------- | -------------------- | ------------------- | -------------------------- | ---------------------------------- |
-| Generate              | Yes (up to 4)        | Yes (up to 4)        | Yes (up to 4)       | Yes (up to 9)              | Yes (workflow-defined outputs)     |
-| Edit/reference        | Yes (up to 5 images) | Yes (up to 5 images) | Yes (1 image)       | Yes (1 image, subject ref) | Yes (1 image, workflow-configured) |
-| Size control          | Yes                  | Yes                  | Yes                 | No                         | No                                 |
-| Aspect ratio          | No                   | Yes                  | Yes (generate only) | Yes                        | No                                 |
-| Resolution (1K/2K/4K) | No                   | Yes                  | Yes                 | No                         | No                                 |
+| Capability            | OpenAI               | Google               | fal                 | MiniMax                    | ComfyUI                            | Vydra |
+| --------------------- | -------------------- | -------------------- | ------------------- | -------------------------- | ---------------------------------- | ----- |
+| Generate              | Yes (up to 4)        | Yes (up to 4)        | Yes (up to 4)       | Yes (up to 9)              | Yes (workflow-defined outputs)     | Yes (1) |
+| Edit/reference        | Yes (up to 5 images) | Yes (up to 5 images) | Yes (1 image)       | Yes (1 image, subject ref) | Yes (1 image, workflow-configured) | No |
+| Size control          | Yes                  | Yes                  | Yes                 | No                         | No                                 | No |
+| Aspect ratio          | No                   | Yes                  | Yes (generate only) | Yes                        | No                                 | No |
+| Resolution (1K/2K/4K) | No                   | Yes                  | Yes                 | No                         | No                                 | No |

 ## Related

@@ -139,5 +140,6 @@ MiniMax image generation is available through both bundled MiniMax auth paths:
 - [Google (Gemini)](/providers/google) — Gemini image provider setup
 - [MiniMax](/providers/minimax) — MiniMax image provider setup
 - [OpenAI](/providers/openai) — OpenAI Images provider setup
+- [Vydra](/providers/vydra) — Vydra image, video, and speech setup
 - [Configuration Reference](/gateway/configuration-reference#agent-defaults) — `imageGenerationModel` config
 - [Models](/concepts/models) — model configuration and failover
--- a/docs/tools/video-generation.md
+++ b/docs/tools/video-generation.md
@@ -1,5 +1,5 @@
 ---
-summary: "Generate videos from text, images, or existing videos using 11 provider backends"
+summary: "Generate videos from text, images, or existing videos using 12 provider backends"
 read_when:
  - Generating videos via the agent
  - Configuring video generation providers and models
@@ -9,7 +9,7 @@ title: "Video Generation"

 # Video Generation

-OpenClaw agents can generate videos from text prompts, reference images, or existing videos. Eleven provider backends are supported, each with different model options, input modes, and feature sets. The agent picks the right provider automatically based on your configuration and available API keys.
+OpenClaw agents can generate videos from text prompts, reference images, or existing videos. Twelve provider backends are supported, each with different model options, input modes, and feature sets. The agent picks the right provider automatically based on your configuration and available API keys.

 <Note>
 The `video_generate` tool only appears when at least one video-generation provider is available. If you do not see it in your agent tools, set a provider API key or configure `agents.defaults.videoGenerationModel`.
@@ -62,6 +62,7 @@ Outside of session-backed agent runs (for example, direct tool invocations), the
 | Qwen     | `wan2.6-t2v`                    | Yes  | Yes (remote URL) | Yes (remote URL) | `QWEN_API_KEY`                           |
 | Runway   | `gen4.5`                        | Yes  | 1 image          | 1 video          | `RUNWAYML_API_SECRET`                    |
 | Together | `Wan-AI/Wan2.2-T2V-A14B`        | Yes  | 1 image          | No               | `TOGETHER_API_KEY`                       |
+| Vydra    | `veo3`                          | Yes  | 1 image (`kling`) | No              | `VYDRA_API_KEY`                          |
 | xAI      | `grok-imagine-video`            | Yes  | 1 image          | 1 video          | `XAI_API_KEY`                            |

 Some providers accept additional or alternate API key env vars. See individual [provider pages](#related) for details.
@@ -109,7 +110,7 @@ Not all providers support all parameters. Unsupported overrides are ignored on a
 ## Actions

 - **generate** (default) -- create a video from the given prompt and optional reference inputs.
-- **status** -- check the state of the in-flight video task for the current session without starting a new one.
+- **status** -- check the state of the in-flight video task for the current session without starting another generation.
 - **list** -- show available providers, models, and their capabilities.

 ## Model selection
@@ -150,6 +151,7 @@ If a provider fails, the next candidate is tried automatically. If all candidate
 | Qwen     | Same DashScope backend as Alibaba. Reference inputs must be remote `http(s)` URLs; local files are rejected upfront.                     |
 | Runway   | Supports local files via data URIs. Video-to-video requires `runway/gen4_aleph`. Text-only runs expose `16:9` and `9:16` aspect ratios.  |
 | Together | Single image reference only.                                                                                                             |
+| Vydra    | Uses `https://www.vydra.ai/api/v1` directly to avoid auth-dropping redirects. `veo3` is bundled as text-to-video only; `kling` requires a remote image URL. |
 | xAI      | Supports text-to-video, image-to-video, and remote video edit/extend flows.                                                              |

 ## Configuration
@@ -189,6 +191,7 @@ openclaw config set agents.defaults.videoGenerationModel.primary "qwen/wan2.6-t2
 - [Qwen](/providers/qwen)
 - [Runway](/providers/runway)
 - [Together AI](/providers/together)
+- [Vydra](/providers/vydra)
 - [xAI](/providers/xai)
 - [Configuration Reference](/gateway/configuration-reference#agent-defaults)
 - [Models](/concepts/models)
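Reviewer sketch for the `kling` note in the provider table ("requires a remote image URL", with local file uploads rejected up front): a guard of roughly this shape would fail fast before any request is sent. The function name `assertRemoteImageUrl` and the error text are hypothetical. Restricting to `http:` and `https:` is an assumption, since the docs say only "remote image URL".

```typescript
// Hypothetical guard; the name and error message are illustrative only.
function assertRemoteImageUrl(reference: string): URL {
  const message = `Expected a remote http(s) image URL, got: ${reference}`;
  let url: URL;
  try {
    // Relative and bare local paths are not absolute URLs, so parsing throws.
    url = new URL(reference);
  } catch {
    throw new Error(message);
  }
  // Reject file:, data:, and any other non-HTTP scheme.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(message);
  }
  return url;
}
```

Checking the scheme as well as parseability matters because `file:///tmp/cat.png` and `data:` URIs parse as valid URLs but are not remote references.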