fix(video): guard active async generation tasks

Peter Steinberger
2026-04-06 01:02:25 +01:00
parent 6cdf5a43f2
commit e5cfdf437f
8 changed files with 425 additions and 23 deletions

@@ -37,6 +37,7 @@ Docs: https://docs.openclaw.ai
- Tools/video generation: add a bundled Runway video provider (`runway/gen4.5`) with native async task polling, local image/video reference support via data URIs, provider docs, and live-test wiring.
- Agents/video generation: register `video_generate` runs in the task ledger with task/run ids and lifecycle updates so long-running generations can be tracked more reliably.
- Agents/video generation: make session-backed `video_generate` runs detach into background tasks, wake the same agent session on completion, and have the agent post the finished video back into the original channel as a follow-up reply.
- Agents/video generation: add active-task prompt hints plus a hard duplicate guard so session-backed `video_generate` returns task status for in-flight jobs instead of spawning the same video request twice, and expose `action=status` for explicit lookup.
- Providers/CLI: remove bundled CLI text-provider backends and the `agents.defaults.cliBackends` surface, while keeping ACP harness sessions and Gemini media understanding on the native bundled providers.
- Matrix/exec approvals: clarify unavailable-approval replies so Matrix no longer claims chat approvals are unsupported when native exec approvals are merely unconfigured. (#61424) Thanks @gumadeiras.
- Docs/IRC: replace public IRC hostname examples with `irc.example.com` and recommend private servers for bot coordination while listing common public networks for intentional use.

@@ -83,6 +83,8 @@ Main-session cron tasks use `silent` notify policy by default — they create re
Session-backed `video_generate` runs also use `silent` notify policy. They still create task records, but completion is handed back to the original agent session as an internal wake so the agent can write the follow-up message and attach the finished video itself.
While a session-backed `video_generate` task is still active, the tool also acts as a guardrail: repeated `video_generate` calls in that same session return the active task status instead of starting a second concurrent generation. Use `action: "status"` when you want an explicit progress/status lookup from the agent side.
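The guardrail described above comes down to a per-session check that runs before any new work is queued. The following is a minimal sketch that uses an in-memory `Map` in place of the real task ledger. `requestVideo`, `finishTask`, the `SketchTask` shape, and the terminal `"done"` status are hypothetical names for illustration, not OpenClaw APIs.

```typescript
// Hypothetical task shape; "done" stands in for whatever terminal statuses the ledger uses.
type SketchStatus = "queued" | "running" | "done";

interface SketchTask {
  taskId: string;
  status: SketchStatus;
}

// In-memory stand-in for the task ledger, keyed by agent session.
const tasksBySession = new Map<string, SketchTask[]>();
let nextTaskNumber = 1;

function findActiveTask(sessionKey: string): SketchTask | undefined {
  return (tasksBySession.get(sessionKey) ?? []).find(
    (task) => task.status === "queued" || task.status === "running",
  );
}

function requestVideo(sessionKey: string): { started: boolean; taskId: string } {
  const active = findActiveTask(sessionKey);
  if (active) {
    // Guard: report the in-flight task instead of starting a second generation.
    return { started: false, taskId: active.taskId };
  }
  const task: SketchTask = { taskId: `task-${nextTaskNumber++}`, status: "queued" };
  tasksBySession.set(sessionKey, [...(tasksBySession.get(sessionKey) ?? []), task]);
  return { started: true, taskId: task.taskId };
}

// Marks a task terminal so the session can start a new generation again.
function finishTask(sessionKey: string, taskId: string): void {
  for (const task of tasksBySession.get(sessionKey) ?? []) {
    if (task.taskId === taskId) {
      task.status = "done";
    }
  }
}

const first = requestVideo("agent:main"); // started: true, taskId: task-1
const repeat = requestVideo("agent:main"); // started: false, taskId: task-1
finishTask("agent:main", first.taskId);
const afterFinish = requestVideo("agent:main"); // started: true, taskId: task-2
```

Because the lookup is keyed by session, a different session can still start its own generation while `agent:main` is busy.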
**What does not create tasks:**
- Heartbeat turns — main-session; see [Heartbeat](/gateway/heartbeat)

@@ -65,30 +65,33 @@ Use `action: "list"` to inspect available providers and models at runtime:
## Tool parameters
-| Parameter | Type | Description |
-| ----------------- | -------- | -------------------------------------------------------------------------------------- |
-| `prompt` | string | Video generation prompt (required for `action: "generate"`) |
-| `action` | string | `"generate"` (default) or `"list"` to inspect providers |
-| `model` | string | Provider/model override, e.g. `qwen/wan2.6-t2v` |
-| `image` | string | Single reference image path or URL |
-| `images` | string[] | Multiple reference images (up to 5) |
-| `video` | string | Single reference video path or URL |
-| `videos` | string[] | Multiple reference videos (up to 4) |
-| `size` | string | Size hint when the provider supports it |
-| `aspectRatio` | string | Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9` |
-| `resolution` | string | Resolution hint: `480P`, `720P`, or `1080P` |
-| `durationSeconds` | number | Target duration in seconds. OpenClaw may round to the nearest provider-supported value |
-| `audio` | boolean | Enable generated audio when the provider supports it |
-| `watermark` | boolean | Toggle provider watermarking when supported |
-| `filename` | string | Output filename hint |
+| Parameter | Type | Description |
+| ----------------- | -------- | ------------------------------------------------------------------------------------------------- |
+| `prompt` | string | Video generation prompt (required for `action: "generate"`) |
+| `action` | string | `"generate"` (default), `"status"` for the current session task, or `"list"` to inspect providers |
+| `model` | string | Provider/model override, e.g. `qwen/wan2.6-t2v` |
+| `image` | string | Single reference image path or URL |
+| `images` | string[] | Multiple reference images (up to 5) |
+| `video` | string | Single reference video path or URL |
+| `videos` | string[] | Multiple reference videos (up to 4) |
+| `size` | string | Size hint when the provider supports it |
+| `aspectRatio` | string | Aspect ratio: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9` |
+| `resolution` | string | Resolution hint: `480P`, `720P`, or `1080P` |
+| `durationSeconds` | number | Target duration in seconds. OpenClaw may round to the nearest provider-supported value |
+| `audio` | boolean | Enable generated audio when the provider supports it |
+| `watermark` | boolean | Toggle provider watermarking when supported |
+| `filename` | string | Output filename hint |
Not all providers support all parameters. Unsupported optional overrides are ignored on a best-effort basis and reported back in the tool result as a warning. Hard capability limits such as too many reference inputs still fail before submission. When a provider or model only supports a discrete set of video lengths, OpenClaw rounds `durationSeconds` to the nearest supported value and reports the normalized duration in the tool result.
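For providers that only accept a discrete set of clip lengths, the rounding described above is a nearest-value search. The following is a minimal sketch. The supported-duration list is hypothetical, and breaking ties toward the shorter clip is an assumption, not documented OpenClaw behavior.

```typescript
// Snap a requested duration to the closest provider-supported value.
// Assumption: ties resolve to the shorter duration.
function normalizeDurationSeconds(requested: number, supported: readonly number[]): number {
  if (supported.length === 0) {
    throw new Error("provider lists no supported durations");
  }
  return supported.reduce((best, candidate) => {
    const diff = Math.abs(candidate - requested);
    const bestDiff = Math.abs(best - requested);
    return diff < bestDiff || (diff === bestDiff && candidate < best) ? candidate : best;
  });
}

// A hypothetical model that only accepts 4, 8, or 12 second clips.
console.log(normalizeDurationSeconds(7, [4, 8, 12])); // 8
console.log(normalizeDurationSeconds(6, [4, 8, 12])); // 4 (tie goes to the shorter clip)
```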
## Async behavior
- Session-backed agent runs: `video_generate` creates a background task, returns a started/task response immediately, and posts the finished video later in a follow-up agent message.
- Duplicate prevention: while that background task is still `queued` or `running`, later `video_generate` calls in the same session return task status instead of starting another generation.
- Status lookup: use `action: "status"` to inspect the active session-backed video task without starting a new one.
- Task tracking: use `openclaw tasks list` / `openclaw tasks show <taskId>` to inspect queued, running, and terminal status for the generation.
- Completion wake: OpenClaw injects an internal completion event back into the same session so the model can write the user-facing follow-up itself.
- Prompt hint: later user/manual turns in the same session get a small runtime hint when a video task is already in flight so the model does not blindly call `video_generate` again.
- No-session fallback: direct/local contexts without a real agent session still run inline and return the final video result in the same turn.
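The status lookup depends on the tool accepting a third `action` value. The following is a standalone sketch of the normalization this commit adds to `resolveAction`. Two substitutions are assumptions: `readStringParam` is replaced with a plain `typeof` check, and `ToolInputError` is replaced with a plain `Error`. `resolveActionSketch` is a hypothetical name.

```typescript
type VideoGenerateAction = "generate" | "status" | "list";

const VIDEO_GENERATE_ACTIONS: readonly string[] = ["generate", "status", "list"];

function isVideoGenerateAction(value: string): value is VideoGenerateAction {
  return VIDEO_GENERATE_ACTIONS.includes(value);
}

function resolveActionSketch(args: Record<string, unknown>): VideoGenerateAction {
  const value = args["action"];
  const raw = typeof value === "string" ? value.trim() : "";
  if (!raw) {
    // An omitted action keeps the existing default.
    return "generate";
  }
  // Accept any casing, e.g. "Status" or "LIST".
  const normalized = raw.toLowerCase();
  if (isVideoGenerateAction(normalized)) {
    return normalized;
  }
  throw new Error('action must be "generate", "status", or "list"');
}

console.log(resolveActionSketch({ prompt: "friendly lobster surfing" })); // generate
console.log(resolveActionSketch({ action: " Status " })); // status
```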
## Configuration

@@ -24,6 +24,7 @@ import { getGlobalHookRunner } from "../../../plugins/hook-runner-global.js";
import { resolveToolCallArgumentsEncoding } from "../../../plugins/provider-model-compat.js";
import { resolveProviderSystemPromptContribution } from "../../../plugins/provider-runtime.js";
import { isSubagentSessionKey } from "../../../routing/session-key.js";
import { joinPresentTextSegments } from "../../../shared/text/join-segments.js";
import { buildTtsSystemPromptHint } from "../../../tts/tts.js";
import { resolveUserPath } from "../../../utils.js";
import { normalizeMessageChannel } from "../../../utils/message-channel.js";
@@ -93,6 +94,7 @@ import { buildSystemPromptParams } from "../../system-prompt-params.js";
import { buildSystemPromptReport } from "../../system-prompt-report.js";
import { sanitizeToolCallIdsForCloudCodeAssist } from "../../tool-call-id.js";
import { resolveTranscriptPolicy } from "../../transcript-policy.js";
import { buildActiveVideoGenerationTaskPromptContextForSession } from "../../video-generation-task-status.js";
import { DEFAULT_BOOTSTRAP_FILENAME } from "../../workspace.js";
import { isRunnerAbortError } from "../abort.js";
import { isCacheTtlEligibleProvider } from "../cache-ttl.js";
@@ -1521,6 +1523,10 @@ export async function runEmbeddedAttempt(
hookRunner,
legacyBeforeAgentStartResult: params.legacyBeforeAgentStartResult,
});
const activeVideoTaskPromptContext =
params.trigger === "user" || params.trigger === "manual"
? buildActiveVideoGenerationTaskPromptContextForSession(params.sessionKey)
: undefined;
{
if (hookResult?.prependContext) {
effectivePrompt = `${hookResult.prependContext}\n\n${effectivePrompt}`;
@@ -1537,7 +1543,10 @@ export async function runEmbeddedAttempt(
}
const prependedOrAppendedSystemPrompt = composeSystemPromptWithHookContext({
baseSystemPrompt: systemPromptText,
-prependSystemContext: hookResult?.prependSystemContext,
+prependSystemContext: joinPresentTextSegments([
+activeVideoTaskPromptContext,
+hookResult?.prependSystemContext,
+]),
appendSystemContext: hookResult?.appendSystemContext,
});
if (prependedOrAppendedSystemPrompt) {

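In the runner change above, the active-task hint is computed only for `user` and `manual` triggers, and it is placed ahead of any hook-provided prepend context. The following sketch shows that gating. It assumes `joinPresentTextSegments` drops empty segments and joins the rest with a blank line; its actual separator is not shown in this diff. `joinSegments`, `buildPrependSystemContext`, and the `heartbeat`/`cron` trigger values are hypothetical.

```typescript
// Hypothetical trigger names; only "user" and "manual" appear in the diff.
type RunTrigger = "user" | "manual" | "heartbeat" | "cron";

// Stand-in for joinPresentTextSegments (assumed behavior: drop empty segments,
// join the rest with a blank line, return undefined when nothing is left).
function joinSegments(segments: Array<string | undefined>): string | undefined {
  const present = segments.filter(
    (segment): segment is string => typeof segment === "string" && segment.trim().length > 0,
  );
  return present.length > 0 ? present.join("\n\n") : undefined;
}

function buildPrependSystemContext(params: {
  trigger: RunTrigger;
  activeVideoContext?: string;
  hookPrependContext?: string;
}): string | undefined {
  // Only user/manual turns receive the active-task hint; other triggers skip it.
  const videoContext =
    params.trigger === "user" || params.trigger === "manual"
      ? params.activeVideoContext
      : undefined;
  // The video hint goes first, ahead of any hook-provided prepend context.
  return joinSegments([videoContext, params.hookPrependContext]);
}

const hint = "An active video generation background task already exists for this session.";
console.log(buildPrependSystemContext({ trigger: "user", activeVideoContext: hint, hookPrependContext: "Hook note" }));
```

Gating on the trigger keeps the hint out of automated turns. It only reaches turns where a person might plausibly re-ask for the same video.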
@@ -5,6 +5,10 @@ import * as videoGenerationRuntime from "../../video-generation/runtime.js";
import * as videoGenerateBackground from "./video-generate-background.js";
import { createVideoGenerateTool } from "./video-generate-tool.js";
const taskRuntimeInternalMocks = vi.hoisted(() => ({
listTasksForOwnerKey: vi.fn(),
}));
const taskExecutorMocks = vi.hoisted(() => ({
createRunningTaskRun: vi.fn(),
completeTaskRunByRunId: vi.fn(),
@@ -12,6 +16,7 @@ const taskExecutorMocks = vi.hoisted(() => ({
recordTaskRunProgressByRunId: vi.fn(),
}));
vi.mock("../../tasks/runtime-internal.js", () => taskRuntimeInternalMocks);
vi.mock("../../tasks/task-executor.js", () => taskExecutorMocks);
function asConfig(value: unknown): OpenClawConfig {
@@ -22,6 +27,8 @@ describe("createVideoGenerateTool", () => {
beforeEach(() => {
vi.restoreAllMocks();
vi.spyOn(videoGenerationRuntime, "listRuntimeVideoGenerationProviders").mockReturnValue([]);
taskRuntimeInternalMocks.listTasksForOwnerKey.mockReset();
taskRuntimeInternalMocks.listTasksForOwnerKey.mockReturnValue([]);
taskExecutorMocks.createRunningTaskRun.mockReset();
taskExecutorMocks.completeTaskRunByRunId.mockReset();
taskExecutorMocks.failTaskRunByRunId.mockReset();
@@ -181,7 +188,8 @@ describe("createVideoGenerateTool", () => {
const result = await tool.execute("call-1", { prompt: "friendly lobster surfing" });
const text = (result.content?.[0] as { text: string } | undefined)?.text ?? "";
-expect(text).toContain("Started video generation task task-123 in the background.");
+expect(text).toContain("Background task started for video generation (task-123).");
+expect(text).toContain("Do not call video_generate again for this request.");
expect(result.details).toMatchObject({
async: true,
status: "started",
@@ -213,6 +221,112 @@ describe("createVideoGenerateTool", () => {
);
});
it("returns active task status instead of starting a duplicate generation", async () => {
const generateSpy = vi.spyOn(videoGenerationRuntime, "generateVideo");
taskRuntimeInternalMocks.listTasksForOwnerKey.mockReturnValue([
{
taskId: "task-active",
runtime: "cli",
sourceId: "video_generate:openai",
requesterSessionKey: "agent:main:discord:direct:123",
ownerKey: "agent:main:discord:direct:123",
scopeKind: "session",
runId: "tool:video_generate:active",
task: "friendly lobster surfing",
status: "running",
deliveryStatus: "not_applicable",
notifyPolicy: "silent",
createdAt: Date.now(),
progressSummary: "Generating video",
},
]);
const tool = createVideoGenerateTool({
config: asConfig({
agents: {
defaults: {
videoGenerationModel: { primary: "openai/sora-2" },
},
},
}),
agentSessionKey: "agent:main:discord:direct:123",
});
if (!tool) {
throw new Error("expected video_generate tool");
}
const result = await tool.execute("call-dup", { prompt: "friendly lobster surfing" });
const text = (result.content?.[0] as { text: string } | undefined)?.text ?? "";
expect(text).toContain("Video generation task task-active is already running with openai.");
expect(text).toContain("Do not call video_generate again for this request.");
expect(result.details).toMatchObject({
action: "status",
duplicateGuard: true,
active: true,
existingTask: true,
status: "running",
provider: "openai",
task: {
taskId: "task-active",
runId: "tool:video_generate:active",
},
progressSummary: "Generating video",
});
expect(taskExecutorMocks.createRunningTaskRun).not.toHaveBeenCalled();
expect(generateSpy).not.toHaveBeenCalled();
});
it("reports active task status when action=status is requested", async () => {
taskRuntimeInternalMocks.listTasksForOwnerKey.mockReturnValue([
{
taskId: "task-active",
runtime: "cli",
sourceId: "video_generate:google",
requesterSessionKey: "agent:main:discord:direct:123",
ownerKey: "agent:main:discord:direct:123",
scopeKind: "session",
runId: "tool:video_generate:active",
task: "friendly lobster surfing",
status: "queued",
deliveryStatus: "not_applicable",
notifyPolicy: "silent",
createdAt: Date.now(),
progressSummary: "Queued video generation",
},
]);
const tool = createVideoGenerateTool({
config: asConfig({
agents: {
defaults: {
videoGenerationModel: { primary: "google/veo-3.1-fast-generate-preview" },
},
},
}),
agentSessionKey: "agent:main:discord:direct:123",
});
if (!tool) {
throw new Error("expected video_generate tool");
}
const result = await tool.execute("call-status", { action: "status" });
const text = (result.content?.[0] as { text: string } | undefined)?.text ?? "";
expect(text).toContain("Video generation task task-active is already queued with google.");
expect(result.details).toMatchObject({
action: "status",
active: true,
existingTask: true,
status: "queued",
provider: "google",
task: {
taskId: "task-active",
},
progressSummary: "Queued video generation",
});
});
it("surfaces provider generation failures inline when there is no detached session", async () => {
vi.spyOn(videoGenerationRuntime, "generateVideo").mockRejectedValue(new Error("queue boom"));

@@ -21,6 +21,11 @@ import type {
VideoGenerationSourceAsset,
} from "../../video-generation/types.js";
import { normalizeProviderId } from "../provider-id.js";
import {
buildVideoGenerationTaskStatusDetails,
buildVideoGenerationTaskStatusText,
findActiveVideoGenerationTaskForSession,
} from "../video-generation-task-status.js";
import {
ToolInputError,
readNumberParam,
@@ -76,7 +81,7 @@ const VideoGenerateToolSchema = Type.Object({
action: Type.Optional(
Type.String({
description:
-'Optional action: "generate" (default) or "list" to inspect available providers/models.',
+'Optional action: "generate" (default), "status" to inspect the active session task, or "list" to inspect available providers/models.',
}),
),
prompt: Type.Optional(Type.String({ description: "Video generation prompt." })),
@@ -241,16 +246,16 @@ function isVideoGenerationProviderConfigured(params: {
return hasAuthForProvider({ provider: provider.id, agentDir: params.agentDir });
}
-function resolveAction(args: Record<string, unknown>): "generate" | "list" {
+function resolveAction(args: Record<string, unknown>): "generate" | "list" | "status" {
const raw = readStringParam(args, "action");
if (!raw) {
return "generate";
}
const normalized = raw.trim().toLowerCase();
-if (normalized === "generate" || normalized === "list") {
+if (normalized === "generate" || normalized === "list" || normalized === "status") {
return normalized;
}
-throw new ToolInputError('action must be "generate" or "list"');
+throw new ToolInputError('action must be "generate", "status", or "list"');
}
function normalizeResolution(raw: string | undefined): VideoGenerationResolution | undefined {
@@ -812,6 +817,53 @@ export function createVideoGenerateTool(options?: {
};
}
if (action === "status") {
const activeTask = findActiveVideoGenerationTaskForSession(options?.agentSessionKey);
if (!activeTask) {
return {
content: [
{
type: "text",
text: "No active video generation task is currently running for this session.",
},
],
details: {
action: "status",
active: false,
},
};
}
return {
content: [
{
type: "text",
text: buildVideoGenerationTaskStatusText(activeTask),
},
],
details: {
action: "status",
...buildVideoGenerationTaskStatusDetails(activeTask),
},
};
}
const activeTask = findActiveVideoGenerationTaskForSession(options?.agentSessionKey);
if (activeTask) {
return {
content: [
{
type: "text",
text: buildVideoGenerationTaskStatusText(activeTask, { duplicateGuard: true }),
},
],
details: {
action: "status",
duplicateGuard: true,
...buildVideoGenerationTaskStatusDetails(activeTask),
},
};
}
const prompt = readStringParam(args, "prompt", { required: true });
const model = readStringParam(args, "model");
const filename = readStringParam(args, "filename");
@@ -934,7 +986,7 @@ export function createVideoGenerateTool(options?: {
content: [
{
type: "text",
-text: `Started video generation task ${taskHandle?.taskId ?? "unknown"} in the background. I'll post the finished video here when it's ready.`,
+text: `Background task started for video generation (${taskHandle?.taskId ?? "unknown"}). Do not call video_generate again for this request. Wait for the completion event; I'll post the finished video here when it's ready.`,
},
],
details: {

@@ -0,0 +1,127 @@
import { beforeEach, describe, expect, it, vi } from "vitest";
import {
buildActiveVideoGenerationTaskPromptContextForSession,
buildVideoGenerationTaskStatusDetails,
buildVideoGenerationTaskStatusText,
findActiveVideoGenerationTaskForSession,
getVideoGenerationTaskProviderId,
isActiveVideoGenerationTask,
} from "./video-generation-task-status.js";
const taskRuntimeInternalMocks = vi.hoisted(() => ({
listTasksForOwnerKey: vi.fn(),
}));
vi.mock("../tasks/runtime-internal.js", () => taskRuntimeInternalMocks);
describe("video generation task status", () => {
beforeEach(() => {
taskRuntimeInternalMocks.listTasksForOwnerKey.mockReset();
taskRuntimeInternalMocks.listTasksForOwnerKey.mockReturnValue([]);
});
it("recognizes active session-backed video generation tasks", () => {
expect(
isActiveVideoGenerationTask({
taskId: "task-1",
runtime: "cli",
sourceId: "video_generate:openai",
requesterSessionKey: "agent:main",
ownerKey: "agent:main",
scopeKind: "session",
task: "make lobster video",
status: "running",
deliveryStatus: "not_applicable",
notifyPolicy: "silent",
createdAt: Date.now(),
}),
).toBe(true);
expect(
isActiveVideoGenerationTask({
taskId: "task-2",
runtime: "cron",
sourceId: "video_generate:openai",
requesterSessionKey: "agent:main",
ownerKey: "agent:main",
scopeKind: "session",
task: "make lobster video",
status: "running",
deliveryStatus: "not_applicable",
notifyPolicy: "silent",
createdAt: Date.now(),
}),
).toBe(false);
});
it("prefers a running task over queued session siblings", () => {
taskRuntimeInternalMocks.listTasksForOwnerKey.mockReturnValue([
{
taskId: "task-queued",
runtime: "cli",
sourceId: "video_generate:google",
requesterSessionKey: "agent:main",
ownerKey: "agent:main",
scopeKind: "session",
task: "queued task",
status: "queued",
deliveryStatus: "not_applicable",
notifyPolicy: "silent",
createdAt: Date.now(),
},
{
taskId: "task-running",
runtime: "cli",
sourceId: "video_generate:openai",
requesterSessionKey: "agent:main",
ownerKey: "agent:main",
scopeKind: "session",
task: "running task",
status: "running",
deliveryStatus: "not_applicable",
notifyPolicy: "silent",
createdAt: Date.now(),
progressSummary: "Generating video",
},
]);
const task = findActiveVideoGenerationTaskForSession("agent:main");
expect(task?.taskId).toBe("task-running");
expect(getVideoGenerationTaskProviderId(task!)).toBe("openai");
expect(buildVideoGenerationTaskStatusText(task!, { duplicateGuard: true })).toContain(
"Do not call video_generate again for this request.",
);
expect(buildVideoGenerationTaskStatusDetails(task!)).toMatchObject({
active: true,
existingTask: true,
status: "running",
provider: "openai",
progressSummary: "Generating video",
});
});
it("builds prompt context for active session work", () => {
taskRuntimeInternalMocks.listTasksForOwnerKey.mockReturnValue([
{
taskId: "task-running",
runtime: "cli",
sourceId: "video_generate:openai",
requesterSessionKey: "agent:main",
ownerKey: "agent:main",
scopeKind: "session",
task: "running task",
status: "running",
deliveryStatus: "not_applicable",
notifyPolicy: "silent",
createdAt: Date.now(),
progressSummary: "Generating video",
},
]);
const context = buildActiveVideoGenerationTaskPromptContextForSession("agent:main");
expect(context).toContain("An active video generation background task already exists");
expect(context).toContain("Task task-running is currently running via openai.");
expect(context).toContain('call `video_generate` with `action:"status"`');
});
});

@@ -0,0 +1,94 @@
import { listTasksForOwnerKey } from "../tasks/runtime-internal.js";
import type { TaskRecord } from "../tasks/task-registry.types.js";
const ACTIVE_VIDEO_GENERATION_STATUSES = new Set(["queued", "running"]);
const VIDEO_GENERATION_SOURCE_PREFIX = "video_generate";
function isActiveStatus(status: string): boolean {
return ACTIVE_VIDEO_GENERATION_STATUSES.has(status);
}
export function isActiveVideoGenerationTask(task: TaskRecord): boolean {
const sourceId = task.sourceId?.trim() ?? "";
return (
task.runtime === "cli" &&
task.scopeKind === "session" &&
isActiveStatus(task.status) &&
(sourceId === VIDEO_GENERATION_SOURCE_PREFIX ||
sourceId.startsWith(`${VIDEO_GENERATION_SOURCE_PREFIX}:`))
);
}
export function getVideoGenerationTaskProviderId(task: TaskRecord): string | undefined {
const sourceId = task.sourceId?.trim() ?? "";
if (!sourceId.startsWith(`${VIDEO_GENERATION_SOURCE_PREFIX}:`)) {
return undefined;
}
const providerId = sourceId.slice(`${VIDEO_GENERATION_SOURCE_PREFIX}:`.length).trim();
return providerId || undefined;
}
export function findActiveVideoGenerationTaskForSession(sessionKey?: string): TaskRecord | null {
const normalizedSessionKey = sessionKey?.trim();
if (!normalizedSessionKey) {
return null;
}
const activeTasks = listTasksForOwnerKey(normalizedSessionKey).filter(
isActiveVideoGenerationTask,
);
if (activeTasks.length === 0) {
return null;
}
return activeTasks.find((task) => task.status === "running") ?? activeTasks[0] ?? null;
}
export function buildVideoGenerationTaskStatusDetails(task: TaskRecord): Record<string, unknown> {
const provider = getVideoGenerationTaskProviderId(task);
return {
async: true,
active: true,
existingTask: true,
status: task.status,
task: {
taskId: task.taskId,
...(task.runId ? { runId: task.runId } : {}),
},
...(task.progressSummary ? { progressSummary: task.progressSummary } : {}),
...(task.sourceId ? { sourceId: task.sourceId } : {}),
...(provider ? { provider } : {}),
};
}
export function buildVideoGenerationTaskStatusText(
task: TaskRecord,
params?: { duplicateGuard?: boolean },
): string {
const provider = getVideoGenerationTaskProviderId(task);
const lines = [
`Video generation task ${task.taskId} is already ${task.status}${provider ? ` with ${provider}` : ""}.`,
task.progressSummary ? `Progress: ${task.progressSummary}.` : null,
params?.duplicateGuard
? "Do not call video_generate again for this request. Wait for the completion event; I will post the finished video here."
: "Wait for the completion event; I will post the finished video here when it's ready.",
].filter((entry): entry is string => Boolean(entry));
return lines.join("\n");
}
export function buildActiveVideoGenerationTaskPromptContextForSession(
sessionKey?: string,
): string | undefined {
const task = findActiveVideoGenerationTaskForSession(sessionKey);
if (!task) {
return undefined;
}
const provider = getVideoGenerationTaskProviderId(task);
const lines = [
"An active video generation background task already exists for this session.",
`Task ${task.taskId} is currently ${task.status}${provider ? ` via ${provider}` : ""}.`,
task.progressSummary ? `Current progress: ${task.progressSummary}.` : null,
"Do not call `video_generate` again for the same request while that task is queued or running.",
'If the user asks for progress or whether the work is async, explain the active task state or call `video_generate` with `action:"status"` instead of starting a new generation.',
"Only start a new `video_generate` call if the user clearly asks for a different/new video.",
].filter((entry): entry is string => Boolean(entry));
return lines.join("\n");
}