diff --git a/CHANGELOG.md b/CHANGELOG.md index 36164bb7da6..f00ddfbf888 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -69,6 +69,7 @@ Docs: https://docs.openclaw.ai - Config: keep legacy audio transcription migration strict by rejecting non-string/unsafe command tokens while still migrating valid custom script executables. (#5042) Thanks @shayan919293. - Status/Sessions: stop clamping derived `totalTokens` to context-window size, keep prompt-token snapshots wired through session accounting, and surface context usage as unknown when fresh snapshot data is missing to avoid false 100% reports. (#15114) Thanks @echoVic. - Providers/MiniMax: switch implicit MiniMax API-key provider from `openai-completions` to `anthropic-messages` with the correct Anthropic-compatible base URL, fixing `invalid role: developer (2013)` errors on MiniMax M2.5. (#15275) Thanks @lailoo. +- Ollama/Agents: use resolved model/provider base URLs for native `/api/chat` streaming (including aliased providers), normalize `/v1` endpoints, and forward abort + `maxTokens` stream options for reliable cancellation and token caps. (#11853) Thanks @BrokenFinger98. ## 2026.2.12 diff --git a/docs/providers/ollama.md b/docs/providers/ollama.md index 463923fb7c2..c6a0e2372e6 100644 --- a/docs/providers/ollama.md +++ b/docs/providers/ollama.md @@ -8,7 +8,7 @@ title: "Ollama" # Ollama -Ollama is a local LLM runtime that makes it easy to run open-source models on your machine. OpenClaw integrates with Ollama's OpenAI-compatible API and can **auto-discover tool-capable models** when you opt in with `OLLAMA_API_KEY` (or an auth profile) and do not define an explicit `models.providers.ollama` entry. +Ollama is a local LLM runtime that makes it easy to run open-source models on your machine. 
OpenClaw integrates with Ollama's native API (`/api/chat`), supporting streaming and tool calling, and can **auto-discover tool-capable models** when you opt in with `OLLAMA_API_KEY` (or an auth profile) and do not define an explicit `models.providers.ollama` entry. ## Quick start @@ -101,10 +101,9 @@ Use explicit config when: models: { providers: { ollama: { - // Use a host that includes /v1 for OpenAI-compatible APIs - baseUrl: "http://ollama-host:11434/v1", + baseUrl: "http://ollama-host:11434", apiKey: "ollama-local", - api: "openai-completions", + api: "ollama", models: [ { id: "gpt-oss:20b", @@ -134,7 +133,7 @@ If Ollama is running on a different host or port (explicit config disables auto- providers: { ollama: { apiKey: "ollama-local", - baseUrl: "http://ollama-host:11434/v1", + baseUrl: "http://ollama-host:11434", }, }, }, @@ -174,45 +173,28 @@ Ollama is free and runs locally, so all model costs are set to $0. ### Streaming Configuration -Due to a [known issue](https://github.com/badlogic/pi-mono/issues/1205) in the underlying SDK with Ollama's response format, **streaming is disabled by default** for Ollama models. This prevents corrupted responses when using tool-capable models. +OpenClaw's Ollama integration uses the **native Ollama API** (`/api/chat`) by default, which fully supports streaming and tool calling simultaneously. No special configuration is needed. -When streaming is disabled, responses are delivered all at once (non-streaming mode), which avoids the issue where interleaved content/reasoning deltas cause garbled output. 
+#### Legacy OpenAI-Compatible Mode -#### Re-enable Streaming (Advanced) - -If you want to re-enable streaming for Ollama (may cause issues with tool-capable models): +If you need to use the OpenAI-compatible endpoint instead (e.g., behind a proxy that only supports OpenAI format), set `api: "openai-completions"` explicitly: ```json5 { - agents: { - defaults: { - models: { - "ollama/gpt-oss:20b": { - streaming: true, - }, - }, - }, - }, + models: { + providers: { + ollama: { + baseUrl: "http://ollama-host:11434/v1", + api: "openai-completions", + apiKey: "ollama-local", + models: [...] + } + } + } } ``` -#### Disable Streaming for Other Providers - -You can also disable streaming for any provider if needed: - -```json5 -{ - agents: { - defaults: { - models: { - "openai/gpt-4": { - streaming: false, - }, - }, - }, - }, -} -``` +Note: The OpenAI-compatible endpoint may not support streaming + tool calling simultaneously. You may need to disable streaming with `params: { streaming: false }` in model config. ### Context windows @@ -261,15 +243,6 @@ ps aux | grep ollama ollama serve ``` -### Corrupted responses or tool names in output - -If you see garbled responses containing tool names (like `sessions_send`, `memory_get`) or fragmented text when using Ollama models, this is due to an upstream SDK issue with streaming responses. **This is fixed by default** in the latest OpenClaw version by disabling streaming for Ollama models. - -If you manually enabled streaming and experience this issue: - -1. Remove the `streaming: true` configuration from your Ollama model entries, or -2. 
Explicitly set `streaming: false` for Ollama models (see [Streaming Configuration](#streaming-configuration)) - ## See Also - [Model Providers](/concepts/model-providers) - Overview of all providers diff --git a/src/agents/models-config.providers.ollama.e2e.test.ts b/src/agents/models-config.providers.ollama.e2e.test.ts index 3b9624a8eb6..263ef5574d4 100644 --- a/src/agents/models-config.providers.ollama.e2e.test.ts +++ b/src/agents/models-config.providers.ollama.e2e.test.ts @@ -29,25 +29,20 @@ describe("Ollama provider", () => { const agentDir = mkdtempSync(join(tmpdir(), "openclaw-test-")); const providers = await resolveImplicitProviders({ agentDir }); - // Ollama requires explicit configuration via OLLAMA_API_KEY env var or profile expect(providers?.ollama).toBeUndefined(); }); - it("should disable streaming by default for Ollama models", async () => { + it("should use native ollama api type", async () => { const agentDir = mkdtempSync(join(tmpdir(), "openclaw-test-")); process.env.OLLAMA_API_KEY = "test-key"; try { const providers = await resolveImplicitProviders({ agentDir }); - // Provider should be defined with OLLAMA_API_KEY set expect(providers?.ollama).toBeDefined(); expect(providers?.ollama?.apiKey).toBe("OLLAMA_API_KEY"); - - // Note: discoverOllamaModels() returns empty array in test environments (VITEST env var check) - // so we can't test the actual model discovery here. The streaming: false setting - // is applied in the model mapping within discoverOllamaModels(). - // The configuration structure itself is validated by TypeScript and the Zod schema. 
+ expect(providers?.ollama?.api).toBe("ollama"); + expect(providers?.ollama?.baseUrl).toBe("http://127.0.0.1:11434"); } finally { delete process.env.OLLAMA_API_KEY; } @@ -69,15 +64,14 @@ describe("Ollama provider", () => { }, }); - expect(providers?.ollama?.baseUrl).toBe("http://192.168.20.14:11434/v1"); + // Native API strips /v1 suffix via resolveOllamaApiBase() + expect(providers?.ollama?.baseUrl).toBe("http://192.168.20.14:11434"); } finally { delete process.env.OLLAMA_API_KEY; } }); - it("should have correct model structure with streaming disabled (unit test)", () => { - // This test directly verifies the model configuration structure - // since discoverOllamaModels() returns empty array in test mode + it("should have correct model structure without streaming override", () => { const mockOllamaModel = { id: "llama3.3:latest", name: "llama3.3:latest", @@ -86,13 +80,9 @@ describe("Ollama provider", () => { cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }, contextWindow: 128000, maxTokens: 8192, - params: { - streaming: false, - }, }; - // Verify the model structure matches what discoverOllamaModels() would return - expect(mockOllamaModel.params?.streaming).toBe(false); - expect(mockOllamaModel.params).toHaveProperty("streaming"); + // Native Ollama provider does not need streaming: false workaround + expect(mockOllamaModel).not.toHaveProperty("params"); }); }); diff --git a/src/agents/models-config.providers.ts b/src/agents/models-config.providers.ts index aa6adfd434a..32835dc0f64 100644 --- a/src/agents/models-config.providers.ts +++ b/src/agents/models-config.providers.ts @@ -17,6 +17,7 @@ import { buildHuggingfaceModelDefinition, } from "./huggingface-models.js"; import { resolveAwsSdkEnvVarName, resolveEnvApiKey } from "./model-auth.js"; +import { OLLAMA_NATIVE_BASE_URL } from "./ollama-stream.js"; import { buildSyntheticModelDefinition, SYNTHETIC_BASE_URL, @@ -79,8 +80,8 @@ const QWEN_PORTAL_DEFAULT_COST = { cacheWrite: 0, }; -const 
OLLAMA_BASE_URL = "http://127.0.0.1:11434/v1"; -const OLLAMA_API_BASE_URL = "http://127.0.0.1:11434"; +const OLLAMA_BASE_URL = OLLAMA_NATIVE_BASE_URL; +const OLLAMA_API_BASE_URL = OLLAMA_BASE_URL; const OLLAMA_DEFAULT_CONTEXT_WINDOW = 128000; const OLLAMA_DEFAULT_MAX_TOKENS = 8192; const OLLAMA_DEFAULT_COST = { @@ -180,11 +181,6 @@ async function discoverOllamaModels(baseUrl?: string): Promise { async function buildOllamaProvider(configuredBaseUrl?: string): Promise { const models = await discoverOllamaModels(configuredBaseUrl); return { - baseUrl: configuredBaseUrl ?? OLLAMA_BASE_URL, - api: "openai-completions", + baseUrl: resolveOllamaApiBase(configuredBaseUrl), + api: "ollama", models, }; } diff --git a/src/agents/ollama-stream.test.ts b/src/agents/ollama-stream.test.ts new file mode 100644 index 00000000000..1589f2f25c8 --- /dev/null +++ b/src/agents/ollama-stream.test.ts @@ -0,0 +1,290 @@ +import { describe, expect, it, vi } from "vitest"; +import { + createOllamaStreamFn, + convertToOllamaMessages, + buildAssistantMessage, + parseNdjsonStream, +} from "./ollama-stream.js"; + +describe("convertToOllamaMessages", () => { + it("converts user text messages", () => { + const messages = [{ role: "user", content: "hello" }]; + const result = convertToOllamaMessages(messages); + expect(result).toEqual([{ role: "user", content: "hello" }]); + }); + + it("converts user messages with content parts", () => { + const messages = [ + { + role: "user", + content: [ + { type: "text", text: "describe this" }, + { type: "image", data: "base64data" }, + ], + }, + ]; + const result = convertToOllamaMessages(messages); + expect(result).toEqual([{ role: "user", content: "describe this", images: ["base64data"] }]); + }); + + it("prepends system message when provided", () => { + const messages = [{ role: "user", content: "hello" }]; + const result = convertToOllamaMessages(messages, "You are helpful."); + expect(result[0]).toEqual({ role: "system", content: "You are helpful." 
}); + expect(result[1]).toEqual({ role: "user", content: "hello" }); + }); + + it("converts assistant messages with toolCall content blocks", () => { + const messages = [ + { + role: "assistant", + content: [ + { type: "text", text: "Let me check." }, + { type: "toolCall", id: "call_1", name: "bash", arguments: { command: "ls" } }, + ], + }, + ]; + const result = convertToOllamaMessages(messages); + expect(result[0].role).toBe("assistant"); + expect(result[0].content).toBe("Let me check."); + expect(result[0].tool_calls).toEqual([ + { function: { name: "bash", arguments: { command: "ls" } } }, + ]); + }); + + it("converts tool result messages with 'tool' role", () => { + const messages = [{ role: "tool", content: "file1.txt\nfile2.txt" }]; + const result = convertToOllamaMessages(messages); + expect(result).toEqual([{ role: "tool", content: "file1.txt\nfile2.txt" }]); + }); + + it("converts SDK 'toolResult' role to Ollama 'tool' role", () => { + const messages = [{ role: "toolResult", content: "command output here" }]; + const result = convertToOllamaMessages(messages); + expect(result).toEqual([{ role: "tool", content: "command output here" }]); + }); + + it("includes tool_name from SDK toolResult messages", () => { + const messages = [{ role: "toolResult", content: "file contents here", toolName: "read" }]; + const result = convertToOllamaMessages(messages); + expect(result).toEqual([{ role: "tool", content: "file contents here", tool_name: "read" }]); + }); + + it("omits tool_name when not provided in toolResult", () => { + const messages = [{ role: "toolResult", content: "output" }]; + const result = convertToOllamaMessages(messages); + expect(result).toEqual([{ role: "tool", content: "output" }]); + expect(result[0]).not.toHaveProperty("tool_name"); + }); + + it("handles empty messages array", () => { + const result = convertToOllamaMessages([]); + expect(result).toEqual([]); + }); +}); + +describe("buildAssistantMessage", () => { + const modelInfo = { api: 
"ollama", provider: "ollama", id: "qwen3:32b" }; + + it("builds text-only response", () => { + const response = { + model: "qwen3:32b", + created_at: "2026-01-01T00:00:00Z", + message: { role: "assistant" as const, content: "Hello!" }, + done: true, + prompt_eval_count: 10, + eval_count: 5, + }; + const result = buildAssistantMessage(response, modelInfo); + expect(result.role).toBe("assistant"); + expect(result.content).toEqual([{ type: "text", text: "Hello!" }]); + expect(result.stopReason).toBe("stop"); + expect(result.usage.input).toBe(10); + expect(result.usage.output).toBe(5); + expect(result.usage.totalTokens).toBe(15); + }); + + it("builds response with tool calls", () => { + const response = { + model: "qwen3:32b", + created_at: "2026-01-01T00:00:00Z", + message: { + role: "assistant" as const, + content: "", + tool_calls: [{ function: { name: "bash", arguments: { command: "ls -la" } } }], + }, + done: true, + prompt_eval_count: 20, + eval_count: 10, + }; + const result = buildAssistantMessage(response, modelInfo); + expect(result.stopReason).toBe("toolUse"); + expect(result.content.length).toBe(1); // toolCall only (empty content is skipped) + expect(result.content[0].type).toBe("toolCall"); + const toolCall = result.content[0] as { + type: "toolCall"; + id: string; + name: string; + arguments: Record; + }; + expect(toolCall.name).toBe("bash"); + expect(toolCall.arguments).toEqual({ command: "ls -la" }); + expect(toolCall.id).toMatch(/^ollama_call_[0-9a-f-]{36}$/); + }); + + it("sets all costs to zero for local models", () => { + const response = { + model: "qwen3:32b", + created_at: "2026-01-01T00:00:00Z", + message: { role: "assistant" as const, content: "ok" }, + done: true, + }; + const result = buildAssistantMessage(response, modelInfo); + expect(result.usage.cost).toEqual({ + input: 0, + output: 0, + cacheRead: 0, + cacheWrite: 0, + total: 0, + }); + }); +}); + +// Helper: build a ReadableStreamDefaultReader from NDJSON lines +function 
mockNdjsonReader(lines: string[]): ReadableStreamDefaultReader { + const encoder = new TextEncoder(); + const payload = lines.join("\n") + "\n"; + let consumed = false; + return { + read: async () => { + if (consumed) { + return { done: true as const, value: undefined }; + } + consumed = true; + return { done: false as const, value: encoder.encode(payload) }; + }, + releaseLock: () => {}, + cancel: async () => {}, + closed: Promise.resolve(undefined), + } as unknown as ReadableStreamDefaultReader; +} + +describe("parseNdjsonStream", () => { + it("parses text-only streaming chunks", async () => { + const reader = mockNdjsonReader([ + '{"model":"m","created_at":"t","message":{"role":"assistant","content":"Hello"},"done":false}', + '{"model":"m","created_at":"t","message":{"role":"assistant","content":" world"},"done":false}', + '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":5,"eval_count":2}', + ]); + const chunks = []; + for await (const chunk of parseNdjsonStream(reader)) { + chunks.push(chunk); + } + expect(chunks).toHaveLength(3); + expect(chunks[0].message.content).toBe("Hello"); + expect(chunks[1].message.content).toBe(" world"); + expect(chunks[2].done).toBe(true); + }); + + it("parses tool_calls from intermediate chunk (not final)", async () => { + // Ollama sends tool_calls in done:false chunk, final done:true has no tool_calls + const reader = mockNdjsonReader([ + '{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"bash","arguments":{"command":"ls"}}}]},"done":false}', + '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":10,"eval_count":5}', + ]); + const chunks = []; + for await (const chunk of parseNdjsonStream(reader)) { + chunks.push(chunk); + } + expect(chunks).toHaveLength(2); + expect(chunks[0].done).toBe(false); + expect(chunks[0].message.tool_calls).toHaveLength(1); + 
expect(chunks[0].message.tool_calls![0].function.name).toBe("bash"); + expect(chunks[1].done).toBe(true); + expect(chunks[1].message.tool_calls).toBeUndefined(); + }); + + it("accumulates tool_calls across multiple intermediate chunks", async () => { + const reader = mockNdjsonReader([ + '{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"read","arguments":{"path":"/tmp/a"}}}]},"done":false}', + '{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"bash","arguments":{"command":"ls"}}}]},"done":false}', + '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true}', + ]); + + // Simulate the accumulation logic from createOllamaStreamFn + const accumulatedToolCalls: Array<{ + function: { name: string; arguments: Record }; + }> = []; + const chunks = []; + for await (const chunk of parseNdjsonStream(reader)) { + chunks.push(chunk); + if (chunk.message?.tool_calls) { + accumulatedToolCalls.push(...chunk.message.tool_calls); + } + } + expect(accumulatedToolCalls).toHaveLength(2); + expect(accumulatedToolCalls[0].function.name).toBe("read"); + expect(accumulatedToolCalls[1].function.name).toBe("bash"); + // Final done:true chunk has no tool_calls + expect(chunks[2].message.tool_calls).toBeUndefined(); + }); +}); + +describe("createOllamaStreamFn", () => { + it("normalizes /v1 baseUrl and maps maxTokens + signal", async () => { + const originalFetch = globalThis.fetch; + const fetchMock = vi.fn(async () => { + const payload = [ + '{"model":"m","created_at":"t","message":{"role":"assistant","content":"ok"},"done":false}', + '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":1,"eval_count":1}', + ].join("\n"); + return new Response(`${payload}\n`, { + status: 200, + headers: { "Content-Type": "application/x-ndjson" }, + }); + }); + globalThis.fetch = fetchMock as unknown as typeof 
fetch; + + try { + const streamFn = createOllamaStreamFn("http://ollama-host:11434/v1/"); + const signal = new AbortController().signal; + const stream = streamFn( + { + id: "qwen3:32b", + api: "ollama", + provider: "custom-ollama", + contextWindow: 131072, + } as unknown as Parameters[0], + { + messages: [{ role: "user", content: "hello" }], + } as unknown as Parameters[1], + { + maxTokens: 123, + signal, + } as unknown as Parameters[2], + ); + + const events = []; + for await (const event of stream) { + events.push(event); + } + expect(events.at(-1)?.type).toBe("done"); + + expect(fetchMock).toHaveBeenCalledTimes(1); + const [url, requestInit] = fetchMock.mock.calls[0] as [string, RequestInit]; + expect(url).toBe("http://ollama-host:11434/api/chat"); + expect(requestInit.signal).toBe(signal); + if (typeof requestInit.body !== "string") { + throw new Error("Expected string request body"); + } + + const requestBody = JSON.parse(requestInit.body) as { + options: { num_ctx?: number; num_predict?: number }; + }; + expect(requestBody.options.num_ctx).toBe(131072); + expect(requestBody.options.num_predict).toBe(123); + } finally { + globalThis.fetch = originalFetch; + } + }); +}); diff --git a/src/agents/ollama-stream.ts b/src/agents/ollama-stream.ts new file mode 100644 index 00000000000..76029e67cea --- /dev/null +++ b/src/agents/ollama-stream.ts @@ -0,0 +1,419 @@ +import type { StreamFn } from "@mariozechner/pi-agent-core"; +import type { + AssistantMessage, + StopReason, + TextContent, + ToolCall, + Tool, + Usage, +} from "@mariozechner/pi-ai"; +import { createAssistantMessageEventStream } from "@mariozechner/pi-ai"; +import { randomUUID } from "node:crypto"; + +export const OLLAMA_NATIVE_BASE_URL = "http://127.0.0.1:11434"; + +// ── Ollama /api/chat request types ────────────────────────────────────────── + +interface OllamaChatRequest { + model: string; + messages: OllamaChatMessage[]; + stream: boolean; + tools?: OllamaTool[]; + options?: Record; +} + +interface 
OllamaChatMessage { + role: "system" | "user" | "assistant" | "tool"; + content: string; + images?: string[]; + tool_calls?: OllamaToolCall[]; + tool_name?: string; +} + +interface OllamaTool { + type: "function"; + function: { + name: string; + description: string; + parameters: Record; + }; +} + +interface OllamaToolCall { + function: { + name: string; + arguments: Record; + }; +} + +// ── Ollama /api/chat response types ───────────────────────────────────────── + +interface OllamaChatResponse { + model: string; + created_at: string; + message: { + role: "assistant"; + content: string; + tool_calls?: OllamaToolCall[]; + }; + done: boolean; + done_reason?: string; + total_duration?: number; + load_duration?: number; + prompt_eval_count?: number; + prompt_eval_duration?: number; + eval_count?: number; + eval_duration?: number; +} + +// ── Message conversion ────────────────────────────────────────────────────── + +type InputContentPart = + | { type: "text"; text: string } + | { type: "image"; data: string } + | { type: "toolCall"; id: string; name: string; arguments: Record } + | { type: "tool_use"; id: string; name: string; input: Record }; + +function extractTextContent(content: unknown): string { + if (typeof content === "string") { + return content; + } + if (!Array.isArray(content)) { + return ""; + } + return (content as InputContentPart[]) + .filter((part): part is { type: "text"; text: string } => part.type === "text") + .map((part) => part.text) + .join(""); +} + +function extractOllamaImages(content: unknown): string[] { + if (!Array.isArray(content)) { + return []; + } + return (content as InputContentPart[]) + .filter((part): part is { type: "image"; data: string } => part.type === "image") + .map((part) => part.data); +} + +function extractToolCalls(content: unknown): OllamaToolCall[] { + if (!Array.isArray(content)) { + return []; + } + const parts = content as InputContentPart[]; + const result: OllamaToolCall[] = []; + for (const part of parts) { + 
if (part.type === "toolCall") { + result.push({ function: { name: part.name, arguments: part.arguments } }); + } else if (part.type === "tool_use") { + result.push({ function: { name: part.name, arguments: part.input } }); + } + } + return result; +} + +export function convertToOllamaMessages( + messages: Array<{ role: string; content: unknown }>, + system?: string, +): OllamaChatMessage[] { + const result: OllamaChatMessage[] = []; + + if (system) { + result.push({ role: "system", content: system }); + } + + for (const msg of messages) { + const { role } = msg; + + if (role === "user") { + const text = extractTextContent(msg.content); + const images = extractOllamaImages(msg.content); + result.push({ + role: "user", + content: text, + ...(images.length > 0 ? { images } : {}), + }); + } else if (role === "assistant") { + const text = extractTextContent(msg.content); + const toolCalls = extractToolCalls(msg.content); + result.push({ + role: "assistant", + content: text, + ...(toolCalls.length > 0 ? { tool_calls: toolCalls } : {}), + }); + } else if (role === "tool" || role === "toolResult") { + // SDK uses "toolResult" (camelCase) for tool result messages. + // Ollama API expects "tool" role with tool_name per the native spec. + const text = extractTextContent(msg.content); + const toolName = + typeof (msg as { toolName?: unknown }).toolName === "string" + ? (msg as { toolName?: string }).toolName + : undefined; + result.push({ + role: "tool", + content: text, + ...(toolName ? 
{ tool_name: toolName } : {}), + }); + } + } + + return result; +} + +// ── Tool extraction ───────────────────────────────────────────────────────── + +function extractOllamaTools(tools: Tool[] | undefined): OllamaTool[] { + if (!tools || !Array.isArray(tools)) { + return []; + } + const result: OllamaTool[] = []; + for (const tool of tools) { + if (typeof tool.name !== "string" || !tool.name) { + continue; + } + result.push({ + type: "function", + function: { + name: tool.name, + description: typeof tool.description === "string" ? tool.description : "", + parameters: (tool.parameters ?? {}) as Record, + }, + }); + } + return result; +} + +// ── Response conversion ───────────────────────────────────────────────────── + +export function buildAssistantMessage( + response: OllamaChatResponse, + modelInfo: { api: string; provider: string; id: string }, +): AssistantMessage { + const content: (TextContent | ToolCall)[] = []; + + if (response.message.content) { + content.push({ type: "text", text: response.message.content }); + } + + const toolCalls = response.message.tool_calls; + if (toolCalls && toolCalls.length > 0) { + for (const tc of toolCalls) { + content.push({ + type: "toolCall", + id: `ollama_call_${randomUUID()}`, + name: tc.function.name, + arguments: tc.function.arguments, + }); + } + } + + const hasToolCalls = toolCalls && toolCalls.length > 0; + const stopReason: StopReason = hasToolCalls ? "toolUse" : "stop"; + + const usage: Usage = { + input: response.prompt_eval_count ?? 0, + output: response.eval_count ?? 0, + cacheRead: 0, + cacheWrite: 0, + totalTokens: (response.prompt_eval_count ?? 0) + (response.eval_count ?? 
0), + cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 }, + }; + + return { + role: "assistant", + content, + stopReason, + api: modelInfo.api, + provider: modelInfo.provider, + model: modelInfo.id, + usage, + timestamp: Date.now(), + }; +} + +// ── NDJSON streaming parser ───────────────────────────────────────────────── + +export async function* parseNdjsonStream( + reader: ReadableStreamDefaultReader, +): AsyncGenerator { + const decoder = new TextDecoder(); + let buffer = ""; + + while (true) { + const { done, value } = await reader.read(); + if (done) { + break; + } + buffer += decoder.decode(value, { stream: true }); + const lines = buffer.split("\n"); + buffer = lines.pop() ?? ""; + + for (const line of lines) { + const trimmed = line.trim(); + if (!trimmed) { + continue; + } + try { + yield JSON.parse(trimmed) as OllamaChatResponse; + } catch { + console.warn("[ollama-stream] Skipping malformed NDJSON line:", trimmed.slice(0, 120)); + } + } + } + + if (buffer.trim()) { + try { + yield JSON.parse(buffer.trim()) as OllamaChatResponse; + } catch { + console.warn( + "[ollama-stream] Skipping malformed trailing data:", + buffer.trim().slice(0, 120), + ); + } + } +} + +// ── Main StreamFn factory ─────────────────────────────────────────────────── + +function resolveOllamaChatUrl(baseUrl: string): string { + const trimmed = baseUrl.trim().replace(/\/+$/, ""); + const normalizedBase = trimmed.replace(/\/v1$/i, ""); + const apiBase = normalizedBase || OLLAMA_NATIVE_BASE_URL; + return `${apiBase}/api/chat`; +} + +export function createOllamaStreamFn(baseUrl: string): StreamFn { + const chatUrl = resolveOllamaChatUrl(baseUrl); + + return (model, context, options) => { + const stream = createAssistantMessageEventStream(); + + const run = async () => { + try { + const ollamaMessages = convertToOllamaMessages( + context.messages ?? 
[], + context.systemPrompt, + ); + + const ollamaTools = extractOllamaTools(context.tools); + + // Ollama defaults to num_ctx=4096 which is too small for large + // system prompts + many tool definitions. Use model's contextWindow. + const ollamaOptions: Record = { num_ctx: model.contextWindow ?? 65536 }; + if (typeof options?.temperature === "number") { + ollamaOptions.temperature = options.temperature; + } + if (typeof options?.maxTokens === "number") { + ollamaOptions.num_predict = options.maxTokens; + } + + const body: OllamaChatRequest = { + model: model.id, + messages: ollamaMessages, + stream: true, + ...(ollamaTools.length > 0 ? { tools: ollamaTools } : {}), + options: ollamaOptions, + }; + + const headers: Record = { + "Content-Type": "application/json", + ...options?.headers, + }; + if (options?.apiKey) { + headers.Authorization = `Bearer ${options.apiKey}`; + } + + const response = await fetch(chatUrl, { + method: "POST", + headers, + body: JSON.stringify(body), + signal: options?.signal, + }); + + if (!response.ok) { + const errorText = await response.text().catch(() => "unknown error"); + throw new Error(`Ollama API error ${response.status}: ${errorText}`); + } + + if (!response.body) { + throw new Error("Ollama API returned empty response body"); + } + + const reader = response.body.getReader(); + let accumulatedContent = ""; + const accumulatedToolCalls: OllamaToolCall[] = []; + let finalResponse: OllamaChatResponse | undefined; + + for await (const chunk of parseNdjsonStream(reader)) { + if (chunk.message?.content) { + accumulatedContent += chunk.message.content; + } + + // Ollama sends tool_calls in intermediate (done:false) chunks, + // NOT in the final done:true chunk. Collect from all chunks. 
+ if (chunk.message?.tool_calls) { + accumulatedToolCalls.push(...chunk.message.tool_calls); + } + + if (chunk.done) { + finalResponse = chunk; + break; + } + } + + if (!finalResponse) { + throw new Error("Ollama API stream ended without a final response"); + } + + finalResponse.message.content = accumulatedContent; + if (accumulatedToolCalls.length > 0) { + finalResponse.message.tool_calls = accumulatedToolCalls; + } + + const assistantMessage = buildAssistantMessage(finalResponse, { + api: model.api, + provider: model.provider, + id: model.id, + }); + + const reason: Extract = + assistantMessage.stopReason === "toolUse" ? "toolUse" : "stop"; + + stream.push({ + type: "done", + reason, + message: assistantMessage, + }); + } catch (err) { + const errorMessage = err instanceof Error ? err.message : String(err); + stream.push({ + type: "error", + reason: "error", + error: { + role: "assistant" as const, + content: [], + stopReason: "error" as StopReason, + errorMessage, + api: model.api, + provider: model.provider, + model: model.id, + usage: { + input: 0, + output: 0, + cacheRead: 0, + cacheWrite: 0, + totalTokens: 0, + cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 }, + }, + timestamp: Date.now(), + }, + }); + } finally { + stream.end(); + } + }; + + queueMicrotask(() => void run()); + return stream; + }; +} diff --git a/src/agents/pi-embedded-runner/run/attempt.ts b/src/agents/pi-embedded-runner/run/attempt.ts index 5f76d3bdff7..7b91249a4bb 100644 --- a/src/agents/pi-embedded-runner/run/attempt.ts +++ b/src/agents/pi-embedded-runner/run/attempt.ts @@ -31,6 +31,7 @@ import { resolveOpenClawDocsPath } from "../../docs-path.js"; import { isTimeoutError } from "../../failover-error.js"; import { resolveModelAuthMode } from "../../model-auth.js"; import { resolveDefaultModelForAgent } from "../../model-selection.js"; +import { createOllamaStreamFn, OLLAMA_NATIVE_BASE_URL } from "../../ollama-stream.js"; import { isCloudCodeAssistFormatError, 
resolveBootstrapMaxChars, @@ -584,8 +585,21 @@ export async function runEmbeddedAttempt( workspaceDir: params.workspaceDir, }); - // Force a stable streamFn reference so vitest can reliably mock @mariozechner/pi-ai. - activeSession.agent.streamFn = streamSimple; + // Ollama native API: bypass SDK's streamSimple and use direct /api/chat calls + // for reliable streaming + tool calling support (#11828). + if (params.model.api === "ollama") { + // Use the resolved model baseUrl first so custom provider aliases work. + const providerConfig = params.config?.models?.providers?.[params.model.provider]; + const modelBaseUrl = + typeof params.model.baseUrl === "string" ? params.model.baseUrl.trim() : ""; + const providerBaseUrl = + typeof providerConfig?.baseUrl === "string" ? providerConfig.baseUrl.trim() : ""; + const ollamaBaseUrl = modelBaseUrl || providerBaseUrl || OLLAMA_NATIVE_BASE_URL; + activeSession.agent.streamFn = createOllamaStreamFn(ollamaBaseUrl); + } else { + // Force a stable streamFn reference so vitest can reliably mock @mariozechner/pi-ai. + activeSession.agent.streamFn = streamSimple; + } applyExtraParamsToAgent( activeSession.agent, diff --git a/src/config/types.models.ts b/src/config/types.models.ts index 11b6c64cbc4..3d993cbf610 100644 --- a/src/config/types.models.ts +++ b/src/config/types.models.ts @@ -4,7 +4,8 @@ export type ModelApi = | "anthropic-messages" | "google-generative-ai" | "github-copilot" - | "bedrock-converse-stream"; + | "bedrock-converse-stream" + | "ollama"; export type ModelCompatConfig = { supportsStore?: boolean; diff --git a/src/config/zod-schema.core.ts b/src/config/zod-schema.core.ts index b7da9208a7a..2da5c357cb6 100644 --- a/src/config/zod-schema.core.ts +++ b/src/config/zod-schema.core.ts @@ -9,6 +9,7 @@ export const ModelApiSchema = z.union([ z.literal("google-generative-ai"), z.literal("github-copilot"), z.literal("bedrock-converse-stream"), + z.literal("ollama"), ]); export const ModelCompatSchema = z
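For reviewers who want to sanity-check the base-URL handling without running the full suite, the `/v1`-stripping logic can be exercised as a standalone sketch. The function body below is copied from `resolveOllamaChatUrl` in `src/agents/ollama-stream.ts` in this patch; the `console.log` driver around it is illustrative only.

```typescript
// Default native endpoint, as exported from ollama-stream.ts in this patch.
const OLLAMA_NATIVE_BASE_URL = "http://127.0.0.1:11434";

// Copied from resolveOllamaChatUrl: trim trailing slashes, drop a trailing
// /v1 segment (legacy OpenAI-compatible base URLs), fall back to the native
// default when the result is empty, then append the native chat path.
function resolveOllamaChatUrl(baseUrl: string): string {
  const trimmed = baseUrl.trim().replace(/\/+$/, "");
  const normalizedBase = trimmed.replace(/\/v1$/i, "");
  const apiBase = normalizedBase || OLLAMA_NATIVE_BASE_URL;
  return `${apiBase}/api/chat`;
}

console.log(resolveOllamaChatUrl("http://ollama-host:11434/v1/"));
// → http://ollama-host:11434/api/chat
console.log(resolveOllamaChatUrl(""));
// → http://127.0.0.1:11434/api/chat
```

This matches the expectation in `ollama-stream.test.ts` that a configured `http://ollama-host:11434/v1/` base resolves to `http://ollama-host:11434/api/chat`.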