feat(ollama): add native /api/chat provider for streaming + tool calling (#11853)

Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 0a723f98e6
Co-authored-by: BrokenFinger98 <115936166+BrokenFinger98@users.noreply.github.com>
Co-authored-by: steipete <58493+steipete@users.noreply.github.com>
Reviewed-by: @steipete
Sunwoo Yu
2026-02-14 09:20:42 +09:00
committed by GitHub
parent 5378583da1
commit 11702290ff
9 changed files with 760 additions and 75 deletions

@@ -69,6 +69,7 @@ Docs: https://docs.openclaw.ai
- Config: keep legacy audio transcription migration strict by rejecting non-string/unsafe command tokens while still migrating valid custom script executables. (#5042) Thanks @shayan919293.
- Status/Sessions: stop clamping derived `totalTokens` to context-window size, keep prompt-token snapshots wired through session accounting, and surface context usage as unknown when fresh snapshot data is missing to avoid false 100% reports. (#15114) Thanks @echoVic.
- Providers/MiniMax: switch implicit MiniMax API-key provider from `openai-completions` to `anthropic-messages` with the correct Anthropic-compatible base URL, fixing `invalid role: developer (2013)` errors on MiniMax M2.5. (#15275) Thanks @lailoo.
- Ollama/Agents: use resolved model/provider base URLs for native `/api/chat` streaming (including aliased providers), normalize `/v1` endpoints, and forward abort + `maxTokens` stream options for reliable cancellation and token caps. (#11853) Thanks @BrokenFinger98.
## 2026.2.12

@@ -8,7 +8,7 @@ title: "Ollama"
# Ollama
Ollama is a local LLM runtime that makes it easy to run open-source models on your machine. OpenClaw integrates with Ollama's OpenAI-compatible API and can **auto-discover tool-capable models** when you opt in with `OLLAMA_API_KEY` (or an auth profile) and do not define an explicit `models.providers.ollama` entry.
Ollama is a local LLM runtime that makes it easy to run open-source models on your machine. OpenClaw integrates with Ollama's native API (`/api/chat`), supporting streaming and tool calling, and can **auto-discover tool-capable models** when you opt in with `OLLAMA_API_KEY` (or an auth profile) and do not define an explicit `models.providers.ollama` entry.
## Quick start
@@ -101,10 +101,9 @@ Use explicit config when:
models: {
providers: {
ollama: {
// Use a host that includes /v1 for OpenAI-compatible APIs
baseUrl: "http://ollama-host:11434/v1",
baseUrl: "http://ollama-host:11434",
apiKey: "ollama-local",
api: "openai-completions",
api: "ollama",
models: [
{
id: "gpt-oss:20b",
@@ -134,7 +133,7 @@ If Ollama is running on a different host or port (explicit config disables auto-
providers: {
ollama: {
apiKey: "ollama-local",
baseUrl: "http://ollama-host:11434/v1",
baseUrl: "http://ollama-host:11434",
},
},
},
@@ -174,45 +173,28 @@ Ollama is free and runs locally, so all model costs are set to $0.
### Streaming Configuration
Due to a [known issue](https://github.com/badlogic/pi-mono/issues/1205) in the underlying SDK with Ollama's response format, **streaming is disabled by default** for Ollama models. This prevents corrupted responses when using tool-capable models.
OpenClaw's Ollama integration uses the **native Ollama API** (`/api/chat`) by default, which fully supports streaming and tool calling simultaneously. No special configuration is needed.
When streaming is disabled, responses are delivered all at once (non-streaming mode), which avoids the issue where interleaved content/reasoning deltas cause garbled output.
#### Legacy OpenAI-Compatible Mode
#### Re-enable Streaming (Advanced)
If you want to re-enable streaming for Ollama (may cause issues with tool-capable models):
If you need to use the OpenAI-compatible endpoint instead (e.g., behind a proxy that only supports the OpenAI format), set `api: "openai-completions"` explicitly:
```json5
{
agents: {
defaults: {
models: {
"ollama/gpt-oss:20b": {
streaming: true,
},
},
},
},
models: {
providers: {
ollama: {
baseUrl: "http://ollama-host:11434/v1",
api: "openai-completions",
apiKey: "ollama-local",
models: [...]
}
}
}
}
```
#### Disable Streaming for Other Providers
You can also disable streaming for any provider if needed:
```json5
{
agents: {
defaults: {
models: {
"openai/gpt-4": {
streaming: false,
},
},
},
},
}
```
Note: The OpenAI-compatible endpoint may not support streaming and tool calling at the same time. If responses come back garbled, disable streaming with `params: { streaming: false }` in the model config.
### Context windows
@@ -261,15 +243,6 @@ ps aux | grep ollama
ollama serve
```
### Corrupted responses or tool names in output
If you see garbled responses containing tool names (like `sessions_send`, `memory_get`) or fragmented text when using Ollama models, this is due to an upstream SDK issue with streaming responses. **This is fixed by default** in the latest OpenClaw version by disabling streaming for Ollama models.
If you manually enabled streaming and experience this issue:
1. Remove the `streaming: true` configuration from your Ollama model entries, or
2. Explicitly set `streaming: false` for Ollama models (see [Streaming Configuration](#streaming-configuration))
## See Also
- [Model Providers](/concepts/model-providers) - Overview of all providers

@@ -29,25 +29,20 @@ describe("Ollama provider", () => {
const agentDir = mkdtempSync(join(tmpdir(), "openclaw-test-"));
const providers = await resolveImplicitProviders({ agentDir });
// Ollama requires explicit configuration via OLLAMA_API_KEY env var or profile
expect(providers?.ollama).toBeUndefined();
});
it("should disable streaming by default for Ollama models", async () => {
it("should use native ollama api type", async () => {
const agentDir = mkdtempSync(join(tmpdir(), "openclaw-test-"));
process.env.OLLAMA_API_KEY = "test-key";
try {
const providers = await resolveImplicitProviders({ agentDir });
// Provider should be defined with OLLAMA_API_KEY set
expect(providers?.ollama).toBeDefined();
expect(providers?.ollama?.apiKey).toBe("OLLAMA_API_KEY");
// Note: discoverOllamaModels() returns empty array in test environments (VITEST env var check)
// so we can't test the actual model discovery here. The streaming: false setting
// is applied in the model mapping within discoverOllamaModels().
// The configuration structure itself is validated by TypeScript and the Zod schema.
expect(providers?.ollama?.api).toBe("ollama");
expect(providers?.ollama?.baseUrl).toBe("http://127.0.0.1:11434");
} finally {
delete process.env.OLLAMA_API_KEY;
}
@@ -69,15 +64,14 @@ describe("Ollama provider", () => {
},
});
expect(providers?.ollama?.baseUrl).toBe("http://192.168.20.14:11434/v1");
// Native API strips /v1 suffix via resolveOllamaApiBase()
expect(providers?.ollama?.baseUrl).toBe("http://192.168.20.14:11434");
} finally {
delete process.env.OLLAMA_API_KEY;
}
});
it("should have correct model structure with streaming disabled (unit test)", () => {
// This test directly verifies the model configuration structure
// since discoverOllamaModels() returns empty array in test mode
it("should have correct model structure without streaming override", () => {
const mockOllamaModel = {
id: "llama3.3:latest",
name: "llama3.3:latest",
@@ -86,13 +80,9 @@ describe("Ollama provider", () => {
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 8192,
params: {
streaming: false,
},
};
// Verify the model structure matches what discoverOllamaModels() would return
expect(mockOllamaModel.params?.streaming).toBe(false);
expect(mockOllamaModel.params).toHaveProperty("streaming");
// Native Ollama provider does not need streaming: false workaround
expect(mockOllamaModel).not.toHaveProperty("params");
});
});

@@ -17,6 +17,7 @@ import {
buildHuggingfaceModelDefinition,
} from "./huggingface-models.js";
import { resolveAwsSdkEnvVarName, resolveEnvApiKey } from "./model-auth.js";
import { OLLAMA_NATIVE_BASE_URL } from "./ollama-stream.js";
import {
buildSyntheticModelDefinition,
SYNTHETIC_BASE_URL,
@@ -79,8 +80,8 @@ const QWEN_PORTAL_DEFAULT_COST = {
cacheWrite: 0,
};
const OLLAMA_BASE_URL = "http://127.0.0.1:11434/v1";
const OLLAMA_API_BASE_URL = "http://127.0.0.1:11434";
const OLLAMA_BASE_URL = OLLAMA_NATIVE_BASE_URL;
const OLLAMA_API_BASE_URL = OLLAMA_BASE_URL;
const OLLAMA_DEFAULT_CONTEXT_WINDOW = 128000;
const OLLAMA_DEFAULT_MAX_TOKENS = 8192;
const OLLAMA_DEFAULT_COST = {
@@ -180,11 +181,6 @@ async function discoverOllamaModels(baseUrl?: string): Promise<ModelDefinitionCo
cost: OLLAMA_DEFAULT_COST,
contextWindow: OLLAMA_DEFAULT_CONTEXT_WINDOW,
maxTokens: OLLAMA_DEFAULT_MAX_TOKENS,
// Disable streaming by default for Ollama to avoid SDK issue #1205
// See: https://github.com/badlogic/pi-mono/issues/1205
params: {
streaming: false,
},
};
});
} catch (error) {
@@ -541,8 +537,8 @@ async function buildVeniceProvider(): Promise<ProviderConfig> {
async function buildOllamaProvider(configuredBaseUrl?: string): Promise<ProviderConfig> {
const models = await discoverOllamaModels(configuredBaseUrl);
return {
baseUrl: configuredBaseUrl ?? OLLAMA_BASE_URL,
api: "openai-completions",
baseUrl: resolveOllamaApiBase(configuredBaseUrl),
api: "ollama",
models,
};
}

@@ -0,0 +1,290 @@
import { describe, expect, it, vi } from "vitest";
import {
createOllamaStreamFn,
convertToOllamaMessages,
buildAssistantMessage,
parseNdjsonStream,
} from "./ollama-stream.js";
describe("convertToOllamaMessages", () => {
it("converts user text messages", () => {
const messages = [{ role: "user", content: "hello" }];
const result = convertToOllamaMessages(messages);
expect(result).toEqual([{ role: "user", content: "hello" }]);
});
it("converts user messages with content parts", () => {
const messages = [
{
role: "user",
content: [
{ type: "text", text: "describe this" },
{ type: "image", data: "base64data" },
],
},
];
const result = convertToOllamaMessages(messages);
expect(result).toEqual([{ role: "user", content: "describe this", images: ["base64data"] }]);
});
it("prepends system message when provided", () => {
const messages = [{ role: "user", content: "hello" }];
const result = convertToOllamaMessages(messages, "You are helpful.");
expect(result[0]).toEqual({ role: "system", content: "You are helpful." });
expect(result[1]).toEqual({ role: "user", content: "hello" });
});
it("converts assistant messages with toolCall content blocks", () => {
const messages = [
{
role: "assistant",
content: [
{ type: "text", text: "Let me check." },
{ type: "toolCall", id: "call_1", name: "bash", arguments: { command: "ls" } },
],
},
];
const result = convertToOllamaMessages(messages);
expect(result[0].role).toBe("assistant");
expect(result[0].content).toBe("Let me check.");
expect(result[0].tool_calls).toEqual([
{ function: { name: "bash", arguments: { command: "ls" } } },
]);
});
it("converts tool result messages with 'tool' role", () => {
const messages = [{ role: "tool", content: "file1.txt\nfile2.txt" }];
const result = convertToOllamaMessages(messages);
expect(result).toEqual([{ role: "tool", content: "file1.txt\nfile2.txt" }]);
});
it("converts SDK 'toolResult' role to Ollama 'tool' role", () => {
const messages = [{ role: "toolResult", content: "command output here" }];
const result = convertToOllamaMessages(messages);
expect(result).toEqual([{ role: "tool", content: "command output here" }]);
});
it("includes tool_name from SDK toolResult messages", () => {
const messages = [{ role: "toolResult", content: "file contents here", toolName: "read" }];
const result = convertToOllamaMessages(messages);
expect(result).toEqual([{ role: "tool", content: "file contents here", tool_name: "read" }]);
});
it("omits tool_name when not provided in toolResult", () => {
const messages = [{ role: "toolResult", content: "output" }];
const result = convertToOllamaMessages(messages);
expect(result).toEqual([{ role: "tool", content: "output" }]);
expect(result[0]).not.toHaveProperty("tool_name");
});
it("handles empty messages array", () => {
const result = convertToOllamaMessages([]);
expect(result).toEqual([]);
});
});
describe("buildAssistantMessage", () => {
const modelInfo = { api: "ollama", provider: "ollama", id: "qwen3:32b" };
it("builds text-only response", () => {
const response = {
model: "qwen3:32b",
created_at: "2026-01-01T00:00:00Z",
message: { role: "assistant" as const, content: "Hello!" },
done: true,
prompt_eval_count: 10,
eval_count: 5,
};
const result = buildAssistantMessage(response, modelInfo);
expect(result.role).toBe("assistant");
expect(result.content).toEqual([{ type: "text", text: "Hello!" }]);
expect(result.stopReason).toBe("stop");
expect(result.usage.input).toBe(10);
expect(result.usage.output).toBe(5);
expect(result.usage.totalTokens).toBe(15);
});
it("builds response with tool calls", () => {
const response = {
model: "qwen3:32b",
created_at: "2026-01-01T00:00:00Z",
message: {
role: "assistant" as const,
content: "",
tool_calls: [{ function: { name: "bash", arguments: { command: "ls -la" } } }],
},
done: true,
prompt_eval_count: 20,
eval_count: 10,
};
const result = buildAssistantMessage(response, modelInfo);
expect(result.stopReason).toBe("toolUse");
expect(result.content.length).toBe(1); // toolCall only (empty content is skipped)
expect(result.content[0].type).toBe("toolCall");
const toolCall = result.content[0] as {
type: "toolCall";
id: string;
name: string;
arguments: Record<string, unknown>;
};
expect(toolCall.name).toBe("bash");
expect(toolCall.arguments).toEqual({ command: "ls -la" });
expect(toolCall.id).toMatch(/^ollama_call_[0-9a-f-]{36}$/);
});
it("sets all costs to zero for local models", () => {
const response = {
model: "qwen3:32b",
created_at: "2026-01-01T00:00:00Z",
message: { role: "assistant" as const, content: "ok" },
done: true,
};
const result = buildAssistantMessage(response, modelInfo);
expect(result.usage.cost).toEqual({
input: 0,
output: 0,
cacheRead: 0,
cacheWrite: 0,
total: 0,
});
});
});
// Helper: build a ReadableStreamDefaultReader from NDJSON lines
function mockNdjsonReader(lines: string[]): ReadableStreamDefaultReader<Uint8Array> {
const encoder = new TextEncoder();
const payload = lines.join("\n") + "\n";
let consumed = false;
return {
read: async () => {
if (consumed) {
return { done: true as const, value: undefined };
}
consumed = true;
return { done: false as const, value: encoder.encode(payload) };
},
releaseLock: () => {},
cancel: async () => {},
closed: Promise.resolve(undefined),
} as unknown as ReadableStreamDefaultReader<Uint8Array>;
}
describe("parseNdjsonStream", () => {
it("parses text-only streaming chunks", async () => {
const reader = mockNdjsonReader([
'{"model":"m","created_at":"t","message":{"role":"assistant","content":"Hello"},"done":false}',
'{"model":"m","created_at":"t","message":{"role":"assistant","content":" world"},"done":false}',
'{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":5,"eval_count":2}',
]);
const chunks = [];
for await (const chunk of parseNdjsonStream(reader)) {
chunks.push(chunk);
}
expect(chunks).toHaveLength(3);
expect(chunks[0].message.content).toBe("Hello");
expect(chunks[1].message.content).toBe(" world");
expect(chunks[2].done).toBe(true);
});
it("parses tool_calls from intermediate chunk (not final)", async () => {
// Ollama sends tool_calls in done:false chunk, final done:true has no tool_calls
const reader = mockNdjsonReader([
'{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"bash","arguments":{"command":"ls"}}}]},"done":false}',
'{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":10,"eval_count":5}',
]);
const chunks = [];
for await (const chunk of parseNdjsonStream(reader)) {
chunks.push(chunk);
}
expect(chunks).toHaveLength(2);
expect(chunks[0].done).toBe(false);
expect(chunks[0].message.tool_calls).toHaveLength(1);
expect(chunks[0].message.tool_calls![0].function.name).toBe("bash");
expect(chunks[1].done).toBe(true);
expect(chunks[1].message.tool_calls).toBeUndefined();
});
it("accumulates tool_calls across multiple intermediate chunks", async () => {
const reader = mockNdjsonReader([
'{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"read","arguments":{"path":"/tmp/a"}}}]},"done":false}',
'{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"bash","arguments":{"command":"ls"}}}]},"done":false}',
'{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true}',
]);
// Simulate the accumulation logic from createOllamaStreamFn
const accumulatedToolCalls: Array<{
function: { name: string; arguments: Record<string, unknown> };
}> = [];
const chunks = [];
for await (const chunk of parseNdjsonStream(reader)) {
chunks.push(chunk);
if (chunk.message?.tool_calls) {
accumulatedToolCalls.push(...chunk.message.tool_calls);
}
}
expect(accumulatedToolCalls).toHaveLength(2);
expect(accumulatedToolCalls[0].function.name).toBe("read");
expect(accumulatedToolCalls[1].function.name).toBe("bash");
// Final done:true chunk has no tool_calls
expect(chunks[2].message.tool_calls).toBeUndefined();
});
});
describe("createOllamaStreamFn", () => {
it("normalizes /v1 baseUrl and maps maxTokens + signal", async () => {
const originalFetch = globalThis.fetch;
const fetchMock = vi.fn(async () => {
const payload = [
'{"model":"m","created_at":"t","message":{"role":"assistant","content":"ok"},"done":false}',
'{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":1,"eval_count":1}',
].join("\n");
return new Response(`${payload}\n`, {
status: 200,
headers: { "Content-Type": "application/x-ndjson" },
});
});
globalThis.fetch = fetchMock as unknown as typeof fetch;
try {
const streamFn = createOllamaStreamFn("http://ollama-host:11434/v1/");
const signal = new AbortController().signal;
const stream = streamFn(
{
id: "qwen3:32b",
api: "ollama",
provider: "custom-ollama",
contextWindow: 131072,
} as unknown as Parameters<typeof streamFn>[0],
{
messages: [{ role: "user", content: "hello" }],
} as unknown as Parameters<typeof streamFn>[1],
{
maxTokens: 123,
signal,
} as unknown as Parameters<typeof streamFn>[2],
);
const events = [];
for await (const event of stream) {
events.push(event);
}
expect(events.at(-1)?.type).toBe("done");
expect(fetchMock).toHaveBeenCalledTimes(1);
const [url, requestInit] = fetchMock.mock.calls[0] as [string, RequestInit];
expect(url).toBe("http://ollama-host:11434/api/chat");
expect(requestInit.signal).toBe(signal);
if (typeof requestInit.body !== "string") {
throw new Error("Expected string request body");
}
const requestBody = JSON.parse(requestInit.body) as {
options: { num_ctx?: number; num_predict?: number };
};
expect(requestBody.options.num_ctx).toBe(131072);
expect(requestBody.options.num_predict).toBe(123);
} finally {
globalThis.fetch = originalFetch;
}
});
});
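
The mock reader in these tests delivers the whole NDJSON payload in a single `read()`, so a JSON line split across two network chunks is not exercised. Below is a minimal sketch of the buffering idea used by `parseNdjsonStream` (split on newlines, hold back the trailing partial line), written as a synchronous helper so it can be checked without a stream. `createNdjsonLineBuffer` and its `push`/`flush` methods are hypothetical names for illustration, not part of this commit.

```typescript
// Hypothetical helper (not part of this commit): mirrors the line-buffering
// idea in parseNdjsonStream, but takes decoded text instead of a reader.
interface NdjsonLineBuffer {
  // Feed one decoded chunk; returns every complete JSON value it closed.
  push(text: string): unknown[];
  // Call once at end of input to parse whatever is left in the buffer.
  flush(): unknown[];
}

function createNdjsonLineBuffer(): NdjsonLineBuffer {
  let buffer = "";

  const parseLines = (lines: string[]): unknown[] => {
    const out: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed) {
        continue; // blank lines between records are ignored
      }
      try {
        out.push(JSON.parse(trimmed));
      } catch {
        // Malformed lines are skipped rather than aborting the stream.
      }
    }
    return out;
  };

  return {
    push(text: string): unknown[] {
      buffer += text;
      const lines = buffer.split("\n");
      // The last element is either "" or an incomplete line; keep it for later.
      buffer = lines.pop() ?? "";
      return parseLines(lines);
    },
    flush(): unknown[] {
      const rest = buffer;
      buffer = "";
      return parseLines([rest]);
    },
  };
}

// A record split mid-line across two chunks is only emitted once complete.
const lineBuffer = createNdjsonLineBuffer();
const first = lineBuffer.push('{"done":false,"message":{"content":"Hel');
const second = lineBuffer.push('lo"}}\n{"done":true}');
const tail = lineBuffer.flush();
```

In this sketch the first `push` yields nothing because no newline has arrived yet. The second yields the completed record, and `flush` picks up the final record that had no trailing newline.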

src/agents/ollama-stream.ts (new file, 419 lines)

@@ -0,0 +1,419 @@
import type { StreamFn } from "@mariozechner/pi-agent-core";
import type {
AssistantMessage,
StopReason,
TextContent,
ToolCall,
Tool,
Usage,
} from "@mariozechner/pi-ai";
import { createAssistantMessageEventStream } from "@mariozechner/pi-ai";
import { randomUUID } from "node:crypto";
export const OLLAMA_NATIVE_BASE_URL = "http://127.0.0.1:11434";
// ── Ollama /api/chat request types ──────────────────────────────────────────
interface OllamaChatRequest {
model: string;
messages: OllamaChatMessage[];
stream: boolean;
tools?: OllamaTool[];
options?: Record<string, unknown>;
}
interface OllamaChatMessage {
role: "system" | "user" | "assistant" | "tool";
content: string;
images?: string[];
tool_calls?: OllamaToolCall[];
tool_name?: string;
}
interface OllamaTool {
type: "function";
function: {
name: string;
description: string;
parameters: Record<string, unknown>;
};
}
interface OllamaToolCall {
function: {
name: string;
arguments: Record<string, unknown>;
};
}
// ── Ollama /api/chat response types ─────────────────────────────────────────
interface OllamaChatResponse {
model: string;
created_at: string;
message: {
role: "assistant";
content: string;
tool_calls?: OllamaToolCall[];
};
done: boolean;
done_reason?: string;
total_duration?: number;
load_duration?: number;
prompt_eval_count?: number;
prompt_eval_duration?: number;
eval_count?: number;
eval_duration?: number;
}
// ── Message conversion ──────────────────────────────────────────────────────
type InputContentPart =
| { type: "text"; text: string }
| { type: "image"; data: string }
| { type: "toolCall"; id: string; name: string; arguments: Record<string, unknown> }
| { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
function extractTextContent(content: unknown): string {
if (typeof content === "string") {
return content;
}
if (!Array.isArray(content)) {
return "";
}
return (content as InputContentPart[])
.filter((part): part is { type: "text"; text: string } => part.type === "text")
.map((part) => part.text)
.join("");
}
function extractOllamaImages(content: unknown): string[] {
if (!Array.isArray(content)) {
return [];
}
return (content as InputContentPart[])
.filter((part): part is { type: "image"; data: string } => part.type === "image")
.map((part) => part.data);
}
function extractToolCalls(content: unknown): OllamaToolCall[] {
if (!Array.isArray(content)) {
return [];
}
const parts = content as InputContentPart[];
const result: OllamaToolCall[] = [];
for (const part of parts) {
if (part.type === "toolCall") {
result.push({ function: { name: part.name, arguments: part.arguments } });
} else if (part.type === "tool_use") {
result.push({ function: { name: part.name, arguments: part.input } });
}
}
return result;
}
export function convertToOllamaMessages(
messages: Array<{ role: string; content: unknown }>,
system?: string,
): OllamaChatMessage[] {
const result: OllamaChatMessage[] = [];
if (system) {
result.push({ role: "system", content: system });
}
for (const msg of messages) {
const { role } = msg;
if (role === "user") {
const text = extractTextContent(msg.content);
const images = extractOllamaImages(msg.content);
result.push({
role: "user",
content: text,
...(images.length > 0 ? { images } : {}),
});
} else if (role === "assistant") {
const text = extractTextContent(msg.content);
const toolCalls = extractToolCalls(msg.content);
result.push({
role: "assistant",
content: text,
...(toolCalls.length > 0 ? { tool_calls: toolCalls } : {}),
});
} else if (role === "tool" || role === "toolResult") {
// SDK uses "toolResult" (camelCase) for tool result messages.
// Ollama API expects "tool" role with tool_name per the native spec.
const text = extractTextContent(msg.content);
const toolName =
typeof (msg as { toolName?: unknown }).toolName === "string"
? (msg as { toolName?: string }).toolName
: undefined;
result.push({
role: "tool",
content: text,
...(toolName ? { tool_name: toolName } : {}),
});
}
}
return result;
}
// ── Tool extraction ─────────────────────────────────────────────────────────
function extractOllamaTools(tools: Tool[] | undefined): OllamaTool[] {
if (!tools || !Array.isArray(tools)) {
return [];
}
const result: OllamaTool[] = [];
for (const tool of tools) {
if (typeof tool.name !== "string" || !tool.name) {
continue;
}
result.push({
type: "function",
function: {
name: tool.name,
description: typeof tool.description === "string" ? tool.description : "",
parameters: (tool.parameters ?? {}) as Record<string, unknown>,
},
});
}
return result;
}
// ── Response conversion ─────────────────────────────────────────────────────
export function buildAssistantMessage(
response: OllamaChatResponse,
modelInfo: { api: string; provider: string; id: string },
): AssistantMessage {
const content: (TextContent | ToolCall)[] = [];
if (response.message.content) {
content.push({ type: "text", text: response.message.content });
}
const toolCalls = response.message.tool_calls;
if (toolCalls && toolCalls.length > 0) {
for (const tc of toolCalls) {
content.push({
type: "toolCall",
id: `ollama_call_${randomUUID()}`,
name: tc.function.name,
arguments: tc.function.arguments,
});
}
}
const hasToolCalls = toolCalls && toolCalls.length > 0;
const stopReason: StopReason = hasToolCalls ? "toolUse" : "stop";
const usage: Usage = {
input: response.prompt_eval_count ?? 0,
output: response.eval_count ?? 0,
cacheRead: 0,
cacheWrite: 0,
totalTokens: (response.prompt_eval_count ?? 0) + (response.eval_count ?? 0),
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
};
return {
role: "assistant",
content,
stopReason,
api: modelInfo.api,
provider: modelInfo.provider,
model: modelInfo.id,
usage,
timestamp: Date.now(),
};
}
// ── NDJSON streaming parser ─────────────────────────────────────────────────
export async function* parseNdjsonStream(
reader: ReadableStreamDefaultReader<Uint8Array>,
): AsyncGenerator<OllamaChatResponse> {
const decoder = new TextDecoder();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) {
break;
}
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() ?? "";
for (const line of lines) {
const trimmed = line.trim();
if (!trimmed) {
continue;
}
try {
yield JSON.parse(trimmed) as OllamaChatResponse;
} catch {
console.warn("[ollama-stream] Skipping malformed NDJSON line:", trimmed.slice(0, 120));
}
}
}
if (buffer.trim()) {
try {
yield JSON.parse(buffer.trim()) as OllamaChatResponse;
} catch {
console.warn(
"[ollama-stream] Skipping malformed trailing data:",
buffer.trim().slice(0, 120),
);
}
}
}
// ── Main StreamFn factory ───────────────────────────────────────────────────
function resolveOllamaChatUrl(baseUrl: string): string {
const trimmed = baseUrl.trim().replace(/\/+$/, "");
const normalizedBase = trimmed.replace(/\/v1$/i, "");
const apiBase = normalizedBase || OLLAMA_NATIVE_BASE_URL;
return `${apiBase}/api/chat`;
}
export function createOllamaStreamFn(baseUrl: string): StreamFn {
const chatUrl = resolveOllamaChatUrl(baseUrl);
return (model, context, options) => {
const stream = createAssistantMessageEventStream();
const run = async () => {
try {
const ollamaMessages = convertToOllamaMessages(
context.messages ?? [],
context.systemPrompt,
);
const ollamaTools = extractOllamaTools(context.tools);
// Ollama defaults to num_ctx=4096 which is too small for large
// system prompts + many tool definitions. Use model's contextWindow.
const ollamaOptions: Record<string, unknown> = { num_ctx: model.contextWindow ?? 65536 };
if (typeof options?.temperature === "number") {
ollamaOptions.temperature = options.temperature;
}
if (typeof options?.maxTokens === "number") {
ollamaOptions.num_predict = options.maxTokens;
}
const body: OllamaChatRequest = {
model: model.id,
messages: ollamaMessages,
stream: true,
...(ollamaTools.length > 0 ? { tools: ollamaTools } : {}),
options: ollamaOptions,
};
const headers: Record<string, string> = {
"Content-Type": "application/json",
...options?.headers,
};
if (options?.apiKey) {
headers.Authorization = `Bearer ${options.apiKey}`;
}
const response = await fetch(chatUrl, {
method: "POST",
headers,
body: JSON.stringify(body),
signal: options?.signal,
});
if (!response.ok) {
const errorText = await response.text().catch(() => "unknown error");
throw new Error(`Ollama API error ${response.status}: ${errorText}`);
}
if (!response.body) {
throw new Error("Ollama API returned empty response body");
}
const reader = response.body.getReader();
let accumulatedContent = "";
const accumulatedToolCalls: OllamaToolCall[] = [];
let finalResponse: OllamaChatResponse | undefined;
for await (const chunk of parseNdjsonStream(reader)) {
if (chunk.message?.content) {
accumulatedContent += chunk.message.content;
}
// Ollama sends tool_calls in intermediate (done:false) chunks,
// NOT in the final done:true chunk. Collect from all chunks.
if (chunk.message?.tool_calls) {
accumulatedToolCalls.push(...chunk.message.tool_calls);
}
if (chunk.done) {
finalResponse = chunk;
break;
}
}
if (!finalResponse) {
throw new Error("Ollama API stream ended without a final response");
}
finalResponse.message.content = accumulatedContent;
if (accumulatedToolCalls.length > 0) {
finalResponse.message.tool_calls = accumulatedToolCalls;
}
const assistantMessage = buildAssistantMessage(finalResponse, {
api: model.api,
provider: model.provider,
id: model.id,
});
const reason: Extract<StopReason, "stop" | "length" | "toolUse"> =
assistantMessage.stopReason === "toolUse" ? "toolUse" : "stop";
stream.push({
type: "done",
reason,
message: assistantMessage,
});
} catch (err) {
const errorMessage = err instanceof Error ? err.message : String(err);
stream.push({
type: "error",
reason: "error",
error: {
role: "assistant" as const,
content: [],
stopReason: "error" as StopReason,
errorMessage,
api: model.api,
provider: model.provider,
model: model.id,
usage: {
input: 0,
output: 0,
cacheRead: 0,
cacheWrite: 0,
totalTokens: 0,
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
},
timestamp: Date.now(),
},
});
} finally {
stream.end();
}
};
queueMicrotask(() => void run());
return stream;
};
}
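
For reference, the `/v1` normalization performed by `resolveOllamaChatUrl` can be exercised in isolation. The sketch below copies that logic into a standalone function. The names `toOllamaChatUrl` and `DEFAULT_BASE_URL` are hypothetical stand-ins, since the function is not exported in this commit.

```typescript
// Stand-in for OLLAMA_NATIVE_BASE_URL from the diff above.
const DEFAULT_BASE_URL = "http://127.0.0.1:11434";

// Hypothetical standalone copy of the resolveOllamaChatUrl logic:
// trim whitespace and trailing slashes, drop a trailing /v1 segment
// (case-insensitive), fall back to the default host, then append /api/chat.
function toOllamaChatUrl(baseUrl: string): string {
  const trimmed = baseUrl.trim().replace(/\/+$/, "");
  const withoutV1 = trimmed.replace(/\/v1$/i, "");
  return `${withoutV1 || DEFAULT_BASE_URL}/api/chat`;
}

const chatUrls = {
  // Legacy OpenAI-compatible base URL with a trailing slash.
  legacy: toOllamaChatUrl("http://ollama-host:11434/v1/"),
  // Native base URL, already in the expected shape.
  native: toOllamaChatUrl("http://ollama-host:11434"),
  // Blank input falls back to the default local host.
  empty: toOllamaChatUrl("   "),
};
```

Only a trailing `/v1` segment is removed. A base URL with a longer path, such as `/v1/chat`, is left unchanged by this logic and simply gets `/api/chat` appended.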

@@ -31,6 +31,7 @@ import { resolveOpenClawDocsPath } from "../../docs-path.js";
import { isTimeoutError } from "../../failover-error.js";
import { resolveModelAuthMode } from "../../model-auth.js";
import { resolveDefaultModelForAgent } from "../../model-selection.js";
import { createOllamaStreamFn, OLLAMA_NATIVE_BASE_URL } from "../../ollama-stream.js";
import {
isCloudCodeAssistFormatError,
resolveBootstrapMaxChars,
@@ -584,8 +585,21 @@ export async function runEmbeddedAttempt(
workspaceDir: params.workspaceDir,
});
// Force a stable streamFn reference so vitest can reliably mock @mariozechner/pi-ai.
activeSession.agent.streamFn = streamSimple;
// Ollama native API: bypass SDK's streamSimple and use direct /api/chat calls
// for reliable streaming + tool calling support (#11828).
if (params.model.api === "ollama") {
// Use the resolved model baseUrl first so custom provider aliases work.
const providerConfig = params.config?.models?.providers?.[params.model.provider];
const modelBaseUrl =
typeof params.model.baseUrl === "string" ? params.model.baseUrl.trim() : "";
const providerBaseUrl =
typeof providerConfig?.baseUrl === "string" ? providerConfig.baseUrl.trim() : "";
const ollamaBaseUrl = modelBaseUrl || providerBaseUrl || OLLAMA_NATIVE_BASE_URL;
activeSession.agent.streamFn = createOllamaStreamFn(ollamaBaseUrl);
} else {
// Force a stable streamFn reference so vitest can reliably mock @mariozechner/pi-ai.
activeSession.agent.streamFn = streamSimple;
}
applyExtraParamsToAgent(
activeSession.agent,

@@ -4,7 +4,8 @@ export type ModelApi =
| "anthropic-messages"
| "google-generative-ai"
| "github-copilot"
| "bedrock-converse-stream";
| "bedrock-converse-stream"
| "ollama";
export type ModelCompatConfig = {
supportsStore?: boolean;

@@ -9,6 +9,7 @@ export const ModelApiSchema = z.union([
z.literal("google-generative-ai"),
z.literal("github-copilot"),
z.literal("bedrock-converse-stream"),
z.literal("ollama"),
]);
export const ModelCompatSchema = z