Mirror of https://github.com/moltbot/moltbot.git, synced 2026-03-07 22:44:16 +00:00.
feat(ollama): add native /api/chat provider for streaming + tool calling (#11853)
Merged via /review-pr -> /prepare-pr -> /merge-pr.
Prepared head SHA: 0a723f98e6
Co-authored-by: BrokenFinger98 <115936166+BrokenFinger98@users.noreply.github.com>
Co-authored-by: steipete <58493+steipete@users.noreply.github.com>
Reviewed-by: @steipete
@@ -69,6 +69,7 @@ Docs: https://docs.openclaw.ai
 - Config: keep legacy audio transcription migration strict by rejecting non-string/unsafe command tokens while still migrating valid custom script executables. (#5042) Thanks @shayan919293.
 - Status/Sessions: stop clamping derived `totalTokens` to context-window size, keep prompt-token snapshots wired through session accounting, and surface context usage as unknown when fresh snapshot data is missing to avoid false 100% reports. (#15114) Thanks @echoVic.
 - Providers/MiniMax: switch implicit MiniMax API-key provider from `openai-completions` to `anthropic-messages` with the correct Anthropic-compatible base URL, fixing `invalid role: developer (2013)` errors on MiniMax M2.5. (#15275) Thanks @lailoo.
+- Ollama/Agents: use resolved model/provider base URLs for native `/api/chat` streaming (including aliased providers), normalize `/v1` endpoints, and forward abort + `maxTokens` stream options for reliable cancellation and token caps. (#11853) Thanks @BrokenFinger98.

 ## 2026.2.12
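The `/v1` normalization called out in the Ollama entry above can be sketched as a small standalone helper. This is illustrative only — it mirrors the logic of `resolveOllamaApiBase()` / `resolveOllamaChatUrl()` that appear later in this commit, but is a simplification, not the shipped implementation:

```typescript
// Sketch of the /v1-stripping behavior described in the changelog entry.
// Mirrors resolveOllamaChatUrl() further down in this diff (simplified).
const OLLAMA_NATIVE_BASE_URL = "http://127.0.0.1:11434";

function resolveOllamaApiBase(baseUrl?: string): string {
  if (!baseUrl) return OLLAMA_NATIVE_BASE_URL;
  // Drop trailing slashes, then a trailing /v1 (the OpenAI-compatible suffix).
  const trimmed = baseUrl.trim().replace(/\/+$/, "");
  const normalized = trimmed.replace(/\/v1$/i, "");
  return normalized || OLLAMA_NATIVE_BASE_URL;
}

console.log(resolveOllamaApiBase("http://ollama-host:11434/v1/")); // → http://ollama-host:11434
```

The native provider then appends `/api/chat` to this normalized base.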
@@ -8,7 +8,7 @@ title: "Ollama"

 # Ollama

-Ollama is a local LLM runtime that makes it easy to run open-source models on your machine. OpenClaw integrates with Ollama's OpenAI-compatible API and can **auto-discover tool-capable models** when you opt in with `OLLAMA_API_KEY` (or an auth profile) and do not define an explicit `models.providers.ollama` entry.
+Ollama is a local LLM runtime that makes it easy to run open-source models on your machine. OpenClaw integrates with Ollama's native API (`/api/chat`), supporting streaming and tool calling, and can **auto-discover tool-capable models** when you opt in with `OLLAMA_API_KEY` (or an auth profile) and do not define an explicit `models.providers.ollama` entry.
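As context for the native API: `/api/chat` streams NDJSON, one JSON object per line, ending with a `done: true` chunk that carries token counts. Abridged from the test fixtures later in this commit (fields shown are a subset):

```json
{"model":"qwen3:32b","created_at":"2026-01-01T00:00:00Z","message":{"role":"assistant","content":"Hello"},"done":false}
{"model":"qwen3:32b","created_at":"2026-01-01T00:00:00Z","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":5,"eval_count":2}
```

Tool calls arrive as `message.tool_calls` on intermediate (`done: false`) chunks.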
 ## Quick start
@@ -101,10 +101,9 @@ Use explicit config when:
   models: {
     providers: {
       ollama: {
-        // Use a host that includes /v1 for OpenAI-compatible APIs
-        baseUrl: "http://ollama-host:11434/v1",
+        baseUrl: "http://ollama-host:11434",
         apiKey: "ollama-local",
-        api: "openai-completions",
+        api: "ollama",
         models: [
           {
             id: "gpt-oss:20b",
@@ -134,7 +133,7 @@ If Ollama is running on a different host or port (explicit config disables auto-
     providers: {
       ollama: {
         apiKey: "ollama-local",
-        baseUrl: "http://ollama-host:11434/v1",
+        baseUrl: "http://ollama-host:11434",
       },
     },
   },
@@ -174,45 +173,28 @@ Ollama is free and runs locally, so all model costs are set to $0.

 ### Streaming Configuration

-Due to a [known issue](https://github.com/badlogic/pi-mono/issues/1205) in the underlying SDK with Ollama's response format, **streaming is disabled by default** for Ollama models. This prevents corrupted responses when using tool-capable models.
-
-When streaming is disabled, responses are delivered all at once (non-streaming mode), which avoids the issue where interleaved content/reasoning deltas cause garbled output.
-
-#### Re-enable Streaming (Advanced)
-
-If you want to re-enable streaming for Ollama (may cause issues with tool-capable models):
+OpenClaw's Ollama integration uses the **native Ollama API** (`/api/chat`) by default, which fully supports streaming and tool calling simultaneously. No special configuration is needed.
+
+#### Legacy OpenAI-Compatible Mode
+
+If you need to use the OpenAI-compatible endpoint instead (e.g., behind a proxy that only supports OpenAI format), set `api: "openai-completions"` explicitly:

 ```json5
 {
-  agents: {
-    defaults: {
-      models: {
-        "ollama/gpt-oss:20b": {
-          streaming: true,
-        },
-      },
-    },
-  },
+  models: {
+    providers: {
+      ollama: {
+        baseUrl: "http://ollama-host:11434/v1",
+        api: "openai-completions",
+        apiKey: "ollama-local",
+        models: [...]
+      }
+    }
+  }
 }
 ```

-#### Disable Streaming for Other Providers
-
-You can also disable streaming for any provider if needed:
-
-```json5
-{
-  agents: {
-    defaults: {
-      models: {
-        "openai/gpt-4": {
-          streaming: false,
-        },
-      },
-    },
-  },
-}
-```
+Note: The OpenAI-compatible endpoint may not support streaming + tool calling simultaneously. You may need to disable streaming with `params: { streaming: false }` in model config.
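As a sketch of the `params: { streaming: false }` note above, a legacy-mode model entry might look like this (illustrative; substitute your own model id):

```json5
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://ollama-host:11434/v1",
        api: "openai-completions",
        apiKey: "ollama-local",
        models: [
          {
            id: "gpt-oss:20b",
            params: { streaming: false },
          },
        ],
      },
    },
  },
}
```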

 ### Context windows
@@ -261,15 +243,6 @@ ps aux | grep ollama
 ollama serve
 ```

-### Corrupted responses or tool names in output
-
-If you see garbled responses containing tool names (like `sessions_send`, `memory_get`) or fragmented text when using Ollama models, this is due to an upstream SDK issue with streaming responses. **This is fixed by default** in the latest OpenClaw version by disabling streaming for Ollama models.
-
-If you manually enabled streaming and experience this issue:
-
-1. Remove the `streaming: true` configuration from your Ollama model entries, or
-2. Explicitly set `streaming: false` for Ollama models (see [Streaming Configuration](#streaming-configuration))
-
 ## See Also

 - [Model Providers](/concepts/model-providers) - Overview of all providers
@@ -29,25 +29,20 @@ describe("Ollama provider", () => {
     const agentDir = mkdtempSync(join(tmpdir(), "openclaw-test-"));
     const providers = await resolveImplicitProviders({ agentDir });

     // Ollama requires explicit configuration via OLLAMA_API_KEY env var or profile
     expect(providers?.ollama).toBeUndefined();
   });

-  it("should disable streaming by default for Ollama models", async () => {
+  it("should use native ollama api type", async () => {
     const agentDir = mkdtempSync(join(tmpdir(), "openclaw-test-"));
     process.env.OLLAMA_API_KEY = "test-key";

     try {
       const providers = await resolveImplicitProviders({ agentDir });

       // Provider should be defined with OLLAMA_API_KEY set
       expect(providers?.ollama).toBeDefined();
       expect(providers?.ollama?.apiKey).toBe("OLLAMA_API_KEY");

-      // Note: discoverOllamaModels() returns empty array in test environments (VITEST env var check)
-      // so we can't test the actual model discovery here. The streaming: false setting
-      // is applied in the model mapping within discoverOllamaModels().
-      // The configuration structure itself is validated by TypeScript and the Zod schema.
+      expect(providers?.ollama?.api).toBe("ollama");
+      expect(providers?.ollama?.baseUrl).toBe("http://127.0.0.1:11434");
     } finally {
       delete process.env.OLLAMA_API_KEY;
     }
@@ -69,15 +64,14 @@ describe("Ollama provider", () => {
         },
       });

-      expect(providers?.ollama?.baseUrl).toBe("http://192.168.20.14:11434/v1");
+      // Native API strips /v1 suffix via resolveOllamaApiBase()
+      expect(providers?.ollama?.baseUrl).toBe("http://192.168.20.14:11434");
     } finally {
       delete process.env.OLLAMA_API_KEY;
     }
   });

-  it("should have correct model structure with streaming disabled (unit test)", () => {
-    // This test directly verifies the model configuration structure
-    // since discoverOllamaModels() returns empty array in test mode
+  it("should have correct model structure without streaming override", () => {
     const mockOllamaModel = {
       id: "llama3.3:latest",
       name: "llama3.3:latest",
@@ -86,13 +80,9 @@ describe("Ollama provider", () => {
       cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
       contextWindow: 128000,
       maxTokens: 8192,
-      params: {
-        streaming: false,
-      },
     };

-    // Verify the model structure matches what discoverOllamaModels() would return
-    expect(mockOllamaModel.params?.streaming).toBe(false);
-    expect(mockOllamaModel.params).toHaveProperty("streaming");
+    // Native Ollama provider does not need streaming: false workaround
+    expect(mockOllamaModel).not.toHaveProperty("params");
   });
 });
@@ -17,6 +17,7 @@ import {
   buildHuggingfaceModelDefinition,
 } from "./huggingface-models.js";
 import { resolveAwsSdkEnvVarName, resolveEnvApiKey } from "./model-auth.js";
+import { OLLAMA_NATIVE_BASE_URL } from "./ollama-stream.js";
 import {
   buildSyntheticModelDefinition,
   SYNTHETIC_BASE_URL,
@@ -79,8 +80,8 @@ const QWEN_PORTAL_DEFAULT_COST = {
   cacheWrite: 0,
 };

-const OLLAMA_BASE_URL = "http://127.0.0.1:11434/v1";
-const OLLAMA_API_BASE_URL = "http://127.0.0.1:11434";
+const OLLAMA_BASE_URL = OLLAMA_NATIVE_BASE_URL;
+const OLLAMA_API_BASE_URL = OLLAMA_BASE_URL;
 const OLLAMA_DEFAULT_CONTEXT_WINDOW = 128000;
 const OLLAMA_DEFAULT_MAX_TOKENS = 8192;
 const OLLAMA_DEFAULT_COST = {
@@ -180,11 +181,6 @@ async function discoverOllamaModels(baseUrl?: string): Promise<ModelDefinitionCo
         cost: OLLAMA_DEFAULT_COST,
         contextWindow: OLLAMA_DEFAULT_CONTEXT_WINDOW,
         maxTokens: OLLAMA_DEFAULT_MAX_TOKENS,
-        // Disable streaming by default for Ollama to avoid SDK issue #1205
-        // See: https://github.com/badlogic/pi-mono/issues/1205
-        params: {
-          streaming: false,
-        },
       };
     });
   } catch (error) {
@@ -541,8 +537,8 @@ async function buildVeniceProvider(): Promise<ProviderConfig> {
 async function buildOllamaProvider(configuredBaseUrl?: string): Promise<ProviderConfig> {
   const models = await discoverOllamaModels(configuredBaseUrl);
   return {
-    baseUrl: configuredBaseUrl ?? OLLAMA_BASE_URL,
-    api: "openai-completions",
+    baseUrl: resolveOllamaApiBase(configuredBaseUrl),
+    api: "ollama",
     models,
   };
 }
290 src/agents/ollama-stream.test.ts (new file)
@@ -0,0 +1,290 @@
import { describe, expect, it, vi } from "vitest";
import {
  createOllamaStreamFn,
  convertToOllamaMessages,
  buildAssistantMessage,
  parseNdjsonStream,
} from "./ollama-stream.js";

describe("convertToOllamaMessages", () => {
  it("converts user text messages", () => {
    const messages = [{ role: "user", content: "hello" }];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "user", content: "hello" }]);
  });

  it("converts user messages with content parts", () => {
    const messages = [
      {
        role: "user",
        content: [
          { type: "text", text: "describe this" },
          { type: "image", data: "base64data" },
        ],
      },
    ];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "user", content: "describe this", images: ["base64data"] }]);
  });

  it("prepends system message when provided", () => {
    const messages = [{ role: "user", content: "hello" }];
    const result = convertToOllamaMessages(messages, "You are helpful.");
    expect(result[0]).toEqual({ role: "system", content: "You are helpful." });
    expect(result[1]).toEqual({ role: "user", content: "hello" });
  });

  it("converts assistant messages with toolCall content blocks", () => {
    const messages = [
      {
        role: "assistant",
        content: [
          { type: "text", text: "Let me check." },
          { type: "toolCall", id: "call_1", name: "bash", arguments: { command: "ls" } },
        ],
      },
    ];
    const result = convertToOllamaMessages(messages);
    expect(result[0].role).toBe("assistant");
    expect(result[0].content).toBe("Let me check.");
    expect(result[0].tool_calls).toEqual([
      { function: { name: "bash", arguments: { command: "ls" } } },
    ]);
  });

  it("converts tool result messages with 'tool' role", () => {
    const messages = [{ role: "tool", content: "file1.txt\nfile2.txt" }];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "tool", content: "file1.txt\nfile2.txt" }]);
  });

  it("converts SDK 'toolResult' role to Ollama 'tool' role", () => {
    const messages = [{ role: "toolResult", content: "command output here" }];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "tool", content: "command output here" }]);
  });

  it("includes tool_name from SDK toolResult messages", () => {
    const messages = [{ role: "toolResult", content: "file contents here", toolName: "read" }];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "tool", content: "file contents here", tool_name: "read" }]);
  });

  it("omits tool_name when not provided in toolResult", () => {
    const messages = [{ role: "toolResult", content: "output" }];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "tool", content: "output" }]);
    expect(result[0]).not.toHaveProperty("tool_name");
  });

  it("handles empty messages array", () => {
    const result = convertToOllamaMessages([]);
    expect(result).toEqual([]);
  });
});

describe("buildAssistantMessage", () => {
  const modelInfo = { api: "ollama", provider: "ollama", id: "qwen3:32b" };

  it("builds text-only response", () => {
    const response = {
      model: "qwen3:32b",
      created_at: "2026-01-01T00:00:00Z",
      message: { role: "assistant" as const, content: "Hello!" },
      done: true,
      prompt_eval_count: 10,
      eval_count: 5,
    };
    const result = buildAssistantMessage(response, modelInfo);
    expect(result.role).toBe("assistant");
    expect(result.content).toEqual([{ type: "text", text: "Hello!" }]);
    expect(result.stopReason).toBe("stop");
    expect(result.usage.input).toBe(10);
    expect(result.usage.output).toBe(5);
    expect(result.usage.totalTokens).toBe(15);
  });

  it("builds response with tool calls", () => {
    const response = {
      model: "qwen3:32b",
      created_at: "2026-01-01T00:00:00Z",
      message: {
        role: "assistant" as const,
        content: "",
        tool_calls: [{ function: { name: "bash", arguments: { command: "ls -la" } } }],
      },
      done: true,
      prompt_eval_count: 20,
      eval_count: 10,
    };
    const result = buildAssistantMessage(response, modelInfo);
    expect(result.stopReason).toBe("toolUse");
    expect(result.content.length).toBe(1); // toolCall only (empty content is skipped)
    expect(result.content[0].type).toBe("toolCall");
    const toolCall = result.content[0] as {
      type: "toolCall";
      id: string;
      name: string;
      arguments: Record<string, unknown>;
    };
    expect(toolCall.name).toBe("bash");
    expect(toolCall.arguments).toEqual({ command: "ls -la" });
    expect(toolCall.id).toMatch(/^ollama_call_[0-9a-f-]{36}$/);
  });

  it("sets all costs to zero for local models", () => {
    const response = {
      model: "qwen3:32b",
      created_at: "2026-01-01T00:00:00Z",
      message: { role: "assistant" as const, content: "ok" },
      done: true,
    };
    const result = buildAssistantMessage(response, modelInfo);
    expect(result.usage.cost).toEqual({
      input: 0,
      output: 0,
      cacheRead: 0,
      cacheWrite: 0,
      total: 0,
    });
  });
});

// Helper: build a ReadableStreamDefaultReader from NDJSON lines
function mockNdjsonReader(lines: string[]): ReadableStreamDefaultReader<Uint8Array> {
  const encoder = new TextEncoder();
  const payload = lines.join("\n") + "\n";
  let consumed = false;
  return {
    read: async () => {
      if (consumed) {
        return { done: true as const, value: undefined };
      }
      consumed = true;
      return { done: false as const, value: encoder.encode(payload) };
    },
    releaseLock: () => {},
    cancel: async () => {},
    closed: Promise.resolve(undefined),
  } as unknown as ReadableStreamDefaultReader<Uint8Array>;
}

describe("parseNdjsonStream", () => {
  it("parses text-only streaming chunks", async () => {
    const reader = mockNdjsonReader([
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":"Hello"},"done":false}',
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":" world"},"done":false}',
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":5,"eval_count":2}',
    ]);
    const chunks = [];
    for await (const chunk of parseNdjsonStream(reader)) {
      chunks.push(chunk);
    }
    expect(chunks).toHaveLength(3);
    expect(chunks[0].message.content).toBe("Hello");
    expect(chunks[1].message.content).toBe(" world");
    expect(chunks[2].done).toBe(true);
  });

  it("parses tool_calls from intermediate chunk (not final)", async () => {
    // Ollama sends tool_calls in done:false chunk, final done:true has no tool_calls
    const reader = mockNdjsonReader([
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"bash","arguments":{"command":"ls"}}}]},"done":false}',
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":10,"eval_count":5}',
    ]);
    const chunks = [];
    for await (const chunk of parseNdjsonStream(reader)) {
      chunks.push(chunk);
    }
    expect(chunks).toHaveLength(2);
    expect(chunks[0].done).toBe(false);
    expect(chunks[0].message.tool_calls).toHaveLength(1);
    expect(chunks[0].message.tool_calls![0].function.name).toBe("bash");
    expect(chunks[1].done).toBe(true);
    expect(chunks[1].message.tool_calls).toBeUndefined();
  });

  it("accumulates tool_calls across multiple intermediate chunks", async () => {
    const reader = mockNdjsonReader([
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"read","arguments":{"path":"/tmp/a"}}}]},"done":false}',
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"bash","arguments":{"command":"ls"}}}]},"done":false}',
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true}',
    ]);

    // Simulate the accumulation logic from createOllamaStreamFn
    const accumulatedToolCalls: Array<{
      function: { name: string; arguments: Record<string, unknown> };
    }> = [];
    const chunks = [];
    for await (const chunk of parseNdjsonStream(reader)) {
      chunks.push(chunk);
      if (chunk.message?.tool_calls) {
        accumulatedToolCalls.push(...chunk.message.tool_calls);
      }
    }
    expect(accumulatedToolCalls).toHaveLength(2);
    expect(accumulatedToolCalls[0].function.name).toBe("read");
    expect(accumulatedToolCalls[1].function.name).toBe("bash");
    // Final done:true chunk has no tool_calls
    expect(chunks[2].message.tool_calls).toBeUndefined();
  });
});

describe("createOllamaStreamFn", () => {
  it("normalizes /v1 baseUrl and maps maxTokens + signal", async () => {
    const originalFetch = globalThis.fetch;
    const fetchMock = vi.fn(async () => {
      const payload = [
        '{"model":"m","created_at":"t","message":{"role":"assistant","content":"ok"},"done":false}',
        '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":1,"eval_count":1}',
      ].join("\n");
      return new Response(`${payload}\n`, {
        status: 200,
        headers: { "Content-Type": "application/x-ndjson" },
      });
    });
    globalThis.fetch = fetchMock as unknown as typeof fetch;

    try {
      const streamFn = createOllamaStreamFn("http://ollama-host:11434/v1/");
      const signal = new AbortController().signal;
      const stream = streamFn(
        {
          id: "qwen3:32b",
          api: "ollama",
          provider: "custom-ollama",
          contextWindow: 131072,
        } as unknown as Parameters<typeof streamFn>[0],
        {
          messages: [{ role: "user", content: "hello" }],
        } as unknown as Parameters<typeof streamFn>[1],
        {
          maxTokens: 123,
          signal,
        } as unknown as Parameters<typeof streamFn>[2],
      );

      const events = [];
      for await (const event of stream) {
        events.push(event);
      }
      expect(events.at(-1)?.type).toBe("done");

      expect(fetchMock).toHaveBeenCalledTimes(1);
      const [url, requestInit] = fetchMock.mock.calls[0] as [string, RequestInit];
      expect(url).toBe("http://ollama-host:11434/api/chat");
      expect(requestInit.signal).toBe(signal);
      if (typeof requestInit.body !== "string") {
        throw new Error("Expected string request body");
      }

      const requestBody = JSON.parse(requestInit.body) as {
        options: { num_ctx?: number; num_predict?: number };
      };
      expect(requestBody.options.num_ctx).toBe(131072);
      expect(requestBody.options.num_predict).toBe(123);
    } finally {
      globalThis.fetch = originalFetch;
    }
  });
});
419 src/agents/ollama-stream.ts (new file)
@@ -0,0 +1,419 @@
import type { StreamFn } from "@mariozechner/pi-agent-core";
import type {
  AssistantMessage,
  StopReason,
  TextContent,
  ToolCall,
  Tool,
  Usage,
} from "@mariozechner/pi-ai";
import { createAssistantMessageEventStream } from "@mariozechner/pi-ai";
import { randomUUID } from "node:crypto";

export const OLLAMA_NATIVE_BASE_URL = "http://127.0.0.1:11434";

// ── Ollama /api/chat request types ──────────────────────────────────────────

interface OllamaChatRequest {
  model: string;
  messages: OllamaChatMessage[];
  stream: boolean;
  tools?: OllamaTool[];
  options?: Record<string, unknown>;
}

interface OllamaChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  images?: string[];
  tool_calls?: OllamaToolCall[];
  tool_name?: string;
}

interface OllamaTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

interface OllamaToolCall {
  function: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// ── Ollama /api/chat response types ─────────────────────────────────────────

interface OllamaChatResponse {
  model: string;
  created_at: string;
  message: {
    role: "assistant";
    content: string;
    tool_calls?: OllamaToolCall[];
  };
  done: boolean;
  done_reason?: string;
  total_duration?: number;
  load_duration?: number;
  prompt_eval_count?: number;
  prompt_eval_duration?: number;
  eval_count?: number;
  eval_duration?: number;
}

// ── Message conversion ──────────────────────────────────────────────────────

type InputContentPart =
  | { type: "text"; text: string }
  | { type: "image"; data: string }
  | { type: "toolCall"; id: string; name: string; arguments: Record<string, unknown> }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };

function extractTextContent(content: unknown): string {
  if (typeof content === "string") {
    return content;
  }
  if (!Array.isArray(content)) {
    return "";
  }
  return (content as InputContentPart[])
    .filter((part): part is { type: "text"; text: string } => part.type === "text")
    .map((part) => part.text)
    .join("");
}

function extractOllamaImages(content: unknown): string[] {
  if (!Array.isArray(content)) {
    return [];
  }
  return (content as InputContentPart[])
    .filter((part): part is { type: "image"; data: string } => part.type === "image")
    .map((part) => part.data);
}

function extractToolCalls(content: unknown): OllamaToolCall[] {
  if (!Array.isArray(content)) {
    return [];
  }
  const parts = content as InputContentPart[];
  const result: OllamaToolCall[] = [];
  for (const part of parts) {
    if (part.type === "toolCall") {
      result.push({ function: { name: part.name, arguments: part.arguments } });
    } else if (part.type === "tool_use") {
      result.push({ function: { name: part.name, arguments: part.input } });
    }
  }
  return result;
}

export function convertToOllamaMessages(
  messages: Array<{ role: string; content: unknown }>,
  system?: string,
): OllamaChatMessage[] {
  const result: OllamaChatMessage[] = [];

  if (system) {
    result.push({ role: "system", content: system });
  }

  for (const msg of messages) {
    const { role } = msg;

    if (role === "user") {
      const text = extractTextContent(msg.content);
      const images = extractOllamaImages(msg.content);
      result.push({
        role: "user",
        content: text,
        ...(images.length > 0 ? { images } : {}),
      });
    } else if (role === "assistant") {
      const text = extractTextContent(msg.content);
      const toolCalls = extractToolCalls(msg.content);
      result.push({
        role: "assistant",
        content: text,
        ...(toolCalls.length > 0 ? { tool_calls: toolCalls } : {}),
      });
    } else if (role === "tool" || role === "toolResult") {
      // SDK uses "toolResult" (camelCase) for tool result messages.
      // Ollama API expects "tool" role with tool_name per the native spec.
      const text = extractTextContent(msg.content);
      const toolName =
        typeof (msg as { toolName?: unknown }).toolName === "string"
          ? (msg as { toolName?: string }).toolName
          : undefined;
      result.push({
        role: "tool",
        content: text,
        ...(toolName ? { tool_name: toolName } : {}),
      });
    }
  }

  return result;
}

// ── Tool extraction ─────────────────────────────────────────────────────────

function extractOllamaTools(tools: Tool[] | undefined): OllamaTool[] {
  if (!tools || !Array.isArray(tools)) {
    return [];
  }
  const result: OllamaTool[] = [];
  for (const tool of tools) {
    if (typeof tool.name !== "string" || !tool.name) {
      continue;
    }
    result.push({
      type: "function",
      function: {
        name: tool.name,
        description: typeof tool.description === "string" ? tool.description : "",
        parameters: (tool.parameters ?? {}) as Record<string, unknown>,
      },
    });
  }
  return result;
}

// ── Response conversion ─────────────────────────────────────────────────────

export function buildAssistantMessage(
  response: OllamaChatResponse,
  modelInfo: { api: string; provider: string; id: string },
): AssistantMessage {
  const content: (TextContent | ToolCall)[] = [];

  if (response.message.content) {
    content.push({ type: "text", text: response.message.content });
  }

  const toolCalls = response.message.tool_calls;
  if (toolCalls && toolCalls.length > 0) {
    for (const tc of toolCalls) {
      content.push({
        type: "toolCall",
        id: `ollama_call_${randomUUID()}`,
        name: tc.function.name,
        arguments: tc.function.arguments,
      });
    }
  }

  const hasToolCalls = toolCalls && toolCalls.length > 0;
  const stopReason: StopReason = hasToolCalls ? "toolUse" : "stop";

  const usage: Usage = {
    input: response.prompt_eval_count ?? 0,
    output: response.eval_count ?? 0,
    cacheRead: 0,
    cacheWrite: 0,
    totalTokens: (response.prompt_eval_count ?? 0) + (response.eval_count ?? 0),
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
  };

  return {
    role: "assistant",
    content,
    stopReason,
    api: modelInfo.api,
    provider: modelInfo.provider,
    model: modelInfo.id,
    usage,
    timestamp: Date.now(),
  };
}

// ── NDJSON streaming parser ─────────────────────────────────────────────────

export async function* parseNdjsonStream(
  reader: ReadableStreamDefaultReader<Uint8Array>,
): AsyncGenerator<OllamaChatResponse> {
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) {
      break;
    }
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed) {
        continue;
      }
      try {
        yield JSON.parse(trimmed) as OllamaChatResponse;
      } catch {
        console.warn("[ollama-stream] Skipping malformed NDJSON line:", trimmed.slice(0, 120));
      }
    }
  }

  if (buffer.trim()) {
    try {
      yield JSON.parse(buffer.trim()) as OllamaChatResponse;
    } catch {
      console.warn(
        "[ollama-stream] Skipping malformed trailing data:",
        buffer.trim().slice(0, 120),
      );
    }
  }
}

// ── Main StreamFn factory ───────────────────────────────────────────────────

function resolveOllamaChatUrl(baseUrl: string): string {
  const trimmed = baseUrl.trim().replace(/\/+$/, "");
  const normalizedBase = trimmed.replace(/\/v1$/i, "");
  const apiBase = normalizedBase || OLLAMA_NATIVE_BASE_URL;
  return `${apiBase}/api/chat`;
}

export function createOllamaStreamFn(baseUrl: string): StreamFn {
  const chatUrl = resolveOllamaChatUrl(baseUrl);

  return (model, context, options) => {
    const stream = createAssistantMessageEventStream();

    const run = async () => {
      try {
        const ollamaMessages = convertToOllamaMessages(
          context.messages ?? [],
          context.systemPrompt,
        );

        const ollamaTools = extractOllamaTools(context.tools);

        // Ollama defaults to num_ctx=4096 which is too small for large
        // system prompts + many tool definitions. Use model's contextWindow.
        const ollamaOptions: Record<string, unknown> = { num_ctx: model.contextWindow ?? 65536 };
        if (typeof options?.temperature === "number") {
          ollamaOptions.temperature = options.temperature;
        }
        if (typeof options?.maxTokens === "number") {
          ollamaOptions.num_predict = options.maxTokens;
        }

        const body: OllamaChatRequest = {
          model: model.id,
          messages: ollamaMessages,
          stream: true,
          ...(ollamaTools.length > 0 ? { tools: ollamaTools } : {}),
          options: ollamaOptions,
        };

        const headers: Record<string, string> = {
          "Content-Type": "application/json",
          ...options?.headers,
        };
        if (options?.apiKey) {
          headers.Authorization = `Bearer ${options.apiKey}`;
        }

        const response = await fetch(chatUrl, {
          method: "POST",
          headers,
          body: JSON.stringify(body),
          signal: options?.signal,
        });

        if (!response.ok) {
          const errorText = await response.text().catch(() => "unknown error");
          throw new Error(`Ollama API error ${response.status}: ${errorText}`);
        }

        if (!response.body) {
          throw new Error("Ollama API returned empty response body");
        }

        const reader = response.body.getReader();
        let accumulatedContent = "";
        const accumulatedToolCalls: OllamaToolCall[] = [];
        let finalResponse: OllamaChatResponse | undefined;

        for await (const chunk of parseNdjsonStream(reader)) {
          if (chunk.message?.content) {
            accumulatedContent += chunk.message.content;
          }

          // Ollama sends tool_calls in intermediate (done:false) chunks,
|
||||
// NOT in the final done:true chunk. Collect from all chunks.
|
||||
if (chunk.message?.tool_calls) {
|
||||
accumulatedToolCalls.push(...chunk.message.tool_calls);
|
||||
}
|
||||
|
||||
if (chunk.done) {
|
||||
finalResponse = chunk;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (!finalResponse) {
|
||||
throw new Error("Ollama API stream ended without a final response");
|
||||
}
|
||||
|
||||
finalResponse.message.content = accumulatedContent;
|
||||
if (accumulatedToolCalls.length > 0) {
|
||||
finalResponse.message.tool_calls = accumulatedToolCalls;
|
||||
}
|
||||
|
||||
const assistantMessage = buildAssistantMessage(finalResponse, {
|
||||
api: model.api,
|
||||
provider: model.provider,
|
||||
id: model.id,
|
||||
});
|
||||
|
||||
const reason: Extract<StopReason, "stop" | "length" | "toolUse"> =
|
||||
assistantMessage.stopReason === "toolUse" ? "toolUse" : "stop";
|
||||
|
||||
stream.push({
|
||||
type: "done",
|
||||
reason,
|
||||
message: assistantMessage,
|
||||
});
|
||||
} catch (err) {
|
||||
const errorMessage = err instanceof Error ? err.message : String(err);
|
||||
stream.push({
|
||||
type: "error",
|
||||
reason: "error",
|
||||
error: {
|
||||
role: "assistant" as const,
|
||||
content: [],
|
||||
stopReason: "error" as StopReason,
|
||||
errorMessage,
|
||||
api: model.api,
|
||||
provider: model.provider,
|
||||
model: model.id,
|
||||
usage: {
|
||||
input: 0,
|
||||
output: 0,
|
||||
cacheRead: 0,
|
||||
cacheWrite: 0,
|
||||
totalTokens: 0,
|
||||
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
|
||||
},
|
||||
timestamp: Date.now(),
|
||||
},
|
||||
});
|
||||
} finally {
|
||||
stream.end();
|
||||
}
|
||||
};
|
||||
|
||||
queueMicrotask(() => void run());
|
||||
return stream;
|
||||
};
|
||||
}
@@ -31,6 +31,7 @@ import { resolveOpenClawDocsPath } from "../../docs-path.js";
 import { isTimeoutError } from "../../failover-error.js";
 import { resolveModelAuthMode } from "../../model-auth.js";
 import { resolveDefaultModelForAgent } from "../../model-selection.js";
+import { createOllamaStreamFn, OLLAMA_NATIVE_BASE_URL } from "../../ollama-stream.js";
 import {
   isCloudCodeAssistFormatError,
   resolveBootstrapMaxChars,
@@ -584,8 +585,21 @@ export async function runEmbeddedAttempt(
     workspaceDir: params.workspaceDir,
   });
 
-  // Force a stable streamFn reference so vitest can reliably mock @mariozechner/pi-ai.
-  activeSession.agent.streamFn = streamSimple;
+  // Ollama native API: bypass SDK's streamSimple and use direct /api/chat calls
+  // for reliable streaming + tool calling support (#11828).
+  if (params.model.api === "ollama") {
+    // Use the resolved model baseUrl first so custom provider aliases work.
+    const providerConfig = params.config?.models?.providers?.[params.model.provider];
+    const modelBaseUrl =
+      typeof params.model.baseUrl === "string" ? params.model.baseUrl.trim() : "";
+    const providerBaseUrl =
+      typeof providerConfig?.baseUrl === "string" ? providerConfig.baseUrl.trim() : "";
+    const ollamaBaseUrl = modelBaseUrl || providerBaseUrl || OLLAMA_NATIVE_BASE_URL;
+    activeSession.agent.streamFn = createOllamaStreamFn(ollamaBaseUrl);
+  } else {
+    // Force a stable streamFn reference so vitest can reliably mock @mariozechner/pi-ai.
+    activeSession.agent.streamFn = streamSimple;
+  }
 
   applyExtraParamsToAgent(
     activeSession.agent,
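The base-URL fallback in the hunk above (model-level `baseUrl`, then the provider entry, then the native default) plus the `/v1` normalization from `resolveOllamaChatUrl` can be collapsed into one helper for illustration. The actual value of `OLLAMA_NATIVE_BASE_URL` is not visible in this diff; `http://127.0.0.1:11434` is assumed here:

```typescript
// Hypothetical default; the real OLLAMA_NATIVE_BASE_URL constant lives in
// ollama-stream.ts and is not shown in this diff.
const OLLAMA_NATIVE_BASE_URL = "http://127.0.0.1:11434";

// Resolve the /api/chat endpoint: model baseUrl wins, then the provider's,
// then the default; trailing slashes and an OpenAI-compat "/v1" suffix are
// stripped so aliased providers pointing at ".../v1" still hit the native API.
function resolveChatUrl(modelBaseUrl?: string, providerBaseUrl?: string): string {
  const base = (modelBaseUrl?.trim() || providerBaseUrl?.trim() || OLLAMA_NATIVE_BASE_URL)
    .replace(/\/+$/, "")
    .replace(/\/v1$/i, "");
  return `${base || OLLAMA_NATIVE_BASE_URL}/api/chat`;
}
```

This is why a provider configured with an OpenAI-compatible `baseUrl` such as `http://localhost:11434/v1` still reaches `http://localhost:11434/api/chat` for native streaming.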
@@ -4,7 +4,8 @@ export type ModelApi =
   | "anthropic-messages"
   | "google-generative-ai"
   | "github-copilot"
-  | "bedrock-converse-stream";
+  | "bedrock-converse-stream"
+  | "ollama";
 
 export type ModelCompatConfig = {
   supportsStore?: boolean;
@@ -9,6 +9,7 @@ export const ModelApiSchema = z.union([
   z.literal("google-generative-ai"),
   z.literal("github-copilot"),
   z.literal("bedrock-converse-stream"),
+  z.literal("ollama"),
 ]);
 
 export const ModelCompatSchema = z
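For illustration, the extended `ModelApi` union above can be mirrored by a dependency-free runtime guard. The union members before `anthropic-messages` fall outside this hunk, so `openai-completions` below is an assumption taken from the changelog entry, not from the diff:

```typescript
// Sketch: a const-array mirror of the ModelApi union, giving both the
// compile-time type and a runtime membership check without zod.
const MODEL_APIS = [
  "openai-completions", // assumed earlier member, not shown in this hunk
  "anthropic-messages",
  "google-generative-ai",
  "github-copilot",
  "bedrock-converse-stream",
  "ollama",
] as const;
type ModelApi = (typeof MODEL_APIS)[number];

function isModelApi(value: string): value is ModelApi {
  return (MODEL_APIS as readonly string[]).includes(value);
}
```

The codebase itself keeps the union and the zod schema in parallel, which is why both files appear in this commit: adding `"ollama"` to the type without the matching `z.literal("ollama")` would make config validation reject the new API value.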