Mirror of https://github.com/moltbot/moltbot.git, synced 2026-03-07 22:44:16 +00:00.
feat(ollama): add native /api/chat provider for streaming + tool calling (#11853)
Merged via /review-pr -> /prepare-pr -> /merge-pr.
Prepared head SHA: 0a723f98e6
Co-authored-by: BrokenFinger98 <115936166+BrokenFinger98@users.noreply.github.com>
Co-authored-by: steipete <58493+steipete@users.noreply.github.com>
Reviewed-by: @steipete
@@ -69,6 +69,7 @@ Docs: https://docs.openclaw.ai
 - Config: keep legacy audio transcription migration strict by rejecting non-string/unsafe command tokens while still migrating valid custom script executables. (#5042) Thanks @shayan919293.
 - Status/Sessions: stop clamping derived `totalTokens` to context-window size, keep prompt-token snapshots wired through session accounting, and surface context usage as unknown when fresh snapshot data is missing to avoid false 100% reports. (#15114) Thanks @echoVic.
 - Providers/MiniMax: switch implicit MiniMax API-key provider from `openai-completions` to `anthropic-messages` with the correct Anthropic-compatible base URL, fixing `invalid role: developer (2013)` errors on MiniMax M2.5. (#15275) Thanks @lailoo.
+- Ollama/Agents: use resolved model/provider base URLs for native `/api/chat` streaming (including aliased providers), normalize `/v1` endpoints, and forward abort + `maxTokens` stream options for reliable cancellation and token caps. (#11853) Thanks @BrokenFinger98.

 ## 2026.2.12
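The `/v1` normalization called out in the Ollama entry above can be sketched as a small standalone helper. This is illustrative only — it mirrors the logic of `resolveOllamaApiBase()` / `resolveOllamaChatUrl()` that appear later in this commit, but is a simplification, not the shipped implementation:

```typescript
// Sketch of the /v1-stripping behavior described in the changelog entry.
// Mirrors resolveOllamaChatUrl() further down in this diff (simplified).
const OLLAMA_NATIVE_BASE_URL = "http://127.0.0.1:11434";

function resolveOllamaApiBase(baseUrl?: string): string {
  if (!baseUrl) return OLLAMA_NATIVE_BASE_URL;
  // Drop trailing slashes, then a trailing /v1 (the OpenAI-compatible suffix).
  const trimmed = baseUrl.trim().replace(/\/+$/, "");
  const normalized = trimmed.replace(/\/v1$/i, "");
  return normalized || OLLAMA_NATIVE_BASE_URL;
}

console.log(resolveOllamaApiBase("http://ollama-host:11434/v1/")); // → http://ollama-host:11434
```

The native provider then appends `/api/chat` to this normalized base.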
@@ -8,7 +8,7 @@ title: "Ollama"

 # Ollama

-Ollama is a local LLM runtime that makes it easy to run open-source models on your machine. OpenClaw integrates with Ollama's OpenAI-compatible API and can **auto-discover tool-capable models** when you opt in with `OLLAMA_API_KEY` (or an auth profile) and do not define an explicit `models.providers.ollama` entry.
+Ollama is a local LLM runtime that makes it easy to run open-source models on your machine. OpenClaw integrates with Ollama's native API (`/api/chat`), supporting streaming and tool calling, and can **auto-discover tool-capable models** when you opt in with `OLLAMA_API_KEY` (or an auth profile) and do not define an explicit `models.providers.ollama` entry.
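As context for the native API: `/api/chat` streams NDJSON, one JSON object per line, ending with a `done: true` chunk that carries token counts. Abridged from the test fixtures later in this commit (fields shown are a subset):

```json
{"model":"qwen3:32b","created_at":"2026-01-01T00:00:00Z","message":{"role":"assistant","content":"Hello"},"done":false}
{"model":"qwen3:32b","created_at":"2026-01-01T00:00:00Z","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":5,"eval_count":2}
```

Tool calls arrive as `message.tool_calls` on intermediate (`done: false`) chunks.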
 ## Quick start
@@ -101,10 +101,9 @@ Use explicit config when:
   models: {
     providers: {
       ollama: {
-        // Use a host that includes /v1 for OpenAI-compatible APIs
-        baseUrl: "http://ollama-host:11434/v1",
+        baseUrl: "http://ollama-host:11434",
         apiKey: "ollama-local",
-        api: "openai-completions",
+        api: "ollama",
         models: [
           {
             id: "gpt-oss:20b",
@@ -134,7 +133,7 @@ If Ollama is running on a different host or port (explicit config disables auto-
     providers: {
       ollama: {
         apiKey: "ollama-local",
-        baseUrl: "http://ollama-host:11434/v1",
+        baseUrl: "http://ollama-host:11434",
       },
     },
   },
@@ -174,45 +173,28 @@ Ollama is free and runs locally, so all model costs are set to $0.

 ### Streaming Configuration

-Due to a [known issue](https://github.com/badlogic/pi-mono/issues/1205) in the underlying SDK with Ollama's response format, **streaming is disabled by default** for Ollama models. This prevents corrupted responses when using tool-capable models.
-
-When streaming is disabled, responses are delivered all at once (non-streaming mode), which avoids the issue where interleaved content/reasoning deltas cause garbled output.
-
-#### Re-enable Streaming (Advanced)
-
-If you want to re-enable streaming for Ollama (may cause issues with tool-capable models):
+OpenClaw's Ollama integration uses the **native Ollama API** (`/api/chat`) by default, which fully supports streaming and tool calling simultaneously. No special configuration is needed.
+
+#### Legacy OpenAI-Compatible Mode
+
+If you need to use the OpenAI-compatible endpoint instead (e.g., behind a proxy that only supports OpenAI format), set `api: "openai-completions"` explicitly:

 ```json5
 {
-  agents: {
-    defaults: {
-      models: {
-        "ollama/gpt-oss:20b": {
-          streaming: true,
-        },
-      },
-    },
-  },
+  models: {
+    providers: {
+      ollama: {
+        baseUrl: "http://ollama-host:11434/v1",
+        api: "openai-completions",
+        apiKey: "ollama-local",
+        models: [...]
+      }
+    }
+  }
 }
 ```

-#### Disable Streaming for Other Providers
-
-You can also disable streaming for any provider if needed:
-
-```json5
-{
-  agents: {
-    defaults: {
-      models: {
-        "openai/gpt-4": {
-          streaming: false,
-        },
-      },
-    },
-  },
-}
-```
+Note: The OpenAI-compatible endpoint may not support streaming + tool calling simultaneously. You may need to disable streaming with `params: { streaming: false }` in model config.
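As a sketch of the `params: { streaming: false }` note above, a legacy-mode model entry might look like this (illustrative; substitute your own model id):

```json5
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://ollama-host:11434/v1",
        api: "openai-completions",
        apiKey: "ollama-local",
        models: [
          {
            id: "gpt-oss:20b",
            params: { streaming: false },
          },
        ],
      },
    },
  },
}
```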

 ### Context windows
@@ -261,15 +243,6 @@ ps aux | grep ollama
 ollama serve
 ```

-### Corrupted responses or tool names in output
-
-If you see garbled responses containing tool names (like `sessions_send`, `memory_get`) or fragmented text when using Ollama models, this is due to an upstream SDK issue with streaming responses. **This is fixed by default** in the latest OpenClaw version by disabling streaming for Ollama models.
-
-If you manually enabled streaming and experience this issue:
-
-1. Remove the `streaming: true` configuration from your Ollama model entries, or
-2. Explicitly set `streaming: false` for Ollama models (see [Streaming Configuration](#streaming-configuration))
-
 ## See Also

 - [Model Providers](/concepts/model-providers) - Overview of all providers
@@ -29,25 +29,20 @@ describe("Ollama provider", () => {
     const agentDir = mkdtempSync(join(tmpdir(), "openclaw-test-"));
     const providers = await resolveImplicitProviders({ agentDir });

     // Ollama requires explicit configuration via OLLAMA_API_KEY env var or profile
     expect(providers?.ollama).toBeUndefined();
   });

-  it("should disable streaming by default for Ollama models", async () => {
+  it("should use native ollama api type", async () => {
     const agentDir = mkdtempSync(join(tmpdir(), "openclaw-test-"));
     process.env.OLLAMA_API_KEY = "test-key";

     try {
       const providers = await resolveImplicitProviders({ agentDir });

       // Provider should be defined with OLLAMA_API_KEY set
       expect(providers?.ollama).toBeDefined();
       expect(providers?.ollama?.apiKey).toBe("OLLAMA_API_KEY");

-      // Note: discoverOllamaModels() returns empty array in test environments (VITEST env var check)
-      // so we can't test the actual model discovery here. The streaming: false setting
-      // is applied in the model mapping within discoverOllamaModels().
-      // The configuration structure itself is validated by TypeScript and the Zod schema.
+      expect(providers?.ollama?.api).toBe("ollama");
+      expect(providers?.ollama?.baseUrl).toBe("http://127.0.0.1:11434");
     } finally {
       delete process.env.OLLAMA_API_KEY;
     }
@@ -69,15 +64,14 @@ describe("Ollama provider", () => {
         },
       });

-      expect(providers?.ollama?.baseUrl).toBe("http://192.168.20.14:11434/v1");
+      // Native API strips /v1 suffix via resolveOllamaApiBase()
+      expect(providers?.ollama?.baseUrl).toBe("http://192.168.20.14:11434");
     } finally {
       delete process.env.OLLAMA_API_KEY;
     }
   });

-  it("should have correct model structure with streaming disabled (unit test)", () => {
-    // This test directly verifies the model configuration structure
-    // since discoverOllamaModels() returns empty array in test mode
+  it("should have correct model structure without streaming override", () => {
     const mockOllamaModel = {
       id: "llama3.3:latest",
       name: "llama3.3:latest",
@@ -86,13 +80,9 @@ describe("Ollama provider", () => {
       cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
       contextWindow: 128000,
       maxTokens: 8192,
-      params: {
-        streaming: false,
-      },
     };

-    // Verify the model structure matches what discoverOllamaModels() would return
-    expect(mockOllamaModel.params?.streaming).toBe(false);
-    expect(mockOllamaModel.params).toHaveProperty("streaming");
+    // Native Ollama provider does not need streaming: false workaround
+    expect(mockOllamaModel).not.toHaveProperty("params");
   });
 });
@@ -17,6 +17,7 @@ import {
   buildHuggingfaceModelDefinition,
 } from "./huggingface-models.js";
 import { resolveAwsSdkEnvVarName, resolveEnvApiKey } from "./model-auth.js";
+import { OLLAMA_NATIVE_BASE_URL } from "./ollama-stream.js";
 import {
   buildSyntheticModelDefinition,
   SYNTHETIC_BASE_URL,
@@ -79,8 +80,8 @@ const QWEN_PORTAL_DEFAULT_COST = {
   cacheWrite: 0,
 };

-const OLLAMA_BASE_URL = "http://127.0.0.1:11434/v1";
-const OLLAMA_API_BASE_URL = "http://127.0.0.1:11434";
+const OLLAMA_BASE_URL = OLLAMA_NATIVE_BASE_URL;
+const OLLAMA_API_BASE_URL = OLLAMA_BASE_URL;
 const OLLAMA_DEFAULT_CONTEXT_WINDOW = 128000;
 const OLLAMA_DEFAULT_MAX_TOKENS = 8192;
 const OLLAMA_DEFAULT_COST = {
@@ -180,11 +181,6 @@ async function discoverOllamaModels(baseUrl?: string): Promise<ModelDefinitionCo
         cost: OLLAMA_DEFAULT_COST,
         contextWindow: OLLAMA_DEFAULT_CONTEXT_WINDOW,
         maxTokens: OLLAMA_DEFAULT_MAX_TOKENS,
-        // Disable streaming by default for Ollama to avoid SDK issue #1205
-        // See: https://github.com/badlogic/pi-mono/issues/1205
-        params: {
-          streaming: false,
-        },
       };
     });
   } catch (error) {
@@ -541,8 +537,8 @@ async function buildVeniceProvider(): Promise<ProviderConfig> {
 async function buildOllamaProvider(configuredBaseUrl?: string): Promise<ProviderConfig> {
   const models = await discoverOllamaModels(configuredBaseUrl);
   return {
-    baseUrl: configuredBaseUrl ?? OLLAMA_BASE_URL,
-    api: "openai-completions",
+    baseUrl: resolveOllamaApiBase(configuredBaseUrl),
+    api: "ollama",
     models,
   };
 }
290 src/agents/ollama-stream.test.ts (new file)
@@ -0,0 +1,290 @@
import { describe, expect, it, vi } from "vitest";
import {
  createOllamaStreamFn,
  convertToOllamaMessages,
  buildAssistantMessage,
  parseNdjsonStream,
} from "./ollama-stream.js";

describe("convertToOllamaMessages", () => {
  it("converts user text messages", () => {
    const messages = [{ role: "user", content: "hello" }];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "user", content: "hello" }]);
  });

  it("converts user messages with content parts", () => {
    const messages = [
      {
        role: "user",
        content: [
          { type: "text", text: "describe this" },
          { type: "image", data: "base64data" },
        ],
      },
    ];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "user", content: "describe this", images: ["base64data"] }]);
  });

  it("prepends system message when provided", () => {
    const messages = [{ role: "user", content: "hello" }];
    const result = convertToOllamaMessages(messages, "You are helpful.");
    expect(result[0]).toEqual({ role: "system", content: "You are helpful." });
    expect(result[1]).toEqual({ role: "user", content: "hello" });
  });

  it("converts assistant messages with toolCall content blocks", () => {
    const messages = [
      {
        role: "assistant",
        content: [
          { type: "text", text: "Let me check." },
          { type: "toolCall", id: "call_1", name: "bash", arguments: { command: "ls" } },
        ],
      },
    ];
    const result = convertToOllamaMessages(messages);
    expect(result[0].role).toBe("assistant");
    expect(result[0].content).toBe("Let me check.");
    expect(result[0].tool_calls).toEqual([
      { function: { name: "bash", arguments: { command: "ls" } } },
    ]);
  });

  it("converts tool result messages with 'tool' role", () => {
    const messages = [{ role: "tool", content: "file1.txt\nfile2.txt" }];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "tool", content: "file1.txt\nfile2.txt" }]);
  });

  it("converts SDK 'toolResult' role to Ollama 'tool' role", () => {
    const messages = [{ role: "toolResult", content: "command output here" }];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "tool", content: "command output here" }]);
  });

  it("includes tool_name from SDK toolResult messages", () => {
    const messages = [{ role: "toolResult", content: "file contents here", toolName: "read" }];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "tool", content: "file contents here", tool_name: "read" }]);
  });

  it("omits tool_name when not provided in toolResult", () => {
    const messages = [{ role: "toolResult", content: "output" }];
    const result = convertToOllamaMessages(messages);
    expect(result).toEqual([{ role: "tool", content: "output" }]);
    expect(result[0]).not.toHaveProperty("tool_name");
  });

  it("handles empty messages array", () => {
    const result = convertToOllamaMessages([]);
    expect(result).toEqual([]);
  });
});

describe("buildAssistantMessage", () => {
  const modelInfo = { api: "ollama", provider: "ollama", id: "qwen3:32b" };

  it("builds text-only response", () => {
    const response = {
      model: "qwen3:32b",
      created_at: "2026-01-01T00:00:00Z",
      message: { role: "assistant" as const, content: "Hello!" },
      done: true,
      prompt_eval_count: 10,
      eval_count: 5,
    };
    const result = buildAssistantMessage(response, modelInfo);
    expect(result.role).toBe("assistant");
    expect(result.content).toEqual([{ type: "text", text: "Hello!" }]);
    expect(result.stopReason).toBe("stop");
    expect(result.usage.input).toBe(10);
    expect(result.usage.output).toBe(5);
    expect(result.usage.totalTokens).toBe(15);
  });

  it("builds response with tool calls", () => {
    const response = {
      model: "qwen3:32b",
      created_at: "2026-01-01T00:00:00Z",
      message: {
        role: "assistant" as const,
        content: "",
        tool_calls: [{ function: { name: "bash", arguments: { command: "ls -la" } } }],
      },
      done: true,
      prompt_eval_count: 20,
      eval_count: 10,
    };
    const result = buildAssistantMessage(response, modelInfo);
    expect(result.stopReason).toBe("toolUse");
    expect(result.content.length).toBe(1); // toolCall only (empty content is skipped)
    expect(result.content[0].type).toBe("toolCall");
    const toolCall = result.content[0] as {
      type: "toolCall";
      id: string;
      name: string;
      arguments: Record<string, unknown>;
    };
    expect(toolCall.name).toBe("bash");
    expect(toolCall.arguments).toEqual({ command: "ls -la" });
    expect(toolCall.id).toMatch(/^ollama_call_[0-9a-f-]{36}$/);
  });

  it("sets all costs to zero for local models", () => {
    const response = {
      model: "qwen3:32b",
      created_at: "2026-01-01T00:00:00Z",
      message: { role: "assistant" as const, content: "ok" },
      done: true,
    };
    const result = buildAssistantMessage(response, modelInfo);
    expect(result.usage.cost).toEqual({
      input: 0,
      output: 0,
      cacheRead: 0,
      cacheWrite: 0,
      total: 0,
    });
  });
});

// Helper: build a ReadableStreamDefaultReader from NDJSON lines
function mockNdjsonReader(lines: string[]): ReadableStreamDefaultReader<Uint8Array> {
  const encoder = new TextEncoder();
  const payload = lines.join("\n") + "\n";
  let consumed = false;
  return {
    read: async () => {
      if (consumed) {
        return { done: true as const, value: undefined };
      }
      consumed = true;
      return { done: false as const, value: encoder.encode(payload) };
    },
    releaseLock: () => {},
    cancel: async () => {},
    closed: Promise.resolve(undefined),
  } as unknown as ReadableStreamDefaultReader<Uint8Array>;
}

describe("parseNdjsonStream", () => {
  it("parses text-only streaming chunks", async () => {
    const reader = mockNdjsonReader([
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":"Hello"},"done":false}',
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":" world"},"done":false}',
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":5,"eval_count":2}',
    ]);
    const chunks = [];
    for await (const chunk of parseNdjsonStream(reader)) {
      chunks.push(chunk);
    }
    expect(chunks).toHaveLength(3);
    expect(chunks[0].message.content).toBe("Hello");
    expect(chunks[1].message.content).toBe(" world");
    expect(chunks[2].done).toBe(true);
  });

  it("parses tool_calls from intermediate chunk (not final)", async () => {
    // Ollama sends tool_calls in done:false chunk, final done:true has no tool_calls
    const reader = mockNdjsonReader([
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"bash","arguments":{"command":"ls"}}}]},"done":false}',
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":10,"eval_count":5}',
    ]);
    const chunks = [];
    for await (const chunk of parseNdjsonStream(reader)) {
      chunks.push(chunk);
    }
    expect(chunks).toHaveLength(2);
    expect(chunks[0].done).toBe(false);
    expect(chunks[0].message.tool_calls).toHaveLength(1);
    expect(chunks[0].message.tool_calls![0].function.name).toBe("bash");
    expect(chunks[1].done).toBe(true);
    expect(chunks[1].message.tool_calls).toBeUndefined();
  });

  it("accumulates tool_calls across multiple intermediate chunks", async () => {
    const reader = mockNdjsonReader([
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"read","arguments":{"path":"/tmp/a"}}}]},"done":false}',
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":"","tool_calls":[{"function":{"name":"bash","arguments":{"command":"ls"}}}]},"done":false}',
      '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true}',
    ]);

    // Simulate the accumulation logic from createOllamaStreamFn
    const accumulatedToolCalls: Array<{
      function: { name: string; arguments: Record<string, unknown> };
    }> = [];
    const chunks = [];
    for await (const chunk of parseNdjsonStream(reader)) {
      chunks.push(chunk);
      if (chunk.message?.tool_calls) {
        accumulatedToolCalls.push(...chunk.message.tool_calls);
      }
    }
    expect(accumulatedToolCalls).toHaveLength(2);
    expect(accumulatedToolCalls[0].function.name).toBe("read");
    expect(accumulatedToolCalls[1].function.name).toBe("bash");
    // Final done:true chunk has no tool_calls
    expect(chunks[2].message.tool_calls).toBeUndefined();
  });
});

describe("createOllamaStreamFn", () => {
  it("normalizes /v1 baseUrl and maps maxTokens + signal", async () => {
    const originalFetch = globalThis.fetch;
    const fetchMock = vi.fn(async () => {
      const payload = [
        '{"model":"m","created_at":"t","message":{"role":"assistant","content":"ok"},"done":false}',
        '{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":1,"eval_count":1}',
      ].join("\n");
      return new Response(`${payload}\n`, {
        status: 200,
        headers: { "Content-Type": "application/x-ndjson" },
      });
    });
    globalThis.fetch = fetchMock as unknown as typeof fetch;

    try {
      const streamFn = createOllamaStreamFn("http://ollama-host:11434/v1/");
      const signal = new AbortController().signal;
      const stream = streamFn(
        {
          id: "qwen3:32b",
          api: "ollama",
          provider: "custom-ollama",
          contextWindow: 131072,
        } as unknown as Parameters<typeof streamFn>[0],
        {
          messages: [{ role: "user", content: "hello" }],
        } as unknown as Parameters<typeof streamFn>[1],
        {
          maxTokens: 123,
          signal,
        } as unknown as Parameters<typeof streamFn>[2],
      );

      const events = [];
      for await (const event of stream) {
        events.push(event);
      }
      expect(events.at(-1)?.type).toBe("done");

      expect(fetchMock).toHaveBeenCalledTimes(1);
      const [url, requestInit] = fetchMock.mock.calls[0] as [string, RequestInit];
      expect(url).toBe("http://ollama-host:11434/api/chat");
      expect(requestInit.signal).toBe(signal);
      if (typeof requestInit.body !== "string") {
        throw new Error("Expected string request body");
      }

      const requestBody = JSON.parse(requestInit.body) as {
        options: { num_ctx?: number; num_predict?: number };
      };
      expect(requestBody.options.num_ctx).toBe(131072);
      expect(requestBody.options.num_predict).toBe(123);
    } finally {
      globalThis.fetch = originalFetch;
    }
  });
});
419 src/agents/ollama-stream.ts (new file)
@@ -0,0 +1,419 @@
import type { StreamFn } from "@mariozechner/pi-agent-core";
import type {
  AssistantMessage,
  StopReason,
  TextContent,
  ToolCall,
  Tool,
  Usage,
} from "@mariozechner/pi-ai";
import { createAssistantMessageEventStream } from "@mariozechner/pi-ai";
import { randomUUID } from "node:crypto";

export const OLLAMA_NATIVE_BASE_URL = "http://127.0.0.1:11434";

// ── Ollama /api/chat request types ──────────────────────────────────────────

interface OllamaChatRequest {
  model: string;
  messages: OllamaChatMessage[];
  stream: boolean;
  tools?: OllamaTool[];
  options?: Record<string, unknown>;
}

interface OllamaChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  images?: string[];
  tool_calls?: OllamaToolCall[];
  tool_name?: string;
}

interface OllamaTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

interface OllamaToolCall {
  function: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// ── Ollama /api/chat response types ─────────────────────────────────────────

interface OllamaChatResponse {
  model: string;
  created_at: string;
  message: {
    role: "assistant";
    content: string;
    tool_calls?: OllamaToolCall[];
  };
  done: boolean;
  done_reason?: string;
  total_duration?: number;
  load_duration?: number;
  prompt_eval_count?: number;
  prompt_eval_duration?: number;
  eval_count?: number;
  eval_duration?: number;
}

// ── Message conversion ──────────────────────────────────────────────────────

type InputContentPart =
  | { type: "text"; text: string }
  | { type: "image"; data: string }
  | { type: "toolCall"; id: string; name: string; arguments: Record<string, unknown> }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };

function extractTextContent(content: unknown): string {
  if (typeof content === "string") {
    return content;
  }
  if (!Array.isArray(content)) {
    return "";
  }
  return (content as InputContentPart[])
    .filter((part): part is { type: "text"; text: string } => part.type === "text")
    .map((part) => part.text)
    .join("");
}

function extractOllamaImages(content: unknown): string[] {
  if (!Array.isArray(content)) {
    return [];
  }
  return (content as InputContentPart[])
    .filter((part): part is { type: "image"; data: string } => part.type === "image")
    .map((part) => part.data);
}

function extractToolCalls(content: unknown): OllamaToolCall[] {
  if (!Array.isArray(content)) {
    return [];
  }
  const parts = content as InputContentPart[];
  const result: OllamaToolCall[] = [];
  for (const part of parts) {
    if (part.type === "toolCall") {
      result.push({ function: { name: part.name, arguments: part.arguments } });
    } else if (part.type === "tool_use") {
      result.push({ function: { name: part.name, arguments: part.input } });
    }
  }
  return result;
}

export function convertToOllamaMessages(
  messages: Array<{ role: string; content: unknown }>,
  system?: string,
): OllamaChatMessage[] {
  const result: OllamaChatMessage[] = [];

  if (system) {
    result.push({ role: "system", content: system });
  }

  for (const msg of messages) {
    const { role } = msg;

    if (role === "user") {
      const text = extractTextContent(msg.content);
      const images = extractOllamaImages(msg.content);
      result.push({
        role: "user",
        content: text,
        ...(images.length > 0 ? { images } : {}),
      });
    } else if (role === "assistant") {
      const text = extractTextContent(msg.content);
      const toolCalls = extractToolCalls(msg.content);
      result.push({
        role: "assistant",
        content: text,
        ...(toolCalls.length > 0 ? { tool_calls: toolCalls } : {}),
      });
    } else if (role === "tool" || role === "toolResult") {
      // SDK uses "toolResult" (camelCase) for tool result messages.
      // Ollama API expects "tool" role with tool_name per the native spec.
      const text = extractTextContent(msg.content);
      const toolName =
        typeof (msg as { toolName?: unknown }).toolName === "string"
          ? (msg as { toolName?: string }).toolName
          : undefined;
      result.push({
        role: "tool",
        content: text,
        ...(toolName ? { tool_name: toolName } : {}),
      });
    }
  }

  return result;
}

// ── Tool extraction ─────────────────────────────────────────────────────────

function extractOllamaTools(tools: Tool[] | undefined): OllamaTool[] {
  if (!tools || !Array.isArray(tools)) {
    return [];
  }
  const result: OllamaTool[] = [];
  for (const tool of tools) {
    if (typeof tool.name !== "string" || !tool.name) {
      continue;
    }
    result.push({
      type: "function",
      function: {
        name: tool.name,
        description: typeof tool.description === "string" ? tool.description : "",
        parameters: (tool.parameters ?? {}) as Record<string, unknown>,
      },
    });
  }
  return result;
}

// ── Response conversion ─────────────────────────────────────────────────────

export function buildAssistantMessage(
  response: OllamaChatResponse,
  modelInfo: { api: string; provider: string; id: string },
): AssistantMessage {
  const content: (TextContent | ToolCall)[] = [];

  if (response.message.content) {
    content.push({ type: "text", text: response.message.content });
  }

  const toolCalls = response.message.tool_calls;
  if (toolCalls && toolCalls.length > 0) {
    for (const tc of toolCalls) {
      content.push({
        type: "toolCall",
        id: `ollama_call_${randomUUID()}`,
        name: tc.function.name,
        arguments: tc.function.arguments,
      });
    }
  }

  const hasToolCalls = toolCalls && toolCalls.length > 0;
  const stopReason: StopReason = hasToolCalls ? "toolUse" : "stop";

  const usage: Usage = {
    input: response.prompt_eval_count ?? 0,
    output: response.eval_count ?? 0,
    cacheRead: 0,
    cacheWrite: 0,
    totalTokens: (response.prompt_eval_count ?? 0) + (response.eval_count ?? 0),
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
  };

  return {
    role: "assistant",
    content,
    stopReason,
    api: modelInfo.api,
    provider: modelInfo.provider,
    model: modelInfo.id,
    usage,
    timestamp: Date.now(),
  };
}

// ── NDJSON streaming parser ─────────────────────────────────────────────────

export async function* parseNdjsonStream(
  reader: ReadableStreamDefaultReader<Uint8Array>,
): AsyncGenerator<OllamaChatResponse> {
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) {
      break;
    }
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed) {
        continue;
      }
      try {
        yield JSON.parse(trimmed) as OllamaChatResponse;
      } catch {
        console.warn("[ollama-stream] Skipping malformed NDJSON line:", trimmed.slice(0, 120));
      }
    }
  }

  if (buffer.trim()) {
    try {
      yield JSON.parse(buffer.trim()) as OllamaChatResponse;
    } catch {
      console.warn(
        "[ollama-stream] Skipping malformed trailing data:",
        buffer.trim().slice(0, 120),
      );
    }
  }
}

// ── Main StreamFn factory ───────────────────────────────────────────────────

function resolveOllamaChatUrl(baseUrl: string): string {
  const trimmed = baseUrl.trim().replace(/\/+$/, "");
  const normalizedBase = trimmed.replace(/\/v1$/i, "");
  const apiBase = normalizedBase || OLLAMA_NATIVE_BASE_URL;
  return `${apiBase}/api/chat`;
}

export function createOllamaStreamFn(baseUrl: string): StreamFn {
  const chatUrl = resolveOllamaChatUrl(baseUrl);

  return (model, context, options) => {
    const stream = createAssistantMessageEventStream();

    const run = async () => {
      try {
        const ollamaMessages = convertToOllamaMessages(
          context.messages ?? [],
          context.systemPrompt,
        );

        const ollamaTools = extractOllamaTools(context.tools);

        // Ollama defaults to num_ctx=4096 which is too small for large
        // system prompts + many tool definitions. Use model's contextWindow.
        const ollamaOptions: Record<string, unknown> = { num_ctx: model.contextWindow ?? 65536 };
        if (typeof options?.temperature === "number") {
          ollamaOptions.temperature = options.temperature;
        }
        if (typeof options?.maxTokens === "number") {
          ollamaOptions.num_predict = options.maxTokens;
        }

        const body: OllamaChatRequest = {
          model: model.id,
          messages: ollamaMessages,
          stream: true,
          ...(ollamaTools.length > 0 ? { tools: ollamaTools } : {}),
          options: ollamaOptions,
        };

        const headers: Record<string, string> = {
          "Content-Type": "application/json",
          ...options?.headers,
        };
        if (options?.apiKey) {
          headers.Authorization = `Bearer ${options.apiKey}`;
        }

        const response = await fetch(chatUrl, {
          method: "POST",
          headers,
          body: JSON.stringify(body),
          signal: options?.signal,
        });

        if (!response.ok) {
          const errorText = await response.text().catch(() => "unknown error");
          throw new Error(`Ollama API error ${response.status}: ${errorText}`);
        }

        if (!response.body) {
          throw new Error("Ollama API returned empty response body");
        }

        const reader = response.body.getReader();
        let accumulatedContent = "";
        const accumulatedToolCalls: OllamaToolCall[] = [];
        let finalResponse: OllamaChatResponse | undefined;

        for await (const chunk of parseNdjsonStream(reader)) {
          if (chunk.message?.content) {
            accumulatedContent += chunk.message.content;
          }

          // Ollama sends tool_calls in intermediate (done:false) chunks,
|
||||
// NOT in the final done:true chunk. Collect from all chunks.
|
||||
if (chunk.message?.tool_calls) {
|
||||
accumulatedToolCalls.push(...chunk.message.tool_calls);
|
||||
}
|
||||
|
||||
if (chunk.done) {
|
||||
finalResponse = chunk;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (!finalResponse) {
|
||||
throw new Error("Ollama API stream ended without a final response");
|
||||
}
|
||||
|
||||
finalResponse.message.content = accumulatedContent;
|
||||
if (accumulatedToolCalls.length > 0) {
|
||||
finalResponse.message.tool_calls = accumulatedToolCalls;
|
||||
}
|
||||
|
||||
const assistantMessage = buildAssistantMessage(finalResponse, {
|
||||
api: model.api,
|
||||
provider: model.provider,
|
||||
id: model.id,
|
||||
});
|
||||
|
||||
const reason: Extract<StopReason, "stop" | "length" | "toolUse"> =
|
||||
assistantMessage.stopReason === "toolUse" ? "toolUse" : "stop";
|
||||
|
||||
stream.push({
|
||||
type: "done",
|
||||
reason,
|
||||
message: assistantMessage,
|
||||
});
|
||||
} catch (err) {
|
||||
const errorMessage = err instanceof Error ? err.message : String(err);
|
||||
stream.push({
|
||||
type: "error",
|
||||
reason: "error",
|
||||
error: {
|
||||
role: "assistant" as const,
|
||||
content: [],
|
||||
stopReason: "error" as StopReason,
|
||||
errorMessage,
|
||||
api: model.api,
|
||||
provider: model.provider,
|
||||
model: model.id,
|
||||
usage: {
|
||||
input: 0,
|
||||
output: 0,
|
||||
cacheRead: 0,
|
||||
cacheWrite: 0,
|
||||
totalTokens: 0,
|
||||
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
|
||||
},
|
||||
timestamp: Date.now(),
|
||||
},
|
||||
});
|
||||
} finally {
|
||||
stream.end();
|
||||
}
|
||||
};
|
||||
|
||||
queueMicrotask(() => void run());
|
||||
return stream;
|
||||
};
|
||||
}
@@ -31,6 +31,7 @@ import { resolveOpenClawDocsPath } from "../../docs-path.js";
 import { isTimeoutError } from "../../failover-error.js";
 import { resolveModelAuthMode } from "../../model-auth.js";
 import { resolveDefaultModelForAgent } from "../../model-selection.js";
+import { createOllamaStreamFn, OLLAMA_NATIVE_BASE_URL } from "../../ollama-stream.js";
 import {
   isCloudCodeAssistFormatError,
   resolveBootstrapMaxChars,
@@ -584,8 +585,21 @@ export async function runEmbeddedAttempt(
     workspaceDir: params.workspaceDir,
   });
 
-  // Force a stable streamFn reference so vitest can reliably mock @mariozechner/pi-ai.
-  activeSession.agent.streamFn = streamSimple;
+  // Ollama native API: bypass SDK's streamSimple and use direct /api/chat calls
+  // for reliable streaming + tool calling support (#11828).
+  if (params.model.api === "ollama") {
+    // Use the resolved model baseUrl first so custom provider aliases work.
+    const providerConfig = params.config?.models?.providers?.[params.model.provider];
+    const modelBaseUrl =
+      typeof params.model.baseUrl === "string" ? params.model.baseUrl.trim() : "";
+    const providerBaseUrl =
+      typeof providerConfig?.baseUrl === "string" ? providerConfig.baseUrl.trim() : "";
+    const ollamaBaseUrl = modelBaseUrl || providerBaseUrl || OLLAMA_NATIVE_BASE_URL;
+    activeSession.agent.streamFn = createOllamaStreamFn(ollamaBaseUrl);
+  } else {
+    // Force a stable streamFn reference so vitest can reliably mock @mariozechner/pi-ai.
+    activeSession.agent.streamFn = streamSimple;
+  }
 
   applyExtraParamsToAgent(
     activeSession.agent,
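The base-URL fallback in the hunk above (model-level `baseUrl`, then the provider entry, then the native default) plus the `/v1` normalization from `resolveOllamaChatUrl` can be collapsed into one helper for illustration. The actual value of `OLLAMA_NATIVE_BASE_URL` is not visible in this diff; `http://127.0.0.1:11434` is assumed here:

```typescript
// Hypothetical default; the real OLLAMA_NATIVE_BASE_URL constant lives in
// ollama-stream.ts and is not shown in this diff.
const OLLAMA_NATIVE_BASE_URL = "http://127.0.0.1:11434";

// Resolve the /api/chat endpoint: model baseUrl wins, then the provider's,
// then the default; trailing slashes and an OpenAI-compat "/v1" suffix are
// stripped so aliased providers pointing at ".../v1" still hit the native API.
function resolveChatUrl(modelBaseUrl?: string, providerBaseUrl?: string): string {
  const base = (modelBaseUrl?.trim() || providerBaseUrl?.trim() || OLLAMA_NATIVE_BASE_URL)
    .replace(/\/+$/, "")
    .replace(/\/v1$/i, "");
  return `${base || OLLAMA_NATIVE_BASE_URL}/api/chat`;
}
```

This is why a provider configured with an OpenAI-compatible `baseUrl` such as `http://localhost:11434/v1` still reaches `http://localhost:11434/api/chat` for native streaming.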
@@ -4,7 +4,8 @@ export type ModelApi =
   | "anthropic-messages"
   | "google-generative-ai"
   | "github-copilot"
-  | "bedrock-converse-stream";
+  | "bedrock-converse-stream"
+  | "ollama";
 
 export type ModelCompatConfig = {
   supportsStore?: boolean;
@@ -9,6 +9,7 @@ export const ModelApiSchema = z.union([
   z.literal("google-generative-ai"),
   z.literal("github-copilot"),
   z.literal("bedrock-converse-stream"),
+  z.literal("ollama"),
 ]);
 
 export const ModelCompatSchema = z
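For illustration, the extended `ModelApi` union above can be mirrored by a dependency-free runtime guard. The union members before `anthropic-messages` fall outside this hunk, so `openai-completions` below is an assumption taken from the changelog entry, not from the diff:

```typescript
// Sketch: a const-array mirror of the ModelApi union, giving both the
// compile-time type and a runtime membership check without zod.
const MODEL_APIS = [
  "openai-completions", // assumed earlier member, not shown in this hunk
  "anthropic-messages",
  "google-generative-ai",
  "github-copilot",
  "bedrock-converse-stream",
  "ollama",
] as const;
type ModelApi = (typeof MODEL_APIS)[number];

function isModelApi(value: string): value is ModelApi {
  return (MODEL_APIS as readonly string[]).includes(value);
}
```

The codebase itself keeps the union and the zod schema in parallel, which is why both files appear in this commit: adding `"ollama"` to the type without the matching `z.literal("ollama")` would make config validation reject the new API value.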