diff --git a/docs/docs.json b/docs/docs.json index de4275bd9c5..4dfbf73684d 100644 --- a/docs/docs.json +++ b/docs/docs.json @@ -994,6 +994,7 @@ "brave-search", "perplexity", "tools/diffs", + "tools/pdf", "tools/elevated", "tools/exec", "tools/exec-approvals", diff --git a/docs/start/hubs.md b/docs/start/hubs.md index 50cf0d0188c..cad1e41e114 100644 --- a/docs/start/hubs.md +++ b/docs/start/hubs.md @@ -109,6 +109,7 @@ Use these hubs to discover every page, including deep dives and reference docs t - [OpenProse](/prose) - [CLI reference](/cli) - [Exec tool](/tools/exec) +- [PDF tool](/tools/pdf) - [Elevated mode](/tools/elevated) - [Cron jobs](/automation/cron-jobs) - [Cron vs Heartbeat](/automation/cron-vs-heartbeat) diff --git a/docs/tools/index.md b/docs/tools/index.md index 0d3a7094870..fdbc0250833 100644 --- a/docs/tools/index.md +++ b/docs/tools/index.md @@ -401,21 +401,7 @@ Notes: Analyze one or more PDF documents. -Core parameters: - -- `pdf` (single path or URL) -- `pdfs` (multiple paths or URLs, up to 10) -- `prompt` (optional, defaults to "Analyze this PDF document.") -- `pages` (optional page range like `1-5` or `1,3,7-9`) -- `model` (optional model override) -- `maxBytesMb` (optional size cap) - -Notes: - -- Native PDF provider mode is supported for Anthropic and Google models. -- Non-native models use PDF extraction fallback, text first, then rasterized page images when needed. -- `pages` filtering is only supported in extraction fallback mode. Native providers return a clear error when `pages` is set. -- Defaults are configurable via `agents.defaults.pdfModel`, `agents.defaults.pdfMaxBytesMb`, and `agents.defaults.pdfMaxPages`. +For full behavior, limits, config, and examples, see [PDF tool](/tools/pdf). ### `message` diff --git a/docs/tools/pdf.md b/docs/tools/pdf.md new file mode 100644 index 00000000000..e0b90144693 --- /dev/null +++ b/docs/tools/pdf.md @@ -0,0 +1,156 @@ +--- +title: "PDF Tool" +summary: "Analyze one or more PDF documents with native provider support and extraction fallback" +read_when: + - You want to analyze PDFs from agents + - You need exact pdf tool parameters and limits + - You are debugging native PDF mode vs extraction fallback +--- + +# PDF tool + +`pdf` analyzes one or more PDF documents and returns text. + +Quick behavior: + +- Native provider mode for Anthropic and Google model providers. +- Extraction fallback mode for other providers (extract text first, then page images when needed). +- Supports single (`pdf`) or multi (`pdfs`) input, max 10 PDFs per call. + +## Availability + +The tool is only registered when OpenClaw can resolve a PDF-capable model config for the agent: + +1. `agents.defaults.pdfModel` +2. fallback to `agents.defaults.imageModel` +3. fallback to best effort provider defaults based on available auth + +If no usable model can be resolved, the `pdf` tool is not exposed. + +## Input reference + +- `pdf` (`string`): one PDF path or URL +- `pdfs` (`string[]`): multiple PDF paths or URLs, up to 10 total +- `prompt` (`string`): analysis prompt, default `Analyze this PDF document.` +- `pages` (`string`): page filter like `1-5` or `1,3,7-9` +- `model` (`string`): optional model override (`provider/model`) +- `maxBytesMb` (`number`): per-PDF size cap in MB + +Input notes: + +- `pdf` and `pdfs` are merged and deduplicated before loading. +- If no PDF input is provided, the tool errors. +- `pages` is parsed as 1-based page numbers, deduped, sorted, and clamped to the configured max pages. +- `maxBytesMb` defaults to `agents.defaults.pdfMaxBytesMb` or `10`. + +## Supported PDF references + +- local file path (including `~` expansion) +- `file://` URL +- `http://` and `https://` URL + +Reference notes: + +- Other URI schemes (for example `ftp://`) are rejected with `unsupported_pdf_reference`. +- In sandbox mode, remote `http(s)` URLs are rejected. +- With workspace-only file policy enabled, local file paths outside allowed roots are rejected. + +## Execution modes + +### Native provider mode + +Native mode is used for provider `anthropic` and `google`. +The tool sends raw PDF bytes directly to provider APIs. + +Native mode limits: + +- `pages` is not supported. If set, the tool returns an error. + +### Extraction fallback mode + +Fallback mode is used for non-native providers. + +Flow: + +1. Extract text from selected pages (up to `agents.defaults.pdfMaxPages`, default `20`). +2. If extracted text length is below `200` chars, render selected pages to PNG images and include them. +3. Send extracted content plus prompt to the selected model. + +Fallback details: + +- Page image extraction uses a pixel budget of `4,000,000`. +- If the target model does not support image input and there is no extractable text, the tool errors. +- Extraction fallback requires `pdfjs-dist` (and `@napi-rs/canvas` for image rendering). + +## Config + +```json5 +{ + agents: { + defaults: { + pdfModel: { + primary: "anthropic/claude-opus-4-6", + fallbacks: ["openai/gpt-5-mini"], + }, + pdfMaxBytesMb: 10, + pdfMaxPages: 20, + }, + }, +} +``` + +See [Configuration Reference](/gateway/configuration-reference) for full field details. + +## Output details + +The tool returns text in `content[0].text` and structured metadata in `details`. + +Common `details` fields: + +- `model`: resolved model ref (`provider/model`) +- `native`: `true` for native provider mode, `false` for fallback +- `attempts`: fallback attempts that failed before success + +Path fields: + +- single PDF input: `details.pdf` +- multiple PDF inputs: `details.pdfs[]` with `pdf` entries +- sandbox path rewrite metadata (when applicable): `rewrittenFrom` + +## Error behavior + +- Missing PDF input: throws `pdf required: provide a path or URL to a PDF document` +- Too many PDFs: returns structured error in `details.error = "too_many_pdfs"` +- Unsupported reference scheme: returns `details.error = "unsupported_pdf_reference"` +- Native mode with `pages`: throws clear `pages is not supported with native PDF providers` error + +## Examples + +Single PDF: + +```json +{ + "pdf": "/tmp/report.pdf", + "prompt": "Summarize this report in 5 bullets" +} +``` + +Multiple PDFs: + +```json +{ + "pdfs": ["/tmp/q1.pdf", "/tmp/q2.pdf"], + "prompt": "Compare risks and timeline changes across both documents" +} +``` + +Page-filtered fallback model: + +```json +{ + "pdf": "https://example.com/report.pdf", + "pages": "1-3,7", + "model": "openai/gpt-5-mini", + "prompt": "Extract only customer-impacting incidents" +} +```