mirror of
https://github.com/moltbot/moltbot.git
synced 2026-05-04 04:31:59 +00:00
feat: add PDF analysis tool with native provider support (#31319)
* feat: add PDF analysis tool with native provider support

New `pdf` tool for analyzing PDF documents with model-powered analysis.

Architecture:
- Native PDF path: sends raw PDF bytes directly to providers that support inline document input (Anthropic via DocumentBlockParam, Google Gemini via inlineData with application/pdf MIME type)
- Extraction fallback: for providers without native PDF support, extracts text via pdfjs-dist and rasterizes pages to images via @napi-rs/canvas, then sends through the standard vision/text completion path

Key features:
- Single PDF (`pdf` param) or multiple PDFs (`pdfs` array, up to 10)
- Page range selection (`pages` param, e.g. "1-5", "1,3,7-9")
- Model override (`model` param) and file size limits (`maxBytesMb`)
- Auto-detects provider capability and falls back gracefully
- Same security patterns as image tool (SSRF guards, sandbox support, local path roots, workspace-only policy)

Config (agents.defaults):
- pdfModel: primary/fallbacks (defaults to imageModel, then session model)
- pdfMaxBytesMb: max PDF file size (default: 10)
- pdfMaxPages: max pages to process (default: 20)

Model catalog:
- Extended ModelInputType to include "document" alongside "text"/"image"
- Added modelSupportsDocument() capability check

Files:
- src/agents/tools/pdf-tool.ts - main tool factory
- src/agents/tools/pdf-tool.helpers.ts - helpers (page range, config, etc.)
- src/agents/tools/pdf-native-providers.ts - direct API calls for Anthropic/Google
- src/agents/tools/pdf-tool.test.ts - 43 tests covering all paths
- Modified: model-catalog.ts, openclaw-tools.ts, config schema/types/labels/help

* fix: prepare pdf tool for merge (#31319) (thanks @tyler6204)
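The config defaults described in the commit message (pdfModel falling back to imageModel and then the session model, a 10 MB size cap, a 20-page cap) could be resolved roughly as follows. This is only a sketch: the `ModelChoice` and `AgentDefaults` shapes, the `resolvePdfSettings` name, and the MB-to-bytes conversion factor are assumptions for illustration, not the actual code in `pdf-tool.helpers.ts`.

```typescript
// Hypothetical shapes: the commit only says "primary/fallbacks", so the
// exact field names and types here are assumed.
interface ModelChoice {
  primary?: string;
  fallbacks?: string[];
}

interface AgentDefaults {
  imageModel?: ModelChoice;
  pdfModel?: ModelChoice;
  pdfMaxBytesMb?: number;
  pdfMaxPages?: number;
}

interface PdfSettings {
  model: string;
  maxBytes: number;
  maxPages: number;
}

// Hypothetical resolver: pdfModel -> imageModel -> session model, with the
// default limits quoted in the commit message (10 MB, 20 pages).
function resolvePdfSettings(defaults: AgentDefaults, sessionModel: string): PdfSettings {
  const model =
    defaults.pdfModel?.primary ?? defaults.imageModel?.primary ?? sessionModel;
  return {
    model,
    // Assumes 1 MB = 1024 * 1024 bytes; the real tool may count differently.
    maxBytes: (defaults.pdfMaxBytesMb ?? 10) * 1024 * 1024,
    maxPages: defaults.pdfMaxPages ?? 20,
  };
}
```

With an empty `agents.defaults` object, this sketch returns the session model together with the 10 MB and 20-page limits; setting only `imageModel.primary` makes that model the PDF model as well.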
@@ -397,6 +397,26 @@ Notes:
- Only available when `agents.defaults.imageModel` is configured (primary or fallbacks), or when an implicit image model can be inferred from your default model + configured auth (best-effort pairing).
- Uses the image model directly (independent of the main chat model).

### `pdf`

Analyze one or more PDF documents.

Core parameters:

- `pdf` (single path or URL)
- `pdfs` (multiple paths or URLs, up to 10)
- `prompt` (optional, defaults to "Analyze this PDF document.")
- `pages` (optional page range like `1-5` or `1,3,7-9`)
- `model` (optional model override)
- `maxBytesMb` (optional size cap)

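As an illustration of the `pages` syntax listed above, the sketch below expands a spec such as `1,3,7-9` into a sorted list of page numbers. The function name `parsePageSpec`, the clamping to the document length, and the error messages are assumptions made for this example; the real helper in `pdf-tool.helpers.ts` may behave differently.

```typescript
// Hypothetical parser for the `pages` parameter; not the tool's actual helper.
function parsePageSpec(spec: string, totalPages: number): number[] {
  const pages = new Set<number>();
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas such as "1,,3"
    // Accept either a single page ("3") or an inclusive range ("7-9").
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) {
      throw new Error(`Invalid page range: "${token}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }
    // Assumed behavior: clamp to the document length instead of failing.
    for (let page = start; page <= Math.min(end, totalPages); page++) {
      pages.add(page);
    }
  }
  // The Set removes duplicates from overlapping ranges; sort for stable order.
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(parsePageSpec("1,3,7-9", 20)); // [1, 3, 7, 8, 9]
```

Collecting pages in a `Set` means overlapping ranges such as `1-3,2-4` produce each page only once.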
Notes:

- Native PDF provider mode is supported for Anthropic and Google models.
- Non-native models use PDF extraction fallback, text first, then rasterized page images when needed.
- `pages` filtering is only supported in extraction fallback mode. Native providers return a clear error when `pages` is set.
- Defaults are configurable via `agents.defaults.pdfModel`, `agents.defaults.pdfMaxBytesMb`, and `agents.defaults.pdfMaxPages`.

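The routing rules in these notes can be summarized in a small sketch. The provider identifiers (`"anthropic"`, `"google"`), the `choosePdfMode` name, and the choice to throw rather than return an error result are assumptions for illustration; the commit itself only names a `modelSupportsDocument()` capability check without showing its signature.

```typescript
type PdfMode = "native" | "extraction";

// Assumed provider ids for the two providers the notes list as native-capable.
const NATIVE_PDF_PROVIDERS = new Set<string>(["anthropic", "google"]);

// Hypothetical routing function: native mode for capable providers,
// extraction fallback for everything else.
function choosePdfMode(provider: string, pages?: string): PdfMode {
  if (NATIVE_PDF_PROVIDERS.has(provider)) {
    // Per the notes, `pages` filtering only works in extraction fallback mode,
    // so a native provider reports an error instead of silently ignoring it.
    if (pages !== undefined) {
      throw new Error(
        `"pages" is not supported with native PDF provider "${provider}"`,
      );
    }
    return "native";
  }
  // Extraction fallback: text first, then rasterized page images when needed.
  return "extraction";
}
```

Under these assumptions, `choosePdfMode("anthropic")` selects native mode, `choosePdfMode("openai", "1-3")` selects the extraction fallback, and `choosePdfMode("google", "1-3")` raises an error.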
### `message`

Send messages and channel actions across Discord/Google Chat/Slack/Telegram/WhatsApp/Signal/iMessage/MS Teams.