docs: add split processing example (#303)

Signed-off-by: Tiago Santana <54704492+SantanaTiago@users.noreply.github.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
Tiago Santana
2025-09-04 09:42:11 +01:00
committed by GitHub
parent fe98338239
commit 0d4545a65a
6 changed files with 170 additions and 6 deletions

docs/examples.md Normal file

@@ -0,0 +1,22 @@
# Examples
## Split processing
The provided split processing example demonstrates how to split a PDF into chunks of pages and send each chunk for conversion. At the end, it concatenates the results of all chunks into a single conversion `JSON`.
At the beginning of the file there are variables to be used (and modified), such as:
| Variable | Description |
| ---------|-------------|
| `path_to_pdf`| Path to the PDF file to be split |
| `pages_per_file`| Number of pages in each chunk the PDF is split into |
| `base_url`| Base URL of the `docling-serve` host |
| `out_dir`| Output folder for the conversion `JSON` of each chunk and for the final concatenated `JSON` |
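
As a rough illustration, the variables in the table might be set as follows. The values are placeholders chosen for this sketch, not taken from the example script; in particular, the `base_url` value assumes a `docling-serve` instance running locally, so adjust it to your deployment.

```python
from pathlib import Path

# Placeholder values for illustration only; adjust them to your setup.
path_to_pdf = Path("./my_document.pdf")  # PDF file to be split (hypothetical path)
pages_per_file = 4                       # number of pages in each chunk
base_url = "http://localhost:5001"       # assumed address of a local docling-serve host
out_dir = Path("./split_output")         # folder for per-chunk and concatenated JSON (hypothetical)
```
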
The example works as follows:
- Get the number of pages of the `PDF`
- Split the page count into chunks of `pages_per_file` pages and send each chunk for conversion using the `page_range` parameter
- Wait for all conversions to finish
- Get all conversion results
- Save each conversion `JSON` result into a `JSON` file
- Concatenate all `JSON` results into a single `JSON` using the `docling` concatenate method
- Save the concatenated `JSON` into a `JSON` file
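
The chunking step can be sketched in plain Python. This is a minimal sketch, not the code of the example script: the helper name `chunk_page_ranges` is hypothetical, and it assumes that page ranges are 1-based and inclusive on both ends, so check this against the `page_range` semantics of your `docling-serve` version.

```python
def chunk_page_ranges(num_pages: int, pages_per_file: int) -> list[tuple[int, int]]:
    """Return (first_page, last_page) pairs covering pages 1..num_pages.

    Hypothetical helper: ranges are assumed to be 1-based and inclusive.
    """
    if num_pages < 0:
        raise ValueError("num_pages must not be negative")
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        # The last chunk may be shorter, so clamp its end to num_pages.
        (start, min(start + pages_per_file - 1, num_pages))
        for start in range(1, num_pages + 1, pages_per_file)
    ]


# A 10-page PDF split into chunks of 4 pages.
ranges = chunk_page_ranges(10, 4)
print(ranges)  # [(1, 4), (5, 8), (9, 10)]
```

Each pair would then be passed as the `page_range` of one conversion request, and the results collected in the same order before concatenation so that the pages of the final document stay in sequence.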