mirror of
https://github.com/docling-project/docling-serve.git
synced 2025-11-29 16:43:24 +00:00
Signed-off-by: Tiago Santana <54704492+SantanaTiago@users.noreply.github.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
22 lines
1.0 KiB
Markdown
# Examples
## Split processing
The provided split-processing example demonstrates how to split a PDF into chunks of pages and send each chunk for conversion. At the end, it concatenates the conversion results of all chunks into a single `JSON`.

The following variables are defined at the beginning of the file and can be modified as needed:

| Variable | Description |
|----------|-------------|
| `path_to_pdf` | Path to the PDF file to be split |
| `pages_per_file` | Number of pages per chunk when splitting the PDF |
| `base_url` | Base URL of the `docling-serve` host |
| `out_dir` | Output folder for the conversion `JSON` of each chunk and the final concatenated `JSON` |

The example works as follows:

- Get the number of pages of the `PDF`
- Divide the pages into chunks and send each chunk for conversion using the `page_range` parameter
- Wait for all conversions to finish
- Get all conversion results
- Save each conversion result into a `JSON` file
- Concatenate all `JSON` results into a single `JSON` using the `docling` concatenate method
- Save the concatenated `JSON` into a `JSON` file
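
The chunking step in the list above can be sketched as follows. This is a minimal illustration, not the code from the example script: the helper name `split_page_ranges` is hypothetical, and it assumes that `page_range` takes a 1-based, inclusive `(start, end)` pair. Only `pages_per_file` comes from the variables table.

```python
from typing import List, Tuple


def split_page_ranges(num_pages: int, pages_per_file: int) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) page ranges covering num_pages.

    Hypothetical helper: assumes page_range is 1-based and inclusive.
    """
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")

    ranges = []
    # Step through the document in strides of pages_per_file.
    for start in range(1, num_pages + 1, pages_per_file):
        # The last chunk may be shorter than pages_per_file.
        end = min(start + pages_per_file - 1, num_pages)
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # A 10-page PDF split into chunks of 4 pages.
    for start, end in split_page_ranges(num_pages=10, pages_per_file=4):
        print(f"pages {start}-{end}")
```

Each resulting pair would then be sent as the `page_range` of one conversion request. Keeping the ranges in order matters for the final concatenation step, so that pages appear in their original sequence.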