14 Commits

Author SHA1 Message Date
github-actions[bot]
060ecd8b0e chore: bump version to 0.11.0 [skip ci] 2025-05-23 13:45:54 +00:00
Michele Dolfi
32b8a809f3 feat: page break placeholder in markdown exports options (#194)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-05-23 15:26:27 +02:00
Michele Dolfi
de002dfcdc feat: clear results registry (#192)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-05-23 14:30:57 +02:00
Michele Dolfi
abe5aa03f5 feat: Upgrade to Docling 2.33.0 (#198)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-05-22 17:00:29 +02:00
VIktor Kuropiantnyk
3f090b7d15 docs: Example and instructions on how to load model weights to persistent volume (#197)
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
2025-05-21 13:04:46 +02:00
Michele Dolfi
21c1791e42 docs: async api usage and fixes (#195)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-05-19 13:57:35 +02:00
Michele Dolfi
00be428490 feat: api to trigger offloading the models (#188)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-05-14 15:02:18 +02:00
Kasper Dinkla
3ff1b2f983 feat: Figure annotations @ docling components 0.0.7 (#181)
Signed-off-by: DKL <dkl@zurich.ibm.com>
2025-05-08 16:31:10 +02:00
Michele Dolfi
8406fb9b59 fix: usage of hashlib for FIPS (#171)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-05-02 15:00:10 +02:00
github-actions[bot]
a2dcb0a20f chore: bump version to 0.10.1 [skip ci] 2025-04-30 16:04:30 +00:00
Michele Dolfi
36787bc061 fix: avoid missing specialized keys in the options hash (#166)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-04-30 13:14:34 +02:00
Michele Dolfi
509f4889f8 fix: allow users to set the area threshold for picture descriptions (#165)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
2025-04-30 12:37:24 +02:00
Michele Dolfi
919cf5c041 fix: expose max wait time in sync endpoints (#164)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-04-30 12:30:11 +02:00
Michele Dolfi
35c2630c61 fix: add flash-attn for cuda images (#161)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-04-29 16:58:33 +02:00
31 changed files with 4220 additions and 3131 deletions

View File

@@ -19,7 +19,7 @@ jobs:
platforms: linux/amd64, linux/arm64
- name: docling-project/docling-serve-cpu
build_args: |
UV_SYNC_EXTRA_ARGS=--no-extra cu124
UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra flash-attn
platforms: linux/amd64, linux/arm64
- name: docling-project/docling-serve-cu124
build_args: |

View File

@@ -23,7 +23,7 @@ jobs:
platforms: linux/amd64, linux/arm64
- name: docling-project/docling-serve-cpu
build_args: |
UV_SYNC_EXTRA_ARGS=--no-extra cu124
UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra flash-attn
platforms: linux/amd64, linux/arm64
- name: docling-project/docling-serve-cu124
build_args: |

View File

@@ -17,7 +17,7 @@ jobs:
python-version: ${{ matrix.python-version }}
enable-cache: true
- name: Install dependencies
run: uv sync --all-extras --no-extra cu124
run: uv sync --all-extras --no-extra cu124 --no-extra flash-attn
- name: Build package
run: uv build
- name: Check content of wheel

View File

@@ -25,7 +25,7 @@ jobs:
key: pre-commit|${{ env.PY }}|${{ hashFiles('.pre-commit-config.yaml') }}
- name: Install dependencies
run: uv sync --frozen --all-extras --no-extra cu124
run: uv sync --frozen --all-extras --no-extra cu124 --no-extra flash-attn
- name: Run styling check
run: pre-commit run --all-files

View File

@@ -1,3 +1,31 @@
## [v0.11.0](https://github.com/docling-project/docling-serve/releases/tag/v0.11.0) - 2025-05-23
### Feature
* Page break placeholder in markdown exports options ([#194](https://github.com/docling-project/docling-serve/issues/194)) ([`32b8a80`](https://github.com/docling-project/docling-serve/commit/32b8a809f348bf9fbde657f93589a56935d3749d))
* Clear results registry ([#192](https://github.com/docling-project/docling-serve/issues/192)) ([`de002df`](https://github.com/docling-project/docling-serve/commit/de002dfcdc111c942a08b156c84b7fa22b3fbaf3))
* Upgrade to Docling 2.33.0 ([#198](https://github.com/docling-project/docling-serve/issues/198)) ([`abe5aa0`](https://github.com/docling-project/docling-serve/commit/abe5aa03f54d44ecf5c6d76e3258028997a53e68))
* Api to trigger offloading the models ([#188](https://github.com/docling-project/docling-serve/issues/188)) ([`00be428`](https://github.com/docling-project/docling-serve/commit/00be4284904d55b78c75c5475578ef11c2ade94c))
* Figure annotations @ docling components 0.0.7 ([#181](https://github.com/docling-project/docling-serve/issues/181)) ([`3ff1b2f`](https://github.com/docling-project/docling-serve/commit/3ff1b2f9834aca37472a895a0e3da47560457d77))
### Fix
* Usage of hashlib for FIPS ([#171](https://github.com/docling-project/docling-serve/issues/171)) ([`8406fb9`](https://github.com/docling-project/docling-serve/commit/8406fb9b59d83247b8379974cabed497703dfc4d))
### Documentation
* Example and instructions on how to load model weights to persistent volume ([#197](https://github.com/docling-project/docling-serve/issues/197)) ([`3f090b7`](https://github.com/docling-project/docling-serve/commit/3f090b7d15eaf696611d89bbbba5b98569610828))
* Async api usage and fixes ([#195](https://github.com/docling-project/docling-serve/issues/195)) ([`21c1791`](https://github.com/docling-project/docling-serve/commit/21c1791e427f5b1946ed46c68dfda03c957dca8f))
## [v0.10.1](https://github.com/docling-project/docling-serve/releases/tag/v0.10.1) - 2025-04-30
### Fix
* Avoid missing specialized keys in the options hash ([#166](https://github.com/docling-project/docling-serve/issues/166)) ([`36787bc`](https://github.com/docling-project/docling-serve/commit/36787bc0616356a6199da618d8646de51636b34e))
* Allow users to set the area threshold for picture descriptions ([#165](https://github.com/docling-project/docling-serve/issues/165)) ([`509f488`](https://github.com/docling-project/docling-serve/commit/509f4889f8ed4c0f0ce25bec4126ef1f1199797c))
* Expose max wait time in sync endpoints ([#164](https://github.com/docling-project/docling-serve/issues/164)) ([`919cf5c`](https://github.com/docling-project/docling-serve/commit/919cf5c0414f2f11eb8012f451fed7a8f582b7ad))
* Add flash-attn for cuda images ([#161](https://github.com/docling-project/docling-serve/issues/161)) ([`35c2630`](https://github.com/docling-project/docling-serve/commit/35c2630c613cf229393fc67b6938152b063ff498))
## [v0.10.0](https://github.com/docling-project/docling-serve/releases/tag/v0.10.0) - 2025-04-28
### Feature

View File

@@ -46,7 +46,10 @@ RUN --mount=from=ghcr.io/astral-sh/uv:0.6.1,source=/uv,target=/bin/uv \
--mount=type=cache,target=/opt/app-root/src/.cache/uv,uid=1001 \
--mount=type=bind,source=uv.lock,target=uv.lock \
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
umask 002 && uv sync --frozen --no-install-project --no-dev --all-extras ${UV_SYNC_EXTRA_ARGS}
umask 002 && \
UV_SYNC_ARGS="--frozen --no-install-project --no-dev --all-extras" && \
uv sync ${UV_SYNC_ARGS} ${UV_SYNC_EXTRA_ARGS} --no-extra flash-attn && \
FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE uv sync ${UV_SYNC_ARGS} ${UV_SYNC_EXTRA_ARGS} --no-build-isolation-package=flash-attn
ARG MODELS_LIST="layout tableformer picture_classifier easyocr"

View File

@@ -35,7 +35,7 @@ docling-serve-image: Containerfile
.PHONY: docling-serve-cpu-image
docling-serve-cpu-image: Containerfile ## Build docling-serve "cpu only" container image
$(ECHO_PREFIX) printf " %-12s Containerfile\n" "[docling-serve CPU]"
$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cu124" -f Containerfile -t ghcr.io/docling-project/docling-serve-cpu:$(TAG) .
$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra flash-attn" -f Containerfile -t ghcr.io/docling-project/docling-serve-cpu:$(TAG) .
$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cpu:$(TAG) ghcr.io/docling-project/docling-serve-cpu:$(BRANCH_TAG)
$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cpu:$(TAG) quay.io/docling-project/docling-serve-cpu:$(BRANCH_TAG)

View File

@@ -70,7 +70,7 @@ An easy to use UI is available at the `/ui` endpoint.
## Documentation and advance usages
Visit the [Docling Serve documentation](./docs/README.md) for learning how to [configure the webserver](./docs/configuration.md), use all the [runtime options](./docs/usage.md) of the API and [deployment examples](./docs/deployment.md).
Visit the [Docling Serve documentation](./docs/README.md) for learning how to [configure the webserver](./docs/configuration.md), use all the [runtime options](./docs/usage.md) of the API and [deployment examples](./docs/deployment.md), pre-load model weights into a persistent volume [model weights on persistent volume](./docs/pre-loading-models.md)
## Get help and support

View File

@@ -39,6 +39,7 @@ from docling_serve.datamodel.requests import (
ConvertDocumentsRequest,
)
from docling_serve.datamodel.responses import (
ClearResponse,
ConvertDocumentResponse,
HealthCheckResponse,
MessageKind,
@@ -46,6 +47,7 @@ from docling_serve.datamodel.responses import (
WebsocketMessage,
)
from docling_serve.datamodel.task import Task, TaskSource
from docling_serve.docling_conversion import _get_converter_from_hash
from docling_serve.engines.async_orchestrator import (
BaseAsyncOrchestrator,
ProgressInvalid,
@@ -251,7 +253,6 @@ def create_app(): # noqa: C901
async def _wait_task_complete(
orchestrator: BaseAsyncOrchestrator, task_id: str
) -> bool:
MAX_WAIT = 120
start_time = time.monotonic()
while True:
task = await orchestrator.task_status(task_id=task_id)
@@ -259,7 +260,7 @@ def create_app(): # noqa: C901
return True
await asyncio.sleep(5)
elapsed_time = time.monotonic() - start_time
if elapsed_time > MAX_WAIT:
if elapsed_time > docling_serve_settings.max_sync_wait:
return False
#############################
@@ -310,7 +311,8 @@ def create_app(): # noqa: C901
if not success:
# TODO: abort task!
return HTTPException(
status_code=504, detail="Conversion is taking too long."
status_code=504,
detail=f"Conversion is taking too long. The maximum wait time is configure as DOCLING_SERVE_MAX_SYNC_WAIT={docling_serve_settings.max_sync_wait}.",
)
result = await orchestrator.task_result(
@@ -351,7 +353,8 @@ def create_app(): # noqa: C901
if not success:
# TODO: abort task!
return HTTPException(
status_code=504, detail="Conversion is taking too long."
status_code=504,
detail=f"Conversion is taking too long. The maximum wait time is configure as DOCLING_SERVE_MAX_SYNC_WAIT={docling_serve_settings.max_sync_wait}.",
)
result = await orchestrator.task_result(
@@ -543,4 +546,27 @@ def create_app(): # noqa: C901
status_code=400, detail=f"Invalid progress payload: {err}"
)
#### Clear requests
# Offload models
@app.get(
"/v1alpha/clear/converters",
response_model=ClearResponse,
)
async def clear_converters():
_get_converter_from_hash.cache_clear()
return ClearResponse()
# Clean results
@app.get(
"/v1alpha/clear/results",
response_model=ClearResponse,
)
async def clear_results(
orchestrator: Annotated[BaseAsyncOrchestrator, Depends(get_async_orchestrator)],
older_then: float = 3600,
):
await orchestrator.clear_results(older_than=older_then)
return ClearResponse()
return app

View File

@@ -9,6 +9,7 @@ from docling.datamodel.pipeline_options import (
EasyOcrOptions,
PdfBackend,
PdfPipeline,
PictureDescriptionBaseOptions,
TableFormerMode,
TableStructureOptions,
)
@@ -295,6 +296,14 @@ class ConvertDocumentsOptions(BaseModel):
),
] = 2.0
md_page_break_placeholder: Annotated[
str,
Field(
description="Add this placeholder betweek pages in the markdown output.",
examples=["<!-- page-break -->", ""],
),
] = ""
do_code_enrichment: Annotated[
bool,
Field(
@@ -339,6 +348,14 @@ class ConvertDocumentsOptions(BaseModel):
),
] = False
picture_description_area_threshold: Annotated[
float,
Field(
description="Minimum percentage of the area for a picture to be processed with the models.",
examples=[PictureDescriptionBaseOptions().picture_area_threshold],
),
] = PictureDescriptionBaseOptions().picture_area_threshold
picture_description_local: Annotated[
Optional[PictureDescriptionLocal],
Field(

View File

@@ -15,6 +15,10 @@ class HealthCheckResponse(BaseModel):
status: str = "ok"
class ClearResponse(BaseModel):
status: str = "ok"
class DocumentResponse(BaseModel):
filename: str
md_content: Optional[str] = None

View File

@@ -1,8 +1,10 @@
import datetime
from functools import partial
from pathlib import Path
from typing import Optional, Union
from fastapi.responses import FileResponse
from pydantic import BaseModel, ConfigDict
from pydantic import BaseModel, ConfigDict, Field
from docling.datamodel.base_models import DocumentStream
@@ -25,6 +27,27 @@ class Task(BaseModel):
result: Optional[Union[ConvertDocumentResponse, FileResponse]] = None
scratch_dir: Optional[Path] = None
processing_meta: Optional[TaskProcessingMeta] = None
created_at: datetime.datetime = Field(
default_factory=partial(datetime.datetime.now, datetime.timezone.utc)
)
started_at: Optional[datetime.datetime] = None
finished_at: Optional[datetime.datetime] = None
last_update_at: datetime.datetime = Field(
default_factory=partial(datetime.datetime.now, datetime.timezone.utc)
)
def set_status(self, status: TaskStatus):
now = datetime.datetime.now(datetime.timezone.utc)
if status == TaskStatus.STARTED and self.started_at is None:
self.started_at = now
if (
status in [TaskStatus.SUCCESS, TaskStatus.FAILURE]
and self.finished_at is None
):
self.finished_at = now
self.last_update_at = now
self.task_status = status
def is_completed(self) -> bool:
if self.task_status in [TaskStatus.SUCCESS, TaskStatus.FAILURE]:

View File

@@ -42,15 +42,12 @@ _log = logging.getLogger(__name__)
# Custom serializer for PdfFormatOption
# (model_dump_json does not work with some classes)
def _hash_pdf_format_option(pdf_format_option: PdfFormatOption) -> bytes:
data = pdf_format_option.model_dump()
data = pdf_format_option.model_dump(serialize_as_any=True)
# pipeline_options are not fully serialized by model_dump, dedicated pass
if pdf_format_option.pipeline_options:
data["pipeline_options"] = pdf_format_option.pipeline_options.model_dump()
# Replace `artifacts_path` with a string representation
data["pipeline_options"]["artifacts_path"] = repr(
data["pipeline_options"]["artifacts_path"]
data["pipeline_options"] = pdf_format_option.pipeline_options.model_dump(
serialize_as_any=True, mode="json"
)
# Replace `pipeline_cls` with a string representation
@@ -59,15 +56,11 @@ def _hash_pdf_format_option(pdf_format_option: PdfFormatOption) -> bytes:
# Replace `backend` with a string representation
data["backend"] = repr(data["backend"])
# Handle `device` in `accelerator_options`
if "accelerator_options" in data and "device" in data["accelerator_options"]:
data["accelerator_options"]["device"] = repr(
data["accelerator_options"]["device"]
)
# Serialize the dictionary to JSON with sorted keys to have consistent hashes
serialized_data = json.dumps(data, sort_keys=True)
options_hash = hashlib.sha1(serialized_data.encode()).digest()
options_hash = hashlib.sha1(
serialized_data.encode(), usedforsecurity=False
).digest()
return options_hash
@@ -150,6 +143,9 @@ def _parse_standard_pdf_opts(
request.picture_description_api.model_dump()
)
)
pipeline_options.picture_description_options.picture_area_threshold = (
request.picture_description_area_threshold
)
return pipeline_options

View File

@@ -130,13 +130,13 @@ class AsyncKfpOrchestrator(BaseAsyncOrchestrator):
# CANCELED = "CANCELED"
# PAUSED = "PAUSED"
if run_info.state == V2beta1RuntimeState.SUCCEEDED:
task.task_status = TaskStatus.SUCCESS
task.set_status(TaskStatus.SUCCESS)
elif run_info.state == V2beta1RuntimeState.PENDING:
task.task_status = TaskStatus.PENDING
task.set_status(TaskStatus.PENDING)
elif run_info.state == V2beta1RuntimeState.RUNNING:
task.task_status = TaskStatus.STARTED
task.set_status(TaskStatus.STARTED)
else:
task.task_status = TaskStatus.FAILURE
task.set_status(TaskStatus.FAILURE)
async def task_status(self, task_id: str, wait: float = 0.0) -> Task:
await self._update_task_from_run(task_id=task_id, wait=wait)

View File

@@ -36,7 +36,7 @@ class AsyncLocalWorker:
task = self.orchestrator.tasks[task_id]
try:
task.task_status = TaskStatus.STARTED
task.set_status(TaskStatus.STARTED)
_log.info(f"Worker {self.worker_id} processing task {task_id}")
# Notify clients about task updates
@@ -106,7 +106,7 @@ class AsyncLocalWorker:
task.sources = []
task.options = None
task.task_status = TaskStatus.SUCCESS
task.set_status(TaskStatus.SUCCESS)
_log.info(
f"Worker {self.worker_id} completed job {task_id} "
f"in {processing_time:.2f} seconds"
@@ -116,7 +116,7 @@ class AsyncLocalWorker:
_log.error(
f"Worker {self.worker_id} failed to process job {task_id}: {e}"
)
task.task_status = TaskStatus.FAILURE
task.set_status(TaskStatus.FAILURE)
finally:
await self.orchestrator.notify_task_subscribers(task_id)

View File

@@ -1,3 +1,6 @@
import asyncio
import datetime
import logging
import shutil
from typing import Union
@@ -20,6 +23,8 @@ from docling_serve.engines.base_orchestrator import (
)
from docling_serve.settings import docling_serve_settings
_log = logging.getLogger(__name__)
class ProgressInvalid(OrchestratorError):
pass
@@ -46,13 +51,50 @@ class BaseAsyncOrchestrator(BaseOrchestrator):
async def task_result(
self, task_id: str, background_tasks: BackgroundTasks
) -> Union[ConvertDocumentResponse, FileResponse, None]:
task = await self.get_raw_task(task_id=task_id)
if task.is_completed() and task.scratch_dir is not None:
if docling_serve_settings.single_use_results:
background_tasks.add_task(
shutil.rmtree, task.scratch_dir, ignore_errors=True
)
return task.result
try:
task = await self.get_raw_task(task_id=task_id)
if task.is_completed() and docling_serve_settings.single_use_results:
if task.scratch_dir is not None:
background_tasks.add_task(
shutil.rmtree, task.scratch_dir, ignore_errors=True
)
async def _remove_task_impl():
await asyncio.sleep(docling_serve_settings.result_removal_delay)
await self.delete_task(task_id=task.task_id)
async def _remove_task():
asyncio.create_task(_remove_task_impl()) # noqa: RUF006
background_tasks.add_task(_remove_task)
return task.result
except TaskNotFoundError:
return None
async def delete_task(self, task_id: str):
_log.info(f"Deleting {task_id=}")
if task_id in self.task_subscribers:
for websocket in self.task_subscribers[task_id]:
await websocket.close()
del self.task_subscribers[task_id]
if task_id in self.tasks:
del self.tasks[task_id]
async def clear_results(self, older_than: float = 0.0):
cutoff_time = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(
seconds=older_than
)
tasks_to_delete = [
task_id
for task_id, task in self.tasks.items()
if task.finished_at is not None and task.finished_at < cutoff_time
]
for task_id in tasks_to_delete:
await self.delete_task(task_id=task_id)
async def notify_task_subscribers(self, task_id: str):
if task_id not in self.task_subscribers:

View File

@@ -42,6 +42,10 @@ class BaseOrchestrator(ABC):
) -> Union[ConvertDocumentResponse, FileResponse, None]:
pass
@abstractmethod
async def clear_results(self, older_than: float = 0.0):
pass
@abstractmethod
async def process_queue(self):
pass

View File

@@ -29,7 +29,7 @@ logger = logging.getLogger(__name__)
############################
logo_path = "https://raw.githubusercontent.com/docling-project/docling/refs/heads/main/docs/assets/logo.svg"
js_components_url = "https://unpkg.com/@docling/docling-components@0.0.6"
js_components_url = "https://unpkg.com/@docling/docling-components@0.0.7"
if (
docling_serve_settings.static_path is not None
and docling_serve_settings.static_path.is_dir()
@@ -83,7 +83,7 @@ css = """
height: 140px;
}
docling-img::part(pages) {
docling-img {
gap: 1rem;
}
@@ -443,7 +443,7 @@ def response_to_output(response, return_as_file):
)
# Embed document JSON and trigger load at client via an image.
json_rendered_content = f"""
<docling-img id="dclimg" pagenumbers tooltip="parsed"></docling-img>
<docling-img id="dclimg" pagenumbers><docling-tooltip></docling-tooltip></docling-img>
<script id="dcljson" type="application/json" onload="document.getElementById('dclimg').src = JSON.parse(document.getElementById('dcljson').textContent);">{json_content}</script>
<img src onerror="document.getElementById('dclimg').src = JSON.parse(document.getElementById('dcljson').textContent);" />
"""

View File

@@ -27,6 +27,7 @@ def _export_document_as_content(
export_txt: bool,
export_doctags: bool,
image_mode: ImageRefMode,
md_page_break_placeholder: str,
):
document = DocumentResponse(filename=conv_res.input.file.name)
@@ -40,10 +41,14 @@ def _export_document_as_content(
document.html_content = new_doc.export_to_html(image_mode=image_mode)
if export_txt:
document.text_content = new_doc.export_to_markdown(
strict_text=True, image_mode=image_mode
strict_text=True,
image_mode=image_mode,
)
if export_md:
document.md_content = new_doc.export_to_markdown(image_mode=image_mode)
document.md_content = new_doc.export_to_markdown(
image_mode=image_mode,
page_break_placeholder=md_page_break_placeholder or None,
)
if export_doctags:
document.doctags_content = new_doc.export_to_doctags()
elif conv_res.status == ConversionStatus.SKIPPED:
@@ -63,6 +68,7 @@ def _export_documents_as_files(
export_txt: bool,
export_doctags: bool,
image_export_mode: ImageRefMode,
md_page_break_placeholder: str,
):
success_count = 0
failure_count = 0
@@ -103,7 +109,9 @@ def _export_documents_as_files(
fname = output_dir / f"{doc_filename}.md"
_log.info(f"writing Markdown output to {fname}")
conv_res.document.save_as_markdown(
filename=fname, image_mode=image_export_mode
filename=fname,
image_mode=image_export_mode,
page_break_placeholder=md_page_break_placeholder or None,
)
# Export Document Tags format:
@@ -170,6 +178,7 @@ def process_results(
export_txt=export_txt,
export_doctags=export_doctags,
image_mode=conversion_options.image_export_mode,
md_page_break_placeholder=conversion_options.md_page_break_placeholder,
)
response = ConvertDocumentResponse(
@@ -198,6 +207,7 @@ def process_results(
export_txt=export_txt,
export_doctags=export_doctags,
image_export_mode=conversion_options.image_export_mode,
md_page_break_placeholder=conversion_options.md_page_break_placeholder,
)
files = os.listdir(output_dir)

View File

@@ -40,6 +40,7 @@ class DoclingServeSettings(BaseSettings):
static_path: Optional[Path] = None
scratch_path: Optional[Path] = None
single_use_results: bool = True
result_removal_delay: float = 300 # 5 minutes
options_cache_size: int = 2
enable_remote_services: bool = False
allow_external_plugins: bool = False
@@ -48,6 +49,8 @@ class DoclingServeSettings(BaseSettings):
max_num_pages: int = sys.maxsize
max_file_size: int = sys.maxsize
max_sync_wait: int = 120 # 2 minutes
cors_origins: list[str] = ["*"]
cors_methods: list[str] = ["*"]
cors_headers: list[str] = ["*"]

View File

@@ -42,9 +42,11 @@ THe following table describes the options to configure the Docling Serve app.
| | `DOCLING_SERVE_ENABLE_REMOTE_SERVICES` | `false` | Allow pipeline components making remote connections. For example, this is needed when using a vision-language model via APIs. |
| | `DOCLING_SERVE_ALLOW_EXTERNAL_PLUGINS` | `false` | Allow the selection of third-party plugins. |
| | `DOCLING_SERVE_SINGLE_USE_RESULTS` | `true` | If true, results can be accessed only once. If false, the results accumulate in the scratch directory. |
| | `DOCLING_SERVE_RESULT_REMOVAL_DELAY` | `300` | When `DOCLING_SERVE_SINGLE_USE_RESULTS` is active, this is the delay before results are removed from the task registry. |
| | `DOCLING_SERVE_MAX_DOCUMENT_TIMEOUT` | `604800` (7 days) | The maximum time for processing a document. |
| | `DOCLING_SERVE_MAX_NUM_PAGES` | | The maximum number of pages for a document to be processed. |
| | `DOCLING_SERVE_MAX_FILE_SIZE` | | The maximum file size for a document to be processed. |
| | `DOCLING_SERVE_MAX_SYNC_WAIT` | `120` | Max number of seconds a synchronous endpoint is waiting for the task completion. |
| | `DOCLING_SERVE_OPTIONS_CACHE_SIZE` | `2` | How many DocumentConveter objects (including their loaded models) to keep in the cache. |
| | `DOCLING_SERVE_CORS_ORIGINS` | `["*"]` | A list of origins that should be permitted to make cross-origin requests. |
| | `DOCLING_SERVE_CORS_METHODS` | `["*"]` | A list of HTTP methods that should be allowed for cross-origin requests. |

View File

@@ -0,0 +1,47 @@
kind: Deployment
apiVersion: apps/v1
metadata:
name: docling-serve
labels:
app: docling-serve
component: docling-serve-api
spec:
replicas: 1
selector:
matchLabels:
app: docling-serve
component: docling-serve-api
template:
metadata:
labels:
app: docling-serve
component: docling-serve-api
spec:
restartPolicy: Always
containers:
- name: api
resources:
limits:
cpu: 500m
memory: 2Gi
requests:
cpu: 250m
memory: 1Gi
env:
- name: DOCLING_SERVE_ENABLE_UI
value: 'true'
- name: DOCLING_SERVE_ARTIFACTS_PATH
value: '/modelcache'
ports:
- name: http
containerPort: 5001
protocol: TCP
imagePullPolicy: Always
image: 'ghcr.io/docling-project/docling-serve-cpu'
volumeMounts:
- name: docling-model-cache
mountPath: /modelcache
volumes:
- name: docling-model-cache
persistentVolumeClaim:
claimName: docling-model-cache-pvc

View File

@@ -0,0 +1,33 @@
apiVersion: batch/v1
kind: Job
metadata:
name: docling-model-cache-load
spec:
selector: {}
template:
metadata:
name: docling-model-load
spec:
containers:
- name: loader
image: ghcr.io/docling-project/docling-serve-cpu:main
command:
- docling-tools
- models
- download
- '--output-dir=/modelcache'
- 'layout'
- 'tableformer'
- 'code_formula'
- 'picture_classifier'
- 'smolvlm'
- 'granite_vision'
- 'easyocr'
volumeMounts:
- name: docling-model-cache
mountPath: /modelcache
volumes:
- name: docling-model-cache
persistentVolumeClaim:
claimName: docling-model-cache-pvc
restartPolicy: Never

View File

@@ -0,0 +1,11 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: docling-model-cache-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 10Gi

103
docs/pre-loading-models.md Normal file
View File

@@ -0,0 +1,103 @@
# Pre-loading models for docling
This document provides examples for pre-loading docling models to a persistent volume and re-using it for docling-serve deployments.
1. We need to create a persistent volume that will store models weights:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: docling-model-cache-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 10Gi
```
If you don't want to use default storage class, set your custom storage class with following:
```yaml
spec:
...
storageClassName: <Storage Class Name>
```
Manifest example: [docling-model-cache-pvc.yaml](./deploy-examples/docling-model-cache-pvc.yaml)
2. In order to load model weights, we can use docling-toolkit to download them, as this is a one time operation we can use kubernetes job for this:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
name: docling-model-cache-load
spec:
selector: {}
template:
metadata:
name: docling-model-load
spec:
containers:
- name: loader
image: ghcr.io/docling-project/docling-serve-cpu:main
command:
- docling-tools
- models
- download
- '--output-dir=/modelcache'
- 'layout'
- 'tableformer'
- 'code_formula'
- 'picture_classifier'
- 'smolvlm'
- 'granite_vision'
- 'easyocr'
volumeMounts:
- name: docling-model-cache
mountPath: /modelcache
volumes:
- name: docling-model-cache
persistentVolumeClaim:
claimName: docling-model-cache-pvc
restartPolicy: Never
```
The job will mount previously created persistent volume and execute command similar to how we would load models locally:
`docling-tools models download --output-dir <MOUNT-PATH> [LIST_OF_MODELS]`
In manifest, we specify desired models individually, or we can use `--all` parameter to download all models.
Manifest example: [docling-model-cache-job.yaml](./deploy-examples/docling-model-cache-job.yaml)
3. Now we can mount volume in the docling-serve deployment and set env `DOCLING_SERVE_ARTIFACTS_PATH` to point to it.
Following additions to deploymeny should be made:
```yaml
spec:
template:
spec:
containers:
- name: api
env:
...
- name: DOCLING_SERVE_ARTIFACTS_PATH
value: '/modelcache'
volumeMounts:
- name: docling-model-cache
mountPath: /modelcache
...
volumes:
- name: docling-model-cache
persistentVolumeClaim:
claimName: docling-model-cache-pvc
```
Make sure that value of `DOCLING_SERVE_ARTIFACTS_PATH` is the same as where models were downloaded and where volume is mounted.
Now when docling-serve is executing tasks, the underlying docling installation will load model weights from mouted volume.
Manifest example: [docling-model-cache-deployment.yaml](./deploy-examples/docling-model-cache-deployment.yaml)

View File

@@ -9,6 +9,7 @@ On top of the source of file (see below), both endpoints support the same parame
- `from_formats` (List[str]): Input format(s) to convert from. Allowed values: `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md`. Defaults to all formats.
- `to_formats` (List[str]): Output format(s) to convert to. Allowed values: `md`, `json`, `html`, `text`, `doctags`. Defaults to `md`.
- `pipeline` (str). The choice of which pipeline to use. Allowed values are `standard` and `vlm`. Defaults to `standard`.
- `page_range` (tuple). If speficied, only convert a range of pages. The page number starts at 1.
- `do_ocr` (bool): If enabled, the bitmap content will be processed using OCR. Defaults to `True`.
- `image_export_mode`: Image export mode for the document (only in case of JSON, Markdown or HTML). Allowed values: embedded, placeholder, referenced. Optional, defaults to `embedded`.
- `force_ocr` (bool): If enabled, replace any existing text with OCR-generated text over the full content. Defaults to `False`.
@@ -18,11 +19,13 @@ On top of the source of file (see below), both endpoints support the same parame
- `table_mode` (str): Table mode to use. Allowed values: `fast`, `accurate`. Defaults to `fast`.
- `abort_on_error` (bool): If enabled, abort on error. Defaults to false.
- `return_as_file` (boo): If enabled, return the output as a file. Defaults to false.
- `md_page_break_placeholder` (str): Add this placeholder betweek pages in the markdown output.
- `do_table_structure` (bool): If enabled, the table structure will be extracted. Defaults to true.
- `do_code_enrichment` (bool): If enabled, perform OCR code enrichment. Defaults to false.
- `do_formula_enrichment` (bool): If enabled, perform formula OCR, return LaTeX code. Defaults to false.
- `do_picture_classification` (bool): If enabled, classify pictures in documents. Defaults to false.
- `do_picture_description` (bool): If enabled, describe pictures in documents. Defaults to false.
- `picture_description_area_threshold` (float): Minimum percentage of the area for a picture to be processed with the models. Defaults to 0.05.
- `picture_description_local` (dict): Options for running a local vision-language model in the picture description. The parameters refer to a model hosted on Hugging Face. This parameter is mutually exclusive with picture_description_api.
- `picture_description_api` (dict): API details for using a vision-language model in the picture description. This parameter is mutually exclusive with picture_description_local.
- `include_images` (bool): If enabled, images will be extracted from the document. Defaults to false.
@@ -243,7 +246,7 @@ files = {
'files': ('2206.01062v1.pdf', open(file_path, 'rb'), 'application/pdf'),
}
response = await async_client.post(url, files=files, data={"parameters": json.dumps(parameters)})
response = await async_client.post(url, files=files, data=parameters)
assert response.status_code == 200, "Response should be 200 OK"
data = response.json()
@@ -347,4 +350,92 @@ The response can be a JSON Document or a File.
## Asynchronous API
TBA
Both `/v1alpha/convert/source` and `/v1alpha/convert/file` endpoints are available as asynchronous variants.
The advantage of the asynchronous endpoints is the possible to interrupt the connection, check for the progress update and fetch the result.
This approach is more resilient against network stabilities and allows the client application logic to easily interleave conversion with other tasks.
Launch an asynchronous conversion with:
- `POST /v1alpha/convert/source/async` when providing the input as sources.
- `POST /v1alpha/convert/file/async` when providing the input as multipart-form files.
The response format is a task detail:
```jsonc
{
"task_id": "<task_id>", // the task_id which can be used for the next operations
"task_status": "pending|started|success|failure", // the task status
"task_position": 1, // the position in the queue
"task_meta": null, // metadata e.g. how many documents are in the total job and how many have been converted
}
```
### Polling status
For checking the progress of the conversion task and wait for its completion, use the endpoint:
- `GET /v1alpha/status/poll/{task_id}`
<details>
<summary>Example waiting loop:</summary>
```python
import time
import httpx
# ...
# response from the async task submission
task = response.json()
while task["task_status"] not in ("success", "failure"):
response = httpx.get(f"{base_url}/status/poll/{task['task_id']}")
task = response.json()
time.sleep(5)
```
<details>
### Subscribe with websockets
Using websocket you can get the client application being notified about updates of the conversion task.
To start the websocker connection, use the endpoint:
- `/v1alpha/status/ws/{task_id}`
Websocket messages are JSON object with the following structure:
```jsonc
{
"message": "connection|update|error", // type of message being sent
"task": {}, // the same content of the task description
"error": "", // description of the error
}
```
<details>
<summary>Example websocker usage:</summary>
```python
from websockets.sync.client import connect
uri = f"ws://{base_url}/v1alpha/status/ws/{task['task_id']}"
with connect(uri) as websocket:
for message in websocket:
try:
payload = json.loads(message)
if payload["message"] == "error":
break
if payload["message"] == "error" and payload["task"]["task_status"] in ("success", "failure"):
break
except:
break
```
</details>
### Fetch results
When the task is completed, the result can be fetched with the endpoint:
- `GET /v1alpha/result/{task_id}`

View File

@@ -1,6 +1,6 @@
[project]
name = "docling-serve"
version = "0.10.0" # DO NOT EDIT, updated automatically
version = "0.11.0" # DO NOT EDIT, updated automatically
description = "Running Docling as a service"
license = {text = "MIT"}
authors = [
@@ -63,6 +63,9 @@ cu124 = [
"torch>=2.6.0",
"torchvision>=0.21.0",
]
flash-attn = [
"flash-attn~=2.7.0; sys_platform == 'linux' and platform_machine == 'x86_64'"
]
[dependency-groups]
dev = [
@@ -83,7 +86,10 @@ conflicts = [
{ extra = "cpu" },
{ extra = "cu124" },
],
]
[
{ extra = "cpu" },
{ extra = "flash-attn" },
],]
environments = ["sys_platform != 'darwin' or platform_machine != 'x86_64'"]
override-dependencies = [
"urllib3~=2.0"

View File

@@ -51,10 +51,12 @@ async def test_convert_url(async_client):
time.sleep(2)
assert task["task_status"] == "success"
print(f"Task completed with status {task['task_status']=}")
result_resp = await async_client.get(f"{base_url}/result/{task['task_id']}")
assert result_resp.status_code == 200, "Response should be 200 OK"
result = result_resp.json()
print("Got result.")
assert "md_content" in result["document"]
assert result["document"]["md_content"] is not None

View File

@@ -0,0 +1,54 @@
from docling_serve.datamodel.convert import (
ConvertDocumentsOptions,
PictureDescriptionApi,
)
from docling_serve.docling_conversion import (
_hash_pdf_format_option,
get_pdf_pipeline_opts,
)
def test_options_cache_key():
hashes = set()
opts = ConvertDocumentsOptions()
pipeline_opts = get_pdf_pipeline_opts(opts)
hash = _hash_pdf_format_option(pipeline_opts)
assert hash not in hashes
hashes.add(hash)
opts.do_picture_description = True
pipeline_opts = get_pdf_pipeline_opts(opts)
hash = _hash_pdf_format_option(pipeline_opts)
# pprint(pipeline_opts.pipeline_options.model_dump(serialize_as_any=True))
assert hash not in hashes
hashes.add(hash)
opts.picture_description_api = PictureDescriptionApi(
url="http://localhost",
params={"model": "mymodel"},
prompt="Hello 1",
)
pipeline_opts = get_pdf_pipeline_opts(opts)
hash = _hash_pdf_format_option(pipeline_opts)
# pprint(pipeline_opts.pipeline_options.model_dump(serialize_as_any=True))
assert hash not in hashes
hashes.add(hash)
opts.picture_description_api = PictureDescriptionApi(
url="http://localhost",
params={"model": "your-model"},
prompt="Hello 1",
)
pipeline_opts = get_pdf_pipeline_opts(opts)
hash = _hash_pdf_format_option(pipeline_opts)
# pprint(pipeline_opts.pipeline_options.model_dump(serialize_as_any=True))
assert hash not in hashes
hashes.add(hash)
opts.picture_description_api.prompt = "World"
pipeline_opts = get_pdf_pipeline_opts(opts)
hash = _hash_pdf_format_option(pipeline_opts)
# pprint(pipeline_opts.pipeline_options.model_dump(serialize_as_any=True))
assert hash not in hashes
hashes.add(hash)

127
tests/test_results_clear.py Normal file
View File

@@ -0,0 +1,127 @@
import asyncio
import base64
import json
from pathlib import Path
import pytest
import pytest_asyncio
from asgi_lifespan import LifespanManager
from httpx import ASGITransport, AsyncClient
from docling_serve.app import create_app
from docling_serve.settings import docling_serve_settings
@pytest.fixture(scope="session")
def event_loop():
return asyncio.get_event_loop()
@pytest_asyncio.fixture(scope="session")
async def app():
app = create_app()
async with LifespanManager(app) as manager:
print("Launching lifespan of app.")
yield manager.app
@pytest_asyncio.fixture(scope="session")
async def client(app):
async with AsyncClient(
transport=ASGITransport(app=app), base_url="http://app.io"
) as client:
print("Client is ready")
yield client
async def convert_file(client: AsyncClient):
doc_filename = Path("tests/2408.09869v5.pdf")
encoded_doc = base64.b64encode(doc_filename.read_bytes()).decode()
payload = {
"options": {
"to_formats": ["json"],
},
"file_sources": [{"base64_string": encoded_doc, "filename": doc_filename.name}],
}
response = await client.post("/v1alpha/convert/source/async", json=payload)
assert response.status_code == 200, "Response should be 200 OK"
task = response.json()
print(json.dumps(task, indent=2))
while task["task_status"] not in ("success", "failure"):
response = await client.get(f"/v1alpha/status/poll/{task['task_id']}")
assert response.status_code == 200, "Response should be 200 OK"
task = response.json()
print(f"{task['task_status']=}")
print(f"{task['task_position']=}")
await asyncio.sleep(2)
assert task["task_status"] == "success"
return task
@pytest.mark.asyncio
async def test_clear_results(client: AsyncClient):
"""Test removal of task."""
# Set long delay deletion
docling_serve_settings.result_removal_delay = 100
# Convert and wait for completion
task = await convert_file(client)
# Get result once
result_response = await client.get(f"/v1alpha/result/{task['task_id']}")
assert result_response.status_code == 200, "Response should be 200 OK"
print("Result 1 ok.")
result = result_response.json()
assert result["document"]["json_content"]["schema_name"] == "DoclingDocument"
# Get result twice
result_response = await client.get(f"/v1alpha/result/{task['task_id']}")
assert result_response.status_code == 200, "Response should be 200 OK"
print("Result 2 ok.")
result = result_response.json()
assert result["document"]["json_content"]["schema_name"] == "DoclingDocument"
# Clear
clear_response = await client.get("/v1alpha/clear/results?older_then=0")
assert clear_response.status_code == 200, "Response should be 200 OK"
print("Clear ok.")
# Get deleted result
result_response = await client.get(f"/v1alpha/result/{task['task_id']}")
assert result_response.status_code == 404, "Response should be removed"
print("Result was no longer found.")
@pytest.mark.asyncio
async def test_delay_remove(client: AsyncClient):
"""Test automatic removal of task with delay."""
# Set short delay deletion
docling_serve_settings.result_removal_delay = 5
# Convert and wait for completion
task = await convert_file(client)
# Get result once
result_response = await client.get(f"/v1alpha/result/{task['task_id']}")
assert result_response.status_code == 200, "Response should be 200 OK"
print("Result ok.")
result = result_response.json()
assert result["document"]["json_content"]["schema_name"] == "DoclingDocument"
print("Sleeping to wait the automatic task deletion.")
await asyncio.sleep(10)
# Get deleted result
result_response = await client.get(f"/v1alpha/result/{task['task_id']}")
assert result_response.status_code == 404, "Response should be removed"

6621
uv.lock generated

File diff suppressed because it is too large Load Diff