chore: bump version to 1.9.0 [skip ci]

chore: avoid installing dependencies multiple times (#429)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-11-29 08:33:50 +00:00 · 2025-11-24 08:43:53 +00:00 · 2025-11-21 15:42:42 +01:00 · 2025-11-21 10:31:56 +01:00 · 2025-11-20 17:57:10 +01:00 · 2025-11-17 08:31:44 +01:00
20 changed files with 2830 additions and 1694 deletions
--- a/.github/styles/config/vocabularies/Docling/accept.txt
+++ b/.github/styles/config/vocabularies/Docling/accept.txt
@@ -4,6 +4,7 @@ asgi
 async
 (?i)urls
 uvicorn
+Config
 [Ww]ebserver
 RQ
 (?i)url
--- a/.github/workflows/discord-release.yml
+++ b/.github/workflows/discord-release.yml
@@ -0,0 +1,42 @@
+# .github/workflows/discord-release.yml
+name: Notify Discord on Release
+
+on:
+  release:
+    types: [published]
+
+jobs:
+  discord:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Send release info to Discord
+        env:
+          DISCORD_WEBHOOK: ${{ secrets.RELEASES_DISCORD_WEBHOOK }}
+        run: |
+          REPO_NAME=${{ github.repository }}
+          RELEASE_TAG=${{ github.event.release.tag_name }}
+          RELEASE_NAME="${{ github.event.release.name }}"
+          RELEASE_URL=${{ github.event.release.html_url }}
+
+          # Capture the body safely (handles backticks, $, ", etc.)
+          RELEASE_BODY=$(cat <<'EOF'
+            ${{ github.event.release.body }}
+          EOF
+          )
+
+          # Fallback if release name is empty
+          if [ -z "$RELEASE_NAME" ]; then
+            RELEASE_NAME=$RELEASE_TAG
+          fi
+
+          PAYLOAD=$(jq -n \
+          --arg title "🚀 New Release: $RELEASE_NAME" \
+          --arg url "$RELEASE_URL" \
+          --arg desc "$RELEASE_BODY" \
+          --arg author_name "$REPO_NAME" \
+          --arg author_icon "https://github.com/docling-project.png" \
+          '{embeds: [{title: $title, url: $url, description: $desc, color: 5814783, author: {name: $author_name, icon_url: $author_icon}}]}')
+
+          curl -H "Content-Type: application/json" \
+               -d "$PAYLOAD" \
+               "$DISCORD_WEBHOOK"
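The workflow above assembles the Discord embed with `jq` and posts it with `curl`. As an illustration only, the same payload shape can be sketched in Python. The function name `build_release_payload` and the sample release values are hypothetical and are not part of the workflow; only the embed fields, the color value, and the icon URL are taken from the `jq` expression.

```python
import json


def build_release_payload(repo_name, tag, name, url, body):
    """Build a Discord webhook payload shaped like the jq expression in the workflow."""
    # Fall back to the tag when the release has no name, as the shell script does.
    title_name = name or tag
    return {
        "embeds": [
            {
                "title": f"🚀 New Release: {title_name}",
                "url": url,
                "description": body,
                "color": 5814783,
                "author": {
                    "name": repo_name,
                    "icon_url": "https://github.com/docling-project.png",
                },
            }
        ]
    }


# Sample values chosen for illustration; the body contains characters that
# would need careful quoting in shell but are handled by the JSON encoder.
payload = build_release_payload(
    repo_name="docling-project/docling-serve",
    tag="v1.9.0",
    name="",
    url="https://github.com/docling-project/docling-serve/releases/tag/v1.9.0",
    body='Notes with `backticks`, $vars and "quotes"',
)
encoded = json.dumps(payload)
print(encoded)
```

Letting a JSON encoder build the request body, as `jq --arg` does in the workflow, avoids hand-escaping the release notes.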
--- a/.github/workflows/job-image.yml
+++ b/.github/workflows/job-image.yml
@@ -160,13 +160,10 @@ jobs:
              pip install uv
              uv venv --allow-existing
              source .venv/bin/activate
-              uv sync --all-extras --no-extra flash-attn
+              uv sync --only-dev

              # Run pytest tests
              echo "Running tests..."
-              # Test import
-              python -c 'from docling_serve.app import create_app; create_app()'
-
              # Run pytest and check result directly
              if ! pytest -sv -k "test_convert_url" tests/test_1-url-async.py \
                --disable-warnings; then
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -7,12 +7,12 @@ repos:
      - id: ruff-format
        name: "Ruff formatter"
        args: [--config=pyproject.toml]
-        files: '^(docling_serve|tests|examples).*\.(py|ipynb)$'
+        files: '^(docling_serve|tests|examples|scripts).*\.(py|ipynb)$'
      # Run the Ruff linter.
      - id: ruff
        name: "Ruff linter"
        args: [--exit-non-zero-on-fix, --fix, --config=pyproject.toml]
-        files: '^(docling_serve|tests|examples).*\.(py|ipynb)$'
+        files: '^(docling_serve|tests|examples|scripts).*\.(py|ipynb)$'
  - repo: local
    hooks:
      - id: system
@@ -21,6 +21,15 @@ repos:
        pass_filenames: false
        language: system
        files: '\.py$'
+  - repo: local
+    hooks:
+      - id: update-docs-common-parameters
+        name: Update Documentation File
+        entry: uv run scripts/update_doc_usage.py
+        language: python
+        pass_filenames: false
+        # Run the hook in a single process instead of in parallel
+        require_serial: true
  - repo: https://github.com/errata-ai/vale
    rev: v3.12.0  # Use latest stable version
    hooks:
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,79 @@
+## [v1.9.0](https://github.com/docling-project/docling-serve/releases/tag/v1.9.0) - 2025-11-24
+
+### Feature
+
+* Version endpoint ([#442](https://github.com/docling-project/docling-serve/issues/442)) ([`2c23f65`](https://github.com/docling-project/docling-serve/commit/2c23f65507d7699694debd7faa0de840ef2d2cb7))
+
+### Fix
+
+* Dependencies updates – Docling 2.63.0 ([#443](https://github.com/docling-project/docling-serve/issues/443)) ([`e437e83`](https://github.com/docling-project/docling-serve/commit/e437e830c956f9a76cd0c62faf9add0231992548))
+
+### Docling libraries included in this release:
+- docling 2.63.0
+- docling-core 2.52.0
+- docling-ibm-models 3.10.2
+- docling-jobkit 1.8.0
+- docling-mcp 1.3.3
+- docling-parse 4.7.1
+- docling-serve 1.9.0
+
+## [v1.8.0](https://github.com/docling-project/docling-serve/releases/tag/v1.8.0) - 2025-10-31
+
+### Feature
+
+* Docling with new standard pipeline with threading ([#428](https://github.com/docling-project/docling-serve/issues/428)) ([`bf132a3`](https://github.com/docling-project/docling-serve/commit/bf132a3c3e615ddbe624841ea5b3a98593c00654))
+
+### Documentation
+
+* Expand automatic docs to nested objects. More complete usage docs. ([#426](https://github.com/docling-project/docling-serve/issues/426)) ([`35319b0`](https://github.com/docling-project/docling-serve/commit/35319b0da793a2a1a434fd2b60b7632e10ecced3))
+* Add docs for docling parameters like performance and debug ([#424](https://github.com/docling-project/docling-serve/issues/424)) ([`f3957ae`](https://github.com/docling-project/docling-serve/commit/f3957aeb577097121fe9d0d21f75a50643f03369))
+
+### Docling libraries included in this release:
+- docling 2.60.0
+- docling-core 2.50.0
+- docling-ibm-models 3.10.2
+- docling-jobkit 1.8.0
+- docling-mcp 1.3.2
+- docling-parse 4.7.0
+- docling-serve 1.8.0
+
+## [v1.7.2](https://github.com/docling-project/docling-serve/releases/tag/v1.7.2) - 2025-10-30
+
+### Fix
+
+* Update locked dependencies. Docling fixes, Expose temperature parameter for vlm models ([#423](https://github.com/docling-project/docling-serve/issues/423)) ([`e9b4140`](https://github.com/docling-project/docling-serve/commit/e9b41406c4116ff79a212877ff6484a1151e144d))
+* Temporary constrain fastapi version ([#418](https://github.com/docling-project/docling-serve/issues/418)) ([`7bf2e7b`](https://github.com/docling-project/docling-serve/commit/7bf2e7b366470e0cf1c4900df7c84becd6a96991))
+
+### Docling libraries included in this release:
+- docling 2.59.0
+- docling-core 2.50.0
+- docling-ibm-models 3.10.2
+- docling-jobkit 1.7.1
+- docling-mcp 1.3.2
+- docling-parse 4.7.0
+- docling-serve 1.7.2
+
+## [v1.7.1](https://github.com/docling-project/docling-serve/releases/tag/v1.7.1) - 2025-10-22
+
+### Fix
+
+* Upgrade dependencies ([#417](https://github.com/docling-project/docling-serve/issues/417)) ([`97613a1`](https://github.com/docling-project/docling-serve/commit/97613a19748e8c152db4a0f62b5a57fca807a33a))
+* Makes task status shared across multiple instances in RQ mode, resolves #378 ([#415](https://github.com/docling-project/docling-serve/issues/415)) ([`0961f2c`](https://github.com/docling-project/docling-serve/commit/0961f2c57425859c76130da3ea8a871d65df4b26))
+* `DOCLING_SERVE_SYNC_POLL_INTERVAL` controls the synchronous polling time ([#413](https://github.com/docling-project/docling-serve/issues/413)) ([`0f274ab`](https://github.com/docling-project/docling-serve/commit/0f274ab135a9bb41accd05db3c12a9dcce220ad9))
+
+### Documentation
+
+* Generate usage.md automatically ([#340](https://github.com/docling-project/docling-serve/issues/340)) ([`9672f31`](https://github.com/docling-project/docling-serve/commit/9672f310b1bb7030af8a276f14691e46f7da0e9e))
+
+### Docling libraries included in this release:
+- docling 2.58.0
+- docling-core 2.49.0
+- docling-ibm-models 3.10.1
+- docling-jobkit 1.7.0
+- docling-mcp 1.3.2
+- docling-parse 4.7.0
+- docling-serve 1.7.1
+
 ## [v1.7.0](https://github.com/docling-project/docling-serve/releases/tag/v1.7.0) - 2025-10-17

 ### Feature
--- a/Makefile
+++ b/Makefile
@@ -16,6 +16,9 @@ else
    PIPE_DEV_NULL=
 endif

+# Container runtime - can be overridden: make CONTAINER_RUNTIME=podman cmd
+CONTAINER_RUNTIME ?= docker
+
 TAG=$(shell git rev-parse HEAD)
 BRANCH_TAG=$(shell git rev-parse --abbrev-ref HEAD)

@@ -28,44 +31,44 @@ md-lint-file:
 .PHONY: docling-serve-image
 docling-serve-image: Containerfile ## Build docling-serve container image
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve]"
-	$(CMD_PREFIX) docker build --load -f Containerfile -t ghcr.io/docling-project/docling-serve:$(TAG) .
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve:$(TAG) ghcr.io/docling-project/docling-serve:$(BRANCH_TAG)
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve:$(TAG) quay.io/docling-project/docling-serve:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) build --load -f Containerfile -t ghcr.io/docling-project/docling-serve:$(TAG) .
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve:$(TAG) ghcr.io/docling-project/docling-serve:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve:$(TAG) quay.io/docling-project/docling-serve:$(BRANCH_TAG)

 .PHONY: docling-serve-cpu-image
 docling-serve-cpu-image: Containerfile ## Build docling-serve "cpu only" container image
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve CPU]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cpu --no-extra flash-attn" -f Containerfile -t ghcr.io/docling-project/docling-serve-cpu:$(TAG) .
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cpu:$(TAG) ghcr.io/docling-project/docling-serve-cpu:$(BRANCH_TAG)
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cpu:$(TAG) quay.io/docling-project/docling-serve-cpu:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cpu --no-extra flash-attn" -f Containerfile -t ghcr.io/docling-project/docling-serve-cpu:$(TAG) .
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve-cpu:$(TAG) ghcr.io/docling-project/docling-serve-cpu:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve-cpu:$(TAG) quay.io/docling-project/docling-serve-cpu:$(BRANCH_TAG)

 .PHONY: docling-serve-cu124-image
 docling-serve-cu124-image: Containerfile ## Build docling-serve container image with CUDA 12.4 support
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with Cuda 12.4]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu124" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu124:$(TAG) .
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu124:$(TAG) ghcr.io/docling-project/docling-serve-cu124:$(BRANCH_TAG)
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu124:$(TAG) quay.io/docling-project/docling-serve-cu124:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu124" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu124:$(TAG) .
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve-cu124:$(TAG) ghcr.io/docling-project/docling-serve-cu124:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve-cu124:$(TAG) quay.io/docling-project/docling-serve-cu124:$(BRANCH_TAG)

 .PHONY: docling-serve-cu126-image
 docling-serve-cu126-image: Containerfile ## Build docling-serve container image with CUDA 12.6 support
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with Cuda 12.6]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu126" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu126:$(TAG) .
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu126:$(TAG) ghcr.io/docling-project/docling-serve-cu126:$(BRANCH_TAG)
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu126:$(TAG) quay.io/docling-project/docling-serve-cu126:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu126" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu126:$(TAG) .
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve-cu126:$(TAG) ghcr.io/docling-project/docling-serve-cu126:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve-cu126:$(TAG) quay.io/docling-project/docling-serve-cu126:$(BRANCH_TAG)

 .PHONY: docling-serve-cu128-image
 docling-serve-cu128-image: Containerfile ## Build docling-serve container image with CUDA 12.8 support
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with Cuda 12.8]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu128" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu128:$(TAG) .
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu128:$(TAG) ghcr.io/docling-project/docling-serve-cu128:$(BRANCH_TAG)
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu128:$(TAG) quay.io/docling-project/docling-serve-cu128:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu128" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu128:$(TAG) .
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve-cu128:$(TAG) ghcr.io/docling-project/docling-serve-cu128:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve-cu128:$(TAG) quay.io/docling-project/docling-serve-cu128:$(BRANCH_TAG)

 .PHONY: docling-serve-rocm-image
 docling-serve-rocm-image: Containerfile ## Build docling-serve container image with ROCm support
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with ROCm 6.3]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group rocm --no-extra flash-attn" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-rocm:$(TAG) .
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-rocm:$(TAG) ghcr.io/docling-project/docling-serve-rocm:$(BRANCH_TAG)
-	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-rocm:$(TAG) quay.io/docling-project/docling-serve-rocm:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group rocm --no-extra flash-attn" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-rocm:$(TAG) .
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve-rocm:$(TAG) ghcr.io/docling-project/docling-serve-rocm:$(BRANCH_TAG)
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) tag ghcr.io/docling-project/docling-serve-rocm:$(TAG) quay.io/docling-project/docling-serve-rocm:$(BRANCH_TAG)

 .PHONY: action-lint
 action-lint: .action-lint ##      Lint GitHub Action workflows
@@ -88,7 +91,7 @@ action-lint: .action-lint ##      Lint GitHub Action workflows
 md-lint: .md-lint ##      Lint markdown files
 .md-lint: $(wildcard */**/*.md) | md-lint-file
 	$(ECHO_PREFIX) printf "  %-12s ./...\n" "[MD LINT]"
-	$(CMD_PREFIX) docker run --rm -v $$(pwd):/workdir davidanson/markdownlint-cli2:v0.16.0 "**/*.md" "#.venv"
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) run --rm -v $$(pwd):/workdir davidanson/markdownlint-cli2:v0.16.0 "**/*.md" "#.venv"
 	$(CMD_PREFIX) touch $@

 .PHONY: py-Lint
@@ -104,34 +107,34 @@ py-lint: ##      Lint Python files
 .PHONY: run-docling-cpu
 run-docling-cpu: ## Run the docling-serve container with CPU support and assign a container name
 	$(ECHO_PREFIX) printf "  %-12s Removing existing container if it exists...\n" "[CLEANUP]"
-	$(CMD_PREFIX) docker rm -f docling-serve-cpu 2>/dev/null || true
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) rm -f docling-serve-cpu 2>/dev/null || true
 	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with CPU support on port 5001...\n" "[RUN CPU]"
-	$(CMD_PREFIX) docker run -it --name docling-serve-cpu -p 5001:5001 ghcr.io/docling-project/docling-serve-cpu:main
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) run -it --name docling-serve-cpu -p 5001:5001 ghcr.io/docling-project/docling-serve-cpu:main

 .PHONY: run-docling-cu124
 run-docling-cu124: ## Run the docling-serve container with GPU support and assign a container name
 	$(ECHO_PREFIX) printf "  %-12s Removing existing container if it exists...\n" "[CLEANUP]"
-	$(CMD_PREFIX) docker rm -f docling-serve-cu124 2>/dev/null || true
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) rm -f docling-serve-cu124 2>/dev/null || true
 	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with GPU support on port 5001...\n" "[RUN CUDA 12.4]"
-	$(CMD_PREFIX) docker run -it --name docling-serve-cu124 -p 5001:5001 ghcr.io/docling-project/docling-serve-cu124:main
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) run -it --name docling-serve-cu124 -p 5001:5001 ghcr.io/docling-project/docling-serve-cu124:main

 .PHONY: run-docling-cu126
 run-docling-cu126: ## Run the docling-serve container with GPU support and assign a container name
 	$(ECHO_PREFIX) printf "  %-12s Removing existing container if it exists...\n" "[CLEANUP]"
-	$(CMD_PREFIX) docker rm -f docling-serve-cu126 2>/dev/null || true
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) rm -f docling-serve-cu126 2>/dev/null || true
 	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with GPU support on port 5001...\n" "[RUN CUDA 12.6]"
-	$(CMD_PREFIX) docker run -it --name docling-serve-cu126 -p 5001:5001 ghcr.io/docling-project/docling-serve-cu126:main
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) run -it --name docling-serve-cu126 -p 5001:5001 ghcr.io/docling-project/docling-serve-cu126:main

 .PHONY: run-docling-cu128
 run-docling-cu128: ## Run the docling-serve container with GPU support and assign a container name
 	$(ECHO_PREFIX) printf "  %-12s Removing existing container if it exists...\n" "[CLEANUP]"
-	$(CMD_PREFIX) docker rm -f docling-serve-cu128 2>/dev/null || true
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) rm -f docling-serve-cu128 2>/dev/null || true
 	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with GPU support on port 5001...\n" "[RUN CUDA 12.8]"
-	$(CMD_PREFIX) docker run -it --name docling-serve-cu128 -p 5001:5001 ghcr.io/docling-project/docling-serve-cu128:main
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) run -it --name docling-serve-cu128 -p 5001:5001 ghcr.io/docling-project/docling-serve-cu128:main

 .PHONY: run-docling-rocm
 run-docling-rocm: ## Run the docling-serve container with GPU support and assign a container name
 	$(ECHO_PREFIX) printf "  %-12s Removing existing container if it exists...\n" "[CLEANUP]"
-	$(CMD_PREFIX) docker rm -f docling-serve-rocm 2>/dev/null || true
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) rm -f docling-serve-rocm 2>/dev/null || true
 	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with GPU support on port 5001...\n" "[RUN ROCm 6.3]"
-	$(CMD_PREFIX) docker run -it --name docling-serve-rocm -p 5001:5001 ghcr.io/docling-project/docling-serve-rocm:main
+	$(CMD_PREFIX) $(CONTAINER_RUNTIME) run -it --name docling-serve-rocm -p 5001:5001 ghcr.io/docling-project/docling-serve-rocm:main
--- a/docling_serve/main.py
+++ b/docling_serve/main.py
@@ -30,7 +30,7 @@ logger = logging.getLogger(__name__)

 def version_callback(value: bool) -> None:
    if value:
-        docling_serve_version = importlib.metadata.version("docling_serve")
+        docling_serve_version = importlib.metadata.version("docling-serve")
        docling_jobkit_version = importlib.metadata.version("docling-jobkit")
        docling_version = importlib.metadata.version("docling")
        docling_core_version = importlib.metadata.version("docling-core")
@@ -385,6 +385,11 @@ def rq_worker() -> Any:
        allow_external_plugins=docling_serve_settings.allow_external_plugins,
        max_num_pages=docling_serve_settings.max_num_pages,
        max_file_size=docling_serve_settings.max_file_size,
+        queue_max_size=docling_serve_settings.queue_max_size,
+        ocr_batch_size=docling_serve_settings.ocr_batch_size,
+        layout_batch_size=docling_serve_settings.layout_batch_size,
+        table_batch_size=docling_serve_settings.table_batch_size,
+        batch_polling_interval_seconds=docling_serve_settings.batch_polling_interval_seconds,
    )

    run_worker(
--- a/docling_serve/app.py
+++ b/docling_serve/app.py
@@ -76,7 +76,7 @@ from docling_serve.datamodel.responses import (
    TaskStatusResponse,
    WebsocketMessage,
 )
-from docling_serve.helper_functions import FormDepends
+from docling_serve.helper_functions import DOCLING_VERSIONS, FormDepends
 from docling_serve.orchestrator_factory import get_async_orchestrator
 from docling_serve.response_preparation import prepare_response
 from docling_serve.settings import docling_serve_settings
@@ -341,7 +341,7 @@ def create_app():  # noqa: C901
            task = await orchestrator.task_status(task_id=task_id)
            if task.is_completed():
                return True
-            await asyncio.sleep(5)
+            await asyncio.sleep(docling_serve_settings.sync_poll_interval)
            elapsed_time = time.monotonic() - start_time
            if elapsed_time > docling_serve_settings.max_sync_wait:
                return False
@@ -437,6 +437,16 @@ def create_app():  # noqa: C901
    def api_check() -> HealthCheckResponse:
        return HealthCheckResponse()

+    # Docling versions
+    @app.get("/version", tags=["health"])
+    def version_info() -> dict:
+        if not docling_serve_settings.show_version_info:
+            raise HTTPException(
+                status_code=status.HTTP_403_FORBIDDEN,
+                detail="Forbidden. The server is configured not to show version details.",
+            )
+        return DOCLING_VERSIONS
+
    # Convert a document from URL(s)
    @app.post(
        "/v1/convert/source",
@@ -869,7 +879,10 @@ def create_app():  # noqa: C901
        assert isinstance(orchestrator.notifier, WebsocketNotifier)
        await websocket.accept()

-        if task_id not in orchestrator.tasks:
+        try:
+            # Get task status from Redis or RQ directly instead of checking in-memory registry
+            task = await orchestrator.task_status(task_id=task_id)
+        except TaskNotFoundError:
            await websocket.send_text(
                WebsocketMessage(
                    message=MessageKind.ERROR, error="Task not found."
@@ -878,8 +891,6 @@ def create_app():  # noqa: C901
            await websocket.close()
            return

-        task = orchestrator.tasks[task_id]
-
        # Track active WebSocket connections for this job
        orchestrator.notifier.task_subscribers[task_id].add(websocket)

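The synchronous wait loop in `app.py` now sleeps for a configurable interval instead of a fixed five seconds. A minimal sketch of that pattern follows, using only the standard library. `wait_until_completed` and `make_fake_status` are hypothetical names: the first stands in for the loop in the diff, the second for `orchestrator.task_status(...)` followed by `is_completed()`. The `poll_interval` and `max_wait` parameters play the roles of `sync_poll_interval` and `max_sync_wait`.

```python
import asyncio
import time


async def wait_until_completed(is_completed, poll_interval, max_wait):
    """Poll `is_completed` until it returns True or `max_wait` seconds elapse."""
    start_time = time.monotonic()
    while True:
        if await is_completed():
            return True
        # The sleep duration is a parameter rather than a hard-coded constant.
        await asyncio.sleep(poll_interval)
        if time.monotonic() - start_time > max_wait:
            return False


def make_fake_status(ready_after_calls):
    """Hypothetical status check that reports completion from the Nth call on."""
    calls = {"count": 0}

    async def is_completed():
        calls["count"] += 1
        return calls["count"] >= ready_after_calls

    return is_completed


# Completes on the third poll, well inside the one-second budget.
finished = asyncio.run(
    wait_until_completed(make_fake_status(3), poll_interval=0.01, max_wait=1.0)
)
# Never completes, so the loop gives up once the budget is exceeded.
timed_out = asyncio.run(
    wait_until_completed(make_fake_status(10**6), poll_interval=0.01, max_wait=0.05)
)
print(finished, timed_out)
```

A shorter interval returns results sooner for quick conversions, at the cost of more status lookups per request.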
--- a/docling_serve/helper_functions.py
+++ b/docling_serve/helper_functions.py
@@ -1,11 +1,25 @@
+import importlib.metadata
 import inspect
 import json
+import platform
 import re
+import sys
 from typing import Union, get_args, get_origin

 from fastapi import Depends, Form
 from pydantic import BaseModel, TypeAdapter

+DOCLING_VERSIONS = {
+    "docling-serve": importlib.metadata.version("docling-serve"),
+    "docling-jobkit": importlib.metadata.version("docling-jobkit"),
+    "docling": importlib.metadata.version("docling"),
+    "docling-core": importlib.metadata.version("docling-core"),
+    "docling-ibm-models": importlib.metadata.version("docling-ibm-models"),
+    "docling-parse": importlib.metadata.version("docling-parse"),
+    "python": f"{sys.implementation.cache_tag} ({platform.python_version()})",
+    "platform": platform.platform(),
+}
+

 def is_pydantic_model(type_):
    try:
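`DOCLING_VERSIONS` above is built at import time with direct `importlib.metadata.version(...)` calls, which raise `PackageNotFoundError` if a distribution is missing. The sketch below shows one possible variant that tolerates missing packages. The helper `collect_versions`, the `"not installed"` marker, and the made-up package name are assumptions for illustration and do not appear in the diff.

```python
import importlib.metadata
import platform
import sys


def collect_versions(package_names):
    """Return a name -> version mapping, marking packages that are not installed."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Assumption: report a marker instead of failing at import time.
            versions[name] = "not installed"
    # Same interpreter description format as in the diff above.
    versions["python"] = f"{sys.implementation.cache_tag} ({platform.python_version()})"
    return versions


info = collect_versions(["docling-serve", "surely-not-a-real-package-xyz"])
print(info)
```

Whether a missing package should fail fast or be reported as absent is a design choice. The module in the diff fails fast.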
--- a/docling_serve/orchestrator_factory.py
+++ b/docling_serve/orchestrator_factory.py
@@ -1,10 +1,267 @@
+import json
+import logging
 from functools import lru_cache
+from typing import Any, Optional

-from docling_jobkit.orchestrators.base_orchestrator import BaseOrchestrator
+import redis.asyncio as redis
+
+from docling_jobkit.datamodel.task import Task
+from docling_jobkit.datamodel.task_meta import TaskStatus
+from docling_jobkit.orchestrators.base_orchestrator import (
+    BaseOrchestrator,
+    TaskNotFoundError,
+)

 from docling_serve.settings import AsyncEngine, docling_serve_settings
 from docling_serve.storage import get_scratch

+_log = logging.getLogger(__name__)
+
+
+class RedisTaskStatusMixin:
+    tasks: dict[str, Task]
+    _task_result_keys: dict[str, str]
+    config: Any
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.redis_prefix = "docling:tasks:"
+        self._redis_pool = redis.ConnectionPool.from_url(
+            self.config.redis_url,
+            max_connections=10,
+            socket_timeout=2.0,
+        )
+
+    async def task_status(self, task_id: str, wait: float = 0.0) -> Task:
+        """
+        Get task status from RQ first, falling back to Redis and then the parent class.
+
+        When Redis shows 'pending' but RQ shows 'success', we update Redis
+        and return the RQ status for cross-instance consistency.
+        """
+        _log.info(f"Task {task_id} status check")
+
+        # Always check RQ directly first - this is the most reliable source
+        rq_task = await self._get_task_from_rq_direct(task_id)
+        if rq_task:
+            _log.info(f"Task {task_id} in RQ: {rq_task.task_status}")
+
+            # Update memory registry
+            self.tasks[task_id] = rq_task
+
+            # Store/update in Redis for other instances
+            await self._store_task_in_redis(rq_task)
+            return rq_task
+
+        # If not in RQ, check Redis (maybe it's cached from another instance)
+        task = await self._get_task_from_redis(task_id)
+        if task:
+            _log.info(f"Task {task_id} in Redis: {task.task_status}")
+
+            # CRITICAL FIX: Check if Redis status might be stale
+            # STARTED tasks might have completed since they were cached
+            if task.task_status in [TaskStatus.PENDING, TaskStatus.STARTED]:
+                _log.debug(f"Task {task_id} verifying stale status")
+
+                # Try to get fresh status from RQ
+                fresh_rq_task = await self._get_task_from_rq_direct(task_id)
+                if fresh_rq_task and fresh_rq_task.task_status != task.task_status:
+                    _log.info(
+                        f"Task {task_id} status updated: {fresh_rq_task.task_status}"
+                    )
+
+                    # Update memory and Redis with fresh status
+                    self.tasks[task_id] = fresh_rq_task
+                    await self._store_task_in_redis(fresh_rq_task)
+                    return fresh_rq_task
+                else:
+                    _log.debug(f"Task {task_id} status consistent")
+
+            return task
+
+        # Fall back to parent implementation
+        try:
+            parent_task = await super().task_status(task_id, wait)  # type: ignore[misc]
+            _log.debug(f"Task {task_id} from parent: {parent_task.task_status}")
+
+            # Store in Redis for other instances to find
+            await self._store_task_in_redis(parent_task)
+            return parent_task
+        except TaskNotFoundError:
+            _log.warning(f"Task {task_id} not found")
+            raise
+
+    async def _get_task_from_redis(self, task_id: str) -> Optional[Task]:
+        try:
+            async with redis.Redis(connection_pool=self._redis_pool) as r:
+                task_data = await r.get(f"{self.redis_prefix}{task_id}:metadata")
+                if not task_data:
+                    return None
+
+                data: dict[str, Any] = json.loads(task_data)
+                meta = data.get("processing_meta") or {}
+                meta.setdefault("num_docs", 0)
+                meta.setdefault("num_processed", 0)
+                meta.setdefault("num_succeeded", 0)
+                meta.setdefault("num_failed", 0)
+
+                return Task(
+                    task_id=data["task_id"],
+                    task_type=data["task_type"],
+                    task_status=TaskStatus(data["task_status"]),
+                    processing_meta=meta,
+                )
+        except Exception as e:
+            _log.error(f"Redis get task {task_id}: {e}")
+            return None
+
+    async def _get_task_from_rq_direct(self, task_id: str) -> Optional[Task]:
+        try:
+            _log.debug(f"Checking RQ for task {task_id}")
+
+            temp_task = Task(
+                task_id=task_id,
+                task_type="convert",
+                task_status=TaskStatus.PENDING,
+                processing_meta={
+                    "num_docs": 0,
+                    "num_processed": 0,
+                    "num_succeeded": 0,
+                    "num_failed": 0,
+                },
+            )
+
+            original_task = self.tasks.get(task_id)
+            self.tasks[task_id] = temp_task
+
+            try:
+                await super()._update_task_from_rq(task_id)  # type: ignore[misc]
+
+                updated_task = self.tasks.get(task_id)
+                if updated_task and updated_task.task_status != TaskStatus.PENDING:
+                    _log.debug(f"RQ task {task_id}: {updated_task.task_status}")
+
+                    # Store result key if available
+                    if task_id in self._task_result_keys:
+                        try:
+                            async with redis.Redis(
+                                connection_pool=self._redis_pool
+                            ) as r:
+                                await r.set(
+                                    f"{self.redis_prefix}{task_id}:result_key",
+                                    self._task_result_keys[task_id],
+                                    ex=86400,
+                                )
+                                _log.debug(f"Stored result key for {task_id}")
+                        except Exception as e:
+                            _log.error(f"Store result key {task_id}: {e}")
+
+                    return updated_task
+                return None
+
+            finally:
+                # Restore original task state
+                if original_task:
+                    self.tasks[task_id] = original_task
+                elif task_id in self.tasks and self.tasks[task_id] == temp_task:
+                    # Only remove if it's still our temp task
+                    del self.tasks[task_id]
+
+        except Exception as e:
+            _log.error(f"RQ check {task_id}: {e}")
+            return None
+
+    async def get_raw_task(self, task_id: str) -> Task:
+        if task_id in self.tasks:
+            return self.tasks[task_id]
+
+        task = await self._get_task_from_redis(task_id)
+        if task:
+            self.tasks[task_id] = task
+            return task
+
+        try:
+            parent_task = await super().get_raw_task(task_id)  # type: ignore[misc]
+            await self._store_task_in_redis(parent_task)
+            return parent_task
+        except TaskNotFoundError:
+            raise
+
+    async def _store_task_in_redis(self, task: Task) -> None:
+        try:
+            meta: Any = task.processing_meta
+            if hasattr(meta, "model_dump"):
+                meta = meta.model_dump()
+            elif not isinstance(meta, dict):
+                meta = {
+                    "num_docs": 0,
+                    "num_processed": 0,
+                    "num_succeeded": 0,
+                    "num_failed": 0,
+                }
+
+            data: dict[str, Any] = {
+                "task_id": task.task_id,
+                "task_type": task.task_type.value
+                if hasattr(task.task_type, "value")
+                else str(task.task_type),
+                "task_status": task.task_status.value,
+                "processing_meta": meta,
+            }
+            async with redis.Redis(connection_pool=self._redis_pool) as r:
+                await r.set(
+                    f"{self.redis_prefix}{task.task_id}:metadata",
+                    json.dumps(data),
+                    ex=86400,
+                )
+        except Exception as e:
+            _log.error(f"Store task {task.task_id}: {e}")
+
+    async def enqueue(self, **kwargs):  # type: ignore[override]
+        task = await super().enqueue(**kwargs)  # type: ignore[misc]
+        await self._store_task_in_redis(task)
+        return task
+
+    async def task_result(self, task_id: str):  # type: ignore[override]
+        result = await super().task_result(task_id)  # type: ignore[misc]
+        if result is not None:
+            return result
+
+        try:
+            async with redis.Redis(connection_pool=self._redis_pool) as r:
+                result_key = await r.get(f"{self.redis_prefix}{task_id}:result_key")
+                if result_key:
+                    self._task_result_keys[task_id] = result_key.decode("utf-8")
+                    return await super().task_result(task_id)  # type: ignore[misc]
+        except Exception as e:
+            _log.error(f"Redis result key {task_id}: {e}")
+
+        return None
+
+    async def _update_task_from_rq(self, task_id: str) -> None:
+        original_status = (
+            self.tasks[task_id].task_status if task_id in self.tasks else None
+        )
+
+        await super()._update_task_from_rq(task_id)  # type: ignore[misc]
+
+        if task_id in self.tasks:
+            new_status = self.tasks[task_id].task_status
+            if original_status != new_status:
+                _log.debug(f"Task {task_id} status: {original_status} -> {new_status}")
+                await self._store_task_in_redis(self.tasks[task_id])
+
+        if task_id in self._task_result_keys:
+            try:
+                async with redis.Redis(connection_pool=self._redis_pool) as r:
+                    await r.set(
+                        f"{self.redis_prefix}{task_id}:result_key",
+                        self._task_result_keys[task_id],
+                        ex=86400,
+                    )
+            except Exception as e:
+                _log.error(f"Store result key {task_id}: {e}")
+

 @lru_cache
 def get_async_orchestrator() -> BaseOrchestrator:
@@ -31,16 +288,25 @@ def get_async_orchestrator() -> BaseOrchestrator:
            allow_external_plugins=docling_serve_settings.allow_external_plugins,
            max_num_pages=docling_serve_settings.max_num_pages,
            max_file_size=docling_serve_settings.max_file_size,
+            queue_max_size=docling_serve_settings.queue_max_size,
+            ocr_batch_size=docling_serve_settings.ocr_batch_size,
+            layout_batch_size=docling_serve_settings.layout_batch_size,
+            table_batch_size=docling_serve_settings.table_batch_size,
+            batch_polling_interval_seconds=docling_serve_settings.batch_polling_interval_seconds,
        )
        cm = DoclingConverterManager(config=cm_config)

        return LocalOrchestrator(config=local_config, converter_manager=cm)
+
    elif docling_serve_settings.eng_kind == AsyncEngine.RQ:
        from docling_jobkit.orchestrators.rq.orchestrator import (
            RQOrchestrator,
            RQOrchestratorConfig,
        )

+        class RedisAwareRQOrchestrator(RedisTaskStatusMixin, RQOrchestrator):  # type: ignore[misc]
+            pass
+
        rq_config = RQOrchestratorConfig(
            redis_url=docling_serve_settings.eng_rq_redis_url,
            results_prefix=docling_serve_settings.eng_rq_results_prefix,
@@ -48,7 +314,8 @@ def get_async_orchestrator() -> BaseOrchestrator:
            scratch_dir=get_scratch(),
        )

-        return RQOrchestrator(config=rq_config)
+        return RedisAwareRQOrchestrator(config=rq_config)
+
    elif docling_serve_settings.eng_kind == AsyncEngine.KFP:
        from docling_jobkit.orchestrators.kfp.orchestrator import (
            KfpOrchestrator,
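Review note: `RedisAwareRQOrchestrator` depends on Python's method resolution order, so the mixin's `get_raw_task` runs first and only then defers to `super()`. The following is a minimal, self-contained sketch of that read-through pattern. `FakeStore`, `BaseRegistry`, `SharedStoreMixin` and `SharedRegistry` are hypothetical stand-ins (a plain dict replaces Redis) and are not the classes introduced in this change.

```python
import asyncio


class FakeStore:
    """Hypothetical stand-in for Redis: a dict shared by all instances."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    async def get(self, key: str):
        return self.data.get(key)

    async def set(self, key: str, value: str) -> None:
        self.data[key] = value


class BaseRegistry:
    """Hypothetical base class that only knows its own in-memory tasks."""

    def __init__(self) -> None:
        self.tasks: dict[str, str] = {}

    async def get_raw_task(self, task_id: str) -> str:
        if task_id not in self.tasks:
            raise KeyError(task_id)
        return self.tasks[task_id]


class SharedStoreMixin:
    """Check memory, then the shared store, then defer to the base class."""

    async def get_raw_task(self, task_id: str) -> str:
        if task_id in self.tasks:
            return self.tasks[task_id]
        cached = await self.store.get(task_id)
        if cached is not None:
            self.tasks[task_id] = cached  # warm the local cache
            return cached
        task = await super().get_raw_task(task_id)
        await self.store.set(task_id, task)
        return task


class SharedRegistry(SharedStoreMixin, BaseRegistry):
    def __init__(self, store: FakeStore) -> None:
        super().__init__()
        self.store = store


async def demo() -> str:
    store = FakeStore()
    writer, reader = SharedRegistry(store), SharedRegistry(store)
    writer.tasks["t1"] = "pending"
    await store.set("t1", "pending")  # what an enqueue step would mirror
    return await reader.get_raw_task("t1")  # served from the shared store


result = asyncio.run(demo())
print(result)
```

The mixin has to come first in the base class list; with the order reversed, the base `get_raw_task` would win and the shared store would never be consulted.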
--- a/docling_serve/settings.py
+++ b/docling_serve/settings.py
@@ -50,6 +50,7 @@ class DoclingServeSettings(BaseSettings):
    options_cache_size: int = 2
    enable_remote_services: bool = False
    allow_external_plugins: bool = False
+    show_version_info: bool = True

    api_key: str = ""

@@ -57,6 +58,14 @@ class DoclingServeSettings(BaseSettings):
    max_num_pages: int = sys.maxsize
    max_file_size: int = sys.maxsize

+    # Threading pipeline
+    queue_max_size: Optional[int] = None
+    ocr_batch_size: Optional[int] = None
+    layout_batch_size: Optional[int] = None
+    table_batch_size: Optional[int] = None
+    batch_polling_interval_seconds: Optional[float] = None
+
+    sync_poll_interval: int = 2  # seconds
    max_sync_wait: int = 120  # 2 minutes

    cors_origins: list[str] = ["*"]
--- a/docling_serve/websocket_notifier.py
+++ b/docling_serve/websocket_notifier.py
@@ -30,7 +30,9 @@ class WebsocketNotifier(BaseNotifier):
        if task_id not in self.task_subscribers:
            raise RuntimeError(f"Task {task_id} does not have a subscribers list.")

-        task = await self.orchestrator.get_raw_task(task_id=task_id)
+        try:
+            # Get task status from Redis or RQ directly instead of in-memory registry
+            task = await self.orchestrator.task_status(task_id=task_id)
            task_queue_position = await self.orchestrator.get_queue_position(task_id)
            msg = TaskStatusResponse(
                task_id=task.task_id,
@@ -41,15 +43,34 @@ class WebsocketNotifier(BaseNotifier):
            )
            for websocket in self.task_subscribers[task_id]:
                await websocket.send_text(
-                WebsocketMessage(message=MessageKind.UPDATE, task=msg).model_dump_json()
+                    WebsocketMessage(
+                        message=MessageKind.UPDATE, task=msg
+                    ).model_dump_json()
                )
                if task.is_completed():
                    await websocket.close()
+        except Exception as e:
+            # Log the error but don't crash the notifier
+            import logging
+
+            _log = logging.getLogger(__name__)
+            _log.error(f"Error notifying subscribers for task {task_id}: {e}")

    async def notify_queue_positions(self):
+        """Notify all subscribers of pending tasks about queue position updates."""
        for task_id in self.task_subscribers.keys():
-            # notify only pending tasks
-            if self.orchestrator.tasks[task_id].task_status != TaskStatus.PENDING:
-                continue
+            try:
+                # Check task status directly from Redis or RQ
+                task = await self.orchestrator.task_status(task_id)

+                # Notify only pending tasks
+                if task.task_status == TaskStatus.PENDING:
                    await self.notify_task_subscribers(task_id)
+            except Exception as e:
+                # Log the error but don't crash the notifier
+                import logging
+
+                _log = logging.getLogger(__name__)
+                _log.error(
+                    f"Error checking task {task_id} status for queue position notification: {e}"
+                )
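Review note: both notifier methods now wrap each task in its own `try`/`except`, so one failing status lookup does not stop the remaining notifications. A minimal sketch of that isolation pattern is shown below, assuming a module-level logger rather than an import inside the handler. `notify_all` and `flaky_notify` are hypothetical names used only for illustration.

```python
import asyncio
import logging

_log = logging.getLogger(__name__)


async def notify_all(task_ids, notify_one):
    """Call notify_one for each task id; log failures and keep going."""
    notified = []
    # Iterate over a snapshot so subscribers can change during the awaits.
    for task_id in list(task_ids):
        try:
            await notify_one(task_id)
            notified.append(task_id)
        except Exception as exc:
            _log.error("Error notifying subscribers for task %s: %s", task_id, exc)
    return notified


async def flaky_notify(task_id):
    # Hypothetical callback: one task id always fails.
    if task_id == "bad":
        raise RuntimeError("status lookup failed")


notified = asyncio.run(notify_all(["a", "bad", "b"], flaky_notify))
```

Here `notified` contains only the task ids whose callback completed, and the failure for `"bad"` is logged instead of propagating.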
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -39,23 +39,42 @@ THe following table describes the options to configure the Docling Serve app.
 |  | `DOCLING_SERVE_STATIC_PATH` | unset | If set to a valid directory, the static assets for the docs and UI will be loaded from this path |
 |  | `DOCLING_SERVE_SCRATCH_PATH` |  | If set, this directory will be used as scratch workspace, e.g. storing the results before they get requested. If unset, a temporary created is created for this purpose. |
 | `--enable-ui` | `DOCLING_SERVE_ENABLE_UI` | `false` | Enable the demonstrator UI. |
+|  | `DOCLING_SERVE_SHOW_VERSION_INFO` | `true` | If enabled, the `/version` endpoint will provide the Docling package versions, otherwise it will return a forbidden 403 error. |
 |  | `DOCLING_SERVE_ENABLE_REMOTE_SERVICES` | `false` | Allow pipeline components making remote connections. For example, this is needed when using a vision-language model via APIs. |
 |  | `DOCLING_SERVE_ALLOW_EXTERNAL_PLUGINS` | `false` | Allow the selection of third-party plugins. |
 |  | `DOCLING_SERVE_SINGLE_USE_RESULTS` | `true` | If true, results can be accessed only once. If false, the results accumulate in the scratch directory. |
 |  | `DOCLING_SERVE_RESULT_REMOVAL_DELAY` | `300` | When `DOCLING_SERVE_SINGLE_USE_RESULTS` is active, this is the delay before results are removed from the task registry. |
 |  | `DOCLING_SERVE_MAX_DOCUMENT_TIMEOUT` | `604800` (7 days) | The maximum time for processing a document. |
-|  | `DOCLING_NUM_THREADS` | `4` | Number of concurrent threads for processing a document. |
 |  | `DOCLING_SERVE_MAX_NUM_PAGES` |  | The maximum number of pages for a document to be processed. |
 |  | `DOCLING_SERVE_MAX_FILE_SIZE` |  | The maximum file size for a document to be processed. |
+|  | `DOCLING_SERVE_SYNC_POLL_INTERVAL` | `2` | Number of seconds to sleep between polling the task status in the sync endpoints. |
 |  | `DOCLING_SERVE_MAX_SYNC_WAIT` | `120` | Max number of seconds a synchronous endpoint is waiting for the task completion. |
 |  | `DOCLING_SERVE_LOAD_MODELS_AT_BOOT` | `True` | If enabled, the models for the default options will be loaded at boot. |
 |  | `DOCLING_SERVE_OPTIONS_CACHE_SIZE` | `2` | How many DocumentConveter objects (including their loaded models) to keep in the cache. |
+|  | `DOCLING_SERVE_QUEUE_MAX_SIZE` | | Maximum size of the page queue. Up to this many pages may be open at the same time. |
+|  | `DOCLING_SERVE_OCR_BATCH_SIZE` | | Batch size for the OCR stage. |
+|  | `DOCLING_SERVE_LAYOUT_BATCH_SIZE` | | Batch size for the layout detection stage. |
+|  | `DOCLING_SERVE_TABLE_BATCH_SIZE` | | Batch size for the table structure stage. |
+|  | `DOCLING_SERVE_BATCH_POLLING_INTERVAL_SECONDS` | | Wait time, in seconds, for gathering pages before a stage starts processing. |
 |  | `DOCLING_SERVE_CORS_ORIGINS` | `["*"]` | A list of origins that should be permitted to make cross-origin requests. |
 |  | `DOCLING_SERVE_CORS_METHODS` | `["*"]` | A list of HTTP methods that should be allowed for cross-origin requests. |
 |  | `DOCLING_SERVE_CORS_HEADERS` | `["*"]` | A list of HTTP request headers that should be supported for cross-origin requests. |
 |  | `DOCLING_SERVE_API_KEY` | | If specified, all the API requests must contain the header `X-Api-Key` with this value. |
 |  | `DOCLING_SERVE_ENG_KIND` | `local` | The compute engine to use for the async tasks. Possible values are `local`, `rq` and `kfp`. See below for more configurations of the engines. |

+### Docling configuration
+
+Some Docling settings, mostly related to performance, are exposed as environment variables, which can also be used when running Docling Serve.
+
+| ENV | Default | Description |
+| ----|---------|-------------|
+| `DOCLING_NUM_THREADS` | `4` | Number of concurrent threads used for the `torch` CPU execution. |
+| `DOCLING_DEVICE` | | Device used for the model execution. Valid values are `cpu`, `cuda`, `mps`. When unset, the best device is chosen. For CUDA-enabled environments, you can choose which GPU using the syntax `cuda:0`, `cuda:1`, ... |
+| `DOCLING_PERF_PAGE_BATCH_SIZE` | `4` | Number of pages processed in the same batch. |
+| `DOCLING_PERF_ELEMENTS_BATCH_SIZE` | `8` | Number of document items/elements processed in the same batch during enrichment. |
+| `DOCLING_DEBUG_PROFILE_PIPELINE_TIMINGS` | `false` | When enabled, Docling will provide detailed timings information. |
+
+
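Review note: the threading options documented above have no default, which matches the `Optional[...] = None` fields added in `settings.py`. As a rough sketch of that unset-means-`None` behaviour using only the standard library, assuming the `DOCLING_SERVE_` prefix maps directly onto the field names. `read_optional` is a hypothetical helper; the application itself reads these values through its settings class.

```python
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
PREFIX = "DOCLING_SERVE_"


def read_optional(name: str, cast: Callable[[str], T], env=None) -> Optional[T]:
    """Return cast(value) for PREFIX + NAME, or None when unset or empty."""
    env = os.environ if env is None else env
    raw = env.get(PREFIX + name.upper(), "")
    return cast(raw) if raw.strip() else None


# Example environment passed explicitly so the sketch has no side effects.
example_env = {
    "DOCLING_SERVE_OCR_BATCH_SIZE": "8",
    "DOCLING_SERVE_BATCH_POLLING_INTERVAL_SECONDS": "0.5",
}
ocr_batch_size = read_optional("ocr_batch_size", int, example_env)
polling = read_optional("batch_polling_interval_seconds", float, example_env)
queue_max_size = read_optional("queue_max_size", int, example_env)
```

With this example environment, `queue_max_size` stays `None` because the variable is not set.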
 ### Compute engine

 Docling Serve can be deployed with several possible of compute engine.
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -4,31 +4,89 @@ The API provides two endpoints: one for urls, one for files. This is necessary t

 ## Common parameters

-On top of the source of file (see below), both endpoints support the same parameters, which are almost the same as the Docling CLI.
+On top of the source of file (see below), both endpoints support the same parameters.

- `from_formats` (List[str]): Input format(s) to convert from. Allowed values: `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md`. Defaults to all formats.
- `to_formats` (List[str]): Output format(s) to convert to. Allowed values: `md`, `json`, `html`, `text`, `doctags`. Defaults to `md`.
- `pipeline` (str). The choice of which pipeline to use. Allowed values are `standard` and `vlm`. Defaults to `standard`.
- `page_range` (tuple). If specified, only convert a range of pages. The page number starts at 1.
- `do_ocr` (bool): If enabled, the bitmap content will be processed using OCR. Defaults to `True`.
- `image_export_mode`: Image export mode for the document (only in case of JSON, Markdown or HTML). Allowed values: embedded, placeholder, referenced. Optional, defaults to `embedded`.
- `force_ocr` (bool): If enabled, replace any existing text with OCR-generated text over the full content. Defaults to `False`.
- `ocr_engine` (str): OCR engine to use. Allowed values: `easyocr`, `tesserocr`, `tesseract`, `rapidocr`, `ocrmac`. Defaults to `easyocr`. To use the `tesserocr` engine, `tesserocr` must be installed where docling-serve is running: `pip install tesserocr`
- `ocr_lang` (List[str]): List of languages used by the OCR engine. Note that each OCR engine has different values for the language names. Defaults to empty.
- `pdf_backend` (str): PDF backend to use. Allowed values: `pypdfium2`, `dlparse_v1`, `dlparse_v2`, `dlparse_v4`. Defaults to `dlparse_v4`.
- `table_mode` (str): Table mode to use. Allowed values: `fast`, `accurate`. Defaults to `fast`.
- `abort_on_error` (bool): If enabled, abort on error. Defaults to false.
- `md_page_break_placeholder` (str): Add this placeholder between pages in the markdown output.
- `do_table_structure` (bool): If enabled, the table structure will be extracted. Defaults to true.
- `do_code_enrichment` (bool): If enabled, perform OCR code enrichment. Defaults to false.
- `do_formula_enrichment` (bool): If enabled, perform formula OCR, return LaTeX code. Defaults to false.
- `do_picture_classification` (bool): If enabled, classify pictures in documents. Defaults to false.
- `do_picture_description` (bool): If enabled, describe pictures in documents. Defaults to false.
- `picture_description_area_threshold` (float): Minimum percentage of the area for a picture to be processed with the models. Defaults to 0.05.
- `picture_description_local` (dict): Options for running a local vision-language model in the picture description. The parameters refer to a model hosted on Hugging Face. This parameter is mutually exclusive with `picture_description_api`.
- `picture_description_api` (dict): API details for using a vision-language model in the picture description. This parameter is mutually exclusive with `picture_description_local`.
- `include_images` (bool): If enabled, images will be extracted from the document. Defaults to false.
- `images_scale` (float): Scale factor for images. Defaults to 2.0.
+<!-- begin: parameters-docs -->
+<h4>ConvertDocumentsRequestOptions</h4>
+
+| Field Name | Type | Description |
+|------------|------|-------------|
+| `from_formats` | List[InputFormat] | Input format(s) to convert from. String or list of strings. Allowed values: `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md`, `csv`, `xlsx`, `xml_uspto`, `xml_jats`, `mets_gbs`, `json_docling`, `audio`, `vtt`. Optional, defaults to all formats. |
+| `to_formats` | List[OutputFormat] | Output format(s) to convert to. String or list of strings. Allowed values: `md`, `json`, `html`, `html_split_page`, `text`, `doctags`. Optional, defaults to Markdown. |
+| `image_export_mode` | ImageRefMode | Image export mode for the document (in case of JSON, Markdown or HTML). Allowed values: `placeholder`, `embedded`, `referenced`. Optional, defaults to Embedded. |
+| `do_ocr` | bool | If enabled, the bitmap content will be processed using OCR. Boolean. Optional, defaults to true. |
+| `force_ocr` | bool | If enabled, replace existing text with OCR-generated text over content. Boolean. Optional, defaults to false. |
+| `ocr_engine` | `ocr_engines_enum` | The OCR engine to use. String. Allowed values: `auto`, `easyocr`, `ocrmac`, `rapidocr`, `tesserocr`, `tesseract`. Optional, defaults to `easyocr`. |
+| `ocr_lang` | List[str] or NoneType | List of languages used by the OCR engine. Note that each OCR engine has different values for the language names. String or list of strings. Optional, defaults to empty. |
+| `pdf_backend` | PdfBackend | The PDF backend to use. String. Allowed values: `pypdfium2`, `dlparse_v1`, `dlparse_v2`, `dlparse_v4`. Optional, defaults to `dlparse_v4`. |
+| `table_mode` | TableFormerMode | Mode to use for table structure. String. Allowed values: `fast`, `accurate`. Optional, defaults to accurate. |
+| `table_cell_matching` | bool | If true, matches table cells predictions back to PDF cells. Can break table output if PDF cells are merged across table columns. If false, let table structure model define the text cells, ignore PDF cells. |
+| `pipeline` | ProcessingPipeline | Choose the pipeline to process PDF or image files. |
+| `page_range` | Tuple | Only convert a range of pages. The page number starts at 1. |
+| `document_timeout` | float | The timeout for processing each document, in seconds. |
+| `abort_on_error` | bool | Abort on error if enabled. Boolean. Optional, defaults to false. |
+| `do_table_structure` | bool | If enabled, the table structure will be extracted. Boolean. Optional, defaults to true. |
+| `include_images` | bool | If enabled, images will be extracted from the document. Boolean. Optional, defaults to true. |
+| `images_scale` | float | Scale factor for images. Float. Optional, defaults to 2.0. |
+| `md_page_break_placeholder` | str | Add this placeholder between pages in the markdown output. |
+| `do_code_enrichment` | bool | If enabled, perform OCR code enrichment. Boolean. Optional, defaults to false. |
+| `do_formula_enrichment` | bool | If enabled, perform formula OCR, return LaTeX code. Boolean. Optional, defaults to false. |
+| `do_picture_classification` | bool | If enabled, classify pictures in documents. Boolean. Optional, defaults to false. |
+| `do_picture_description` | bool | If enabled, describe pictures in documents. Boolean. Optional, defaults to false. |
+| `picture_description_area_threshold` | float | Minimum percentage of the area for a picture to be processed with the models. |
+| `picture_description_local` | PictureDescriptionLocal or NoneType | Options for running a local vision-language model in the picture description. The parameters refer to a model hosted on Hugging Face. This parameter is mutually exclusive with `picture_description_api`. |
+| `picture_description_api` | PictureDescriptionApi or NoneType | API details for using a vision-language model in the picture description. This parameter is mutually exclusive with `picture_description_local`. |
+| `vlm_pipeline_model` | VlmModelType or NoneType | Preset of local and API models for the `vlm` pipeline. This parameter is mutually exclusive with `vlm_pipeline_model_local` and `vlm_pipeline_model_api`. Use the other options for more parameters. |
+| `vlm_pipeline_model_local` | VlmModelLocal or NoneType | Options for running a local vision-language model for the `vlm` pipeline. The parameters refer to a model hosted on Hugging Face. This parameter is mutually exclusive with `vlm_pipeline_model_api` and `vlm_pipeline_model`. |
+| `vlm_pipeline_model_api` | VlmModelApi or NoneType | API details for using a vision-language model for the `vlm` pipeline. This parameter is mutually exclusive with `vlm_pipeline_model_local` and `vlm_pipeline_model`. |
+
+<h4>VlmModelApi</h4>
+
+| Field Name | Type | Description |
+|------------|------|-------------|
+| `url` | AnyUrl | Endpoint which accepts openai-api compatible requests. |
+| `headers` | Dict[str, str] | Headers used for calling the API endpoint. For example, it could include authentication headers. |
+| `params` | Dict[str, Any] | Model parameters. |
+| `timeout` | float | Timeout for the API request. |
+| `concurrency` | int | Maximum number of concurrent requests to the API. |
+| `prompt` | str | Prompt used when calling the vision-language model. |
+| `scale` | float | Scale factor of the images used. |
+| `response_format` | ResponseFormat | Type of response generated by the model. |
+| `temperature` | float | Temperature parameter controlling the reproducibility of the result. |
+
+<h4>VlmModelLocal</h4>
+
+| Field Name | Type | Description |
+|------------|------|-------------|
+| `repo_id` | str | Repository id from the Hugging Face Hub. |
+| `prompt` | str | Prompt used when calling the vision-language model. |
+| `scale` | float | Scale factor of the images used. |
+| `response_format` | ResponseFormat | Type of response generated by the model. |
+| `inference_framework` | InferenceFramework | Inference framework to use. |
+| `transformers_model_type` | TransformersModelType | Type of transformers auto-model to use. |
+| `extra_generation_config` | Dict[str, Any] | Config from https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationConfig |
+| `temperature` | float | Temperature parameter controlling the reproducibility of the result. |
+
+<h4>PictureDescriptionApi</h4>
+
+| Field Name | Type | Description |
+|------------|------|-------------|
+| `url` | AnyUrl | Endpoint which accepts openai-api compatible requests. |
+| `headers` | Dict[str, str] | Headers used for calling the API endpoint. For example, it could include authentication headers. |
+| `params` | Dict[str, Any] | Model parameters. |
+| `timeout` | float | Timeout for the API request. |
+| `concurrency` | int | Maximum number of concurrent requests to the API. |
+| `prompt` | str | Prompt used when calling the vision-language model. |
+
+<h4>PictureDescriptionLocal</h4>
+
+| Field Name | Type | Description |
+|------------|------|-------------|
+| `repo_id` | str | Repository id from the Hugging Face Hub. |
+| `prompt` | str | Prompt used when calling the vision-language model. |
+| `generation_config` | Dict[str, Any] | Config from https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationConfig |
+
+<!-- end: parameters-docs -->
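
Review note: the generated tables describe several fields as mutually exclusive (`picture_description_local` with `picture_description_api`, and the three `vlm_pipeline_model*` fields). A client could check an options dict before sending it, as in the sketch below. `check_exclusive` and `EXCLUSIVE_GROUPS` are hypothetical helpers, the `repo_id` and `url` values are placeholders, and whatever validation the server performs is not shown here.

```python
def check_exclusive(options: dict, groups: list[tuple[str, ...]]) -> list[str]:
    """Return one message per group in which more than one field is set."""
    problems = []
    for group in groups:
        present = [name for name in group if options.get(name) is not None]
        if len(present) > 1:
            problems.append("mutually exclusive: " + ", ".join(present))
    return problems


# Groups taken from the field descriptions in the tables above.
EXCLUSIVE_GROUPS = [
    ("picture_description_local", "picture_description_api"),
    ("vlm_pipeline_model", "vlm_pipeline_model_local", "vlm_pipeline_model_api"),
]

options = {
    "to_formats": ["md", "json"],
    "do_ocr": True,
    "picture_description_local": {"repo_id": "example/model"},  # placeholder id
    "picture_description_api": {"url": "http://localhost:8000/v1"},  # placeholder URL
}
problems = check_exclusive(options, EXCLUSIVE_GROUPS)
```

In this example `problems` holds a single message for the picture description pair; the VLM group passes because none of its fields is set.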

 ### Authentication

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "docling-serve"
-version = "1.7.0"  # DO NOT EDIT, updated automatically
+version = "1.9.0"  # DO NOT EDIT, updated automatically
 description = "Running Docling as a service"
 license = {text = "MIT"}
 authors = [
@@ -35,8 +35,8 @@ requires-python = ">=3.10"
 dependencies = [
    "docling~=2.38",
    "docling-core>=2.45.0",
-    "docling-jobkit[kfp,rq,vlm]>=1.6.0,<2.0.0",
-    "fastapi[standard]~=0.115",
+    "docling-jobkit[kfp,rq,vlm]>=1.8.0,<2.0.0",
+    "fastapi[standard]<0.119.0",  # ~=0.115
    "httpx~=0.28",
    "pydantic~=2.10",
    "pydantic-settings~=2.4",
@@ -69,6 +69,9 @@ flash-attn = [
 [dependency-groups]
 dev = [
    "asgi-lifespan~=2.0",
+    "httpx",
+    "pydantic",
+    "pydantic-settings",
    "mypy~=1.11",
    "pre-commit-uv~=4.1",
    "pypdf>=6.0.0",
--- a/scripts/init.py
+++ b/scripts/init.py
--- a/scripts/update_doc_usage.py
+++ b/scripts/update_doc_usage.py
@@ -0,0 +1,199 @@
+import re
+from typing import Annotated, Any, Union, get_args, get_origin
+
+from pydantic import BaseModel
+
+from docling_serve.datamodel.convert import ConvertDocumentsRequestOptions
+
+DOCS_FILE = "docs/usage.md"
+
+VARIABLE_WORDS: list[str] = [
+    "picture_description_local",
+    "vlm_pipeline_model",
+    "vlm",
+    "vlm_pipeline_model_api",
+    "ocr_engines_enum",
+    "easyocr",
+    "dlparse_v4",
+    "fast",
+    "picture_description_api",
+    "vlm_pipeline_model_local",
+]
+
+
+def format_variable_names(text: str) -> str:
+    """Format specific words in description to be code-formatted."""
+    sorted_words = sorted(VARIABLE_WORDS, key=len, reverse=True)
+
+    for word in sorted_words:
+        # Escape the word for the pattern only; the replacement must stay
+        # literal, otherwise escaped characters would leak into the output.
+        pattern = rf"(?<!`)\b{re.escape(word)}\b(?!`)"
+        text = re.sub(pattern, f"`{word}`", text)
+
+    return text
+
+
+def format_allowed_values_description(description: str) -> str:
+    """Format description to code-format allowed values."""
+    # Regex pattern to find text after "Allowed values:"
+    match = re.search(r"Allowed values:(.+?)(?:\.|$)", description, re.DOTALL)
+
+    if match:
+        # Extract the allowed values
+        values_str = match.group(1).strip()
+
+        # Split values, handling both comma and 'and' separators
+        values = re.split(r"\s*(?:,\s*|\s+and\s+)", values_str)
+
+        # Remove any remaining punctuation and whitespace
+        values = [value.strip("., ") for value in values]
+
+        # Create code-formatted values
+        formatted_values = ", ".join(f"`{value}`" for value in values)
+
+        # Replace the original allowed values with formatted version
+        formatted_description = re.sub(
+            r"(Allowed values:)(.+?)(?:\.|$)",
+            f"\\1 {formatted_values}.",
+            description,
+            flags=re.DOTALL,
+        )
+
+        return formatted_description
+
+    return description
+
+
+def _format_type(type_hint: Any) -> str:
+    """Format a type hint correctly, unwrapping Annotated and Union."""
+    if get_origin(type_hint) is Annotated:
+        base_type = get_args(type_hint)[0]
+        return _format_type(base_type)
+
+    if hasattr(type_hint, "__origin__"):
+        origin = type_hint.__origin__
+        args = get_args(type_hint)
+
+        if origin is list:
+            return f"List[{_format_type(args[0])}]"
+        elif origin is dict:
+            return f"Dict[{_format_type(args[0])}, {_format_type(args[1])}]"
+        elif "Union" in str(origin) or "Optional" in str(origin):
+            return " or ".join(_format_type(arg) for arg in args)
+        elif origin is None:
+            return "null"
+
+    if hasattr(type_hint, "__name__"):
+        return type_hint.__name__
+
+    return str(type_hint)
+
+
+def _unroll_types(tp) -> list[type]:
+    """
+    Unrolls typing.Union and typing.Optional types into a flat list of types.
+    """
+    origin = get_origin(tp)
+    if origin is Union:
+        # Recursively unroll each type inside the Union
+        types = []
+        for arg in get_args(tp):
+            types.extend(_unroll_types(arg))
+        # Remove duplicates while preserving order
+        return list(dict.fromkeys(types))
+    else:
+        # If it's not a Union, just return it as a single-element list
+        return [tp]
+
+
+def generate_model_doc(model: type[BaseModel]) -> str:
+    """Generate documentation for a Pydantic model."""
+
+    models_stack = [model]
+
+    doc = ""
+    while models_stack:
+        current_model = models_stack.pop()
+
+        doc += f"<h4>{current_model.__name__}</h4>\n"
+
+        doc += "\n| Field Name | Type | Description |\n"
+        doc += "|------------|------|-------------|\n"
+
+        base_models = []
+        if hasattr(current_model, "__mro__"):
+            base_models = current_model.__mro__
+        else:
+            base_models = [current_model]
+
+        for base_model in base_models:
+            # Check if this is a Pydantic model
+            if hasattr(base_model, "model_fields"):
+                # Iterate through fields of this model
+                for field_name, field in base_model.model_fields.items():
+                    # Extract description from Annotated field if possible
+                    description = field.description or "No description provided."
+                    description = format_allowed_values_description(description)
+                    description = format_variable_names(description)
+
+                    # Handle Annotated types
+                    original_type = field.annotation
+                    if get_origin(original_type) is Annotated:
+                        # Extract base type and additional metadata
+                        type_args = get_args(original_type)
+                        base_type = type_args[0]
+                    else:
+                        base_type = original_type
+
+                    field_type = _format_type(base_type)
+                    field_type = format_variable_names(field_type)
+
+                    doc += f"| `{field_name}` | {field_type} | {description} |\n"
+
+                    for field_type in _unroll_types(base_type):
+                        if issubclass(field_type, BaseModel):
+                            models_stack.append(field_type)
+
+                # stop iterating the base classes
+                break
+
+        doc += "\n"
+    return doc
+
+
+def update_documentation():
+    """Update the documentation file with model information."""
+    doc_request = generate_model_doc(ConvertDocumentsRequestOptions)
+
+    with open(DOCS_FILE) as f:
+        content = f.readlines()
+
+    # Prepare to update the content
+    new_content = []
+    in_cp_section = False
+
+    for line in content:
+        if line.startswith("<!-- begin: parameters-docs -->"):
+            in_cp_section = True
+            new_content.append(line)
+            new_content.append(doc_request)
+            continue
+
+        if in_cp_section and line.strip() == "<!-- end: parameters-docs -->":
+            in_cp_section = False
+
+        if not in_cp_section:
+            new_content.append(line)
+
+    # Only write to the file if new_content is different from content
+    if "".join(new_content) != "".join(content):
+        with open(DOCS_FILE, "w") as f:
+            f.writelines(new_content)
+        print(f"Documentation updated in {DOCS_FILE}")
+    else:
+        print("No changes detected. Documentation file remains unchanged.")
+
+
+if __name__ == "__main__":
+    update_documentation()
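Review note: the core of `update_documentation` is the marker-based replacement, which keeps both marker lines and swaps everything between them. The sketch below isolates that logic as a pure function so it can be exercised without touching `docs/usage.md`. `replace_between_markers` is a hypothetical name; the marker strings are the ones used in the script.

```python
BEGIN = "<!-- begin: parameters-docs -->"
END = "<!-- end: parameters-docs -->"


def replace_between_markers(lines: list[str], generated: str) -> list[str]:
    """Keep both marker lines and swap everything between them for `generated`."""
    out: list[str] = []
    inside = False
    for line in lines:
        if line.startswith(BEGIN):
            inside = True
            out.append(line)
            out.append(generated)
            continue
        if inside and line.strip() == END:
            inside = False
        if not inside:
            out.append(line)
    return out


doc = [
    "intro\n",
    BEGIN + "\n",
    "stale table\n",
    END + "\n",
    "outro\n",
]
updated = replace_between_markers(doc, "fresh table\n")
```

Running the function again on `updated` with the same generated text returns an identical list, which is why the script can compare old and new content and skip the write when nothing changed.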
--- a/tests/test_1-url-async-ws.py
+++ b/tests/test_1-url-async-ws.py
@@ -69,3 +69,9 @@ async def test_convert_url(async_client: httpx.AsyncClient):
    with connect(uri) as websocket:
        for message in websocket:
            print(message)
+
+    result_resp = await async_client.get(f"{base_url}/result/{task['task_id']}")
+    assert result_resp.status_code == 200, "Response should be 200 OK"
+    result = result_resp.json()
+    print(f"{result['processing_time']=}")
+    assert result["processing_time"] > 1.0
--- a/tests/test_fastapi_endpoints.py
+++ b/tests/test_fastapi_endpoints.py
@@ -54,6 +54,14 @@ async def test_health(client: AsyncClient):
    assert response.json() == {"status": "ok"}


+@pytest.mark.asyncio
+async def test_openapijson(client: AsyncClient):
+    response = await client.get("/openapi.json")
+    assert response.status_code == 200
+    schema = response.json()
+    assert "openapi" in schema
+
+
 @pytest.mark.asyncio
 async def test_convert_file(client: AsyncClient, auth_headers: dict):
    """Test convert single file to all outputs"""
--- a/uv.lock
+++ b/uv.lock
Author	SHA1	Message	Date
github-actions[bot]	0ec67a37b7	chore: bump version to 1.9.0 [skip ci]	2025-11-24 08:43:53 +00:00
Michele Dolfi	772fcec4ae	chore: avoid installing multiple times dependencies (#429) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-11-21 15:42:42 +01:00
Michele Dolfi	e437e830c9	fix: Dependencies updates – Docling 2.63.0 (#443) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-11-21 10:31:56 +01:00
Michele Dolfi	2c23f65507	feat: version endpoint (#442) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-11-20 17:57:10 +01:00
Burt Holzman	5dc942f25b	chore: docs typo (cude -> cuda) (#437) Signed-off-by: Burt Holzman <burt@fnal.gov>	2025-11-17 08:31:44 +01:00
github-actions[bot]	ff310f2b13	chore: bump version to 1.8.0 [skip ci]	2025-10-31 17:01:56 +00:00
Michele Dolfi	bf132a3c3e	feat: Docling with new standard pipeline with threading (#428) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-31 17:57:38 +01:00
Michele Dolfi	35319b0da7	docs: Expand automatic docs to nested objects. More complete usage docs. (#426) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-31 15:02:20 +01:00
Michele Dolfi	f3957aeb57	docs: add docs for docling parameters like performance and debug (#424) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-31 14:17:31 +01:00
github-actions[bot]	1ec44220f5	chore: bump version to 1.7.2 [skip ci]	2025-10-30 15:14:17 +00:00
Michele Dolfi	e9b41406c4	fix: Update locked dependencies. Docling fixes, Expose temperature parameter for vlm models (#423) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-30 16:09:21 +01:00
Michele Dolfi	a2e68d39ae	test: check that processing time is not skipped (#416) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-27 08:29:05 +01:00
Michele Dolfi	7bf2e7b366	fix: temporary constrain fastapi version (#418) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-24 11:22:05 +02:00
github-actions[bot]	462ceff9d1	chore: bump version to 1.7.1 [skip ci]	2025-10-22 14:01:58 +00:00
Michele Dolfi	97613a1974	fix: Upgrade dependencies (#417) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-22 15:42:59 +02:00
Paweł Rein	0961f2c574	fix: makes task status shared across multiple instances in RQ mode, resolves #378 (#415) Signed-off-by: Pawel Rein <pawel.rein@prezi.com>	2025-10-21 15:08:42 +02:00
Tiago Santana	9672f310b1	docs: Generate usage.md automatically (#340) Signed-off-by: Tiago Santana <54704492+SantanaTiago@users.noreply.github.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-21 14:27:01 +02:00
Michele Dolfi	56e8535a7a	chore: publish release notes on Discord (#409) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-20 14:15:58 +02:00
Michele Dolfi	0f274ab135	fix: `DOCLING_SERVE_SYNC_POLL_INTERVAL` controls the synchronous polling time (#413) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-20 14:14:00 +02:00
Michele Dolfi	0427f71ef4	chore: allow to change the container runtime (#412) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-10-20 14:13:51 +02:00