chore: bump version to 0.16.1 [skip ci]

fix: upgrade deps including docling v2.40.0 with locks in models init (#264)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-11-29 16:43:24 +00:00 · 2025-07-07 16:17:50 +00:00 · 2025-07-07 17:13:45 +02:00 · 2025-07-07 16:36:43 +02:00 · 2025-07-07 08:47:34 +02:00 · 2025-07-07 08:47:21 +02:00
44 changed files with 4646 additions and 2219 deletions
--- a/.github/dco.yml
+++ b/.github/dco.yml
@@ -0,0 +1,2 @@
+allowRemediationCommits:
+  individual: true
--- a/.github/workflows/ci-images-dryrun.yml
+++ b/.github/workflows/ci-images-dryrun.yml
@@ -15,15 +15,23 @@ jobs:
        spec:
          - name: docling-project/docling-serve
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra cpu
+              UV_SYNC_EXTRA_ARGS=--no-extra flash-attn
            platforms: linux/amd64, linux/arm64
          - name: docling-project/docling-serve-cpu
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra flash-attn
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cpu --no-extra flash-attn
            platforms: linux/amd64, linux/arm64
          - name: docling-project/docling-serve-cu124
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cpu
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu124
+            platforms: linux/amd64
+          - name: docling-project/docling-serve-cu126
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu126
+            platforms: linux/amd64
+          - name: docling-project/docling-serve-cu128
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu128
            platforms: linux/amd64

    permissions:
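The five image variants differ only in the `UV_SYNC_EXTRA_ARGS` build argument and the target platforms. The following minimal Python sketch turns one matrix entry into a single-platform `docker build` invocation similar to the Makefile targets. `IMAGE_VARIANTS` and `build_command` are hypothetical names, not part of the repository; the argument strings and platforms are copied from the matrix above.

```python
import shlex
from typing import Dict, List, Tuple

# Values copied from the workflow matrix; this mapping and helper are hypothetical.
IMAGE_VARIANTS: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "docling-serve": ("--no-extra flash-attn", ("linux/amd64", "linux/arm64")),
    "docling-serve-cpu": (
        "--no-group pypi --group cpu --no-extra flash-attn",
        ("linux/amd64", "linux/arm64"),
    ),
    "docling-serve-cu124": ("--no-group pypi --group cu124", ("linux/amd64",)),
    "docling-serve-cu126": ("--no-group pypi --group cu126", ("linux/amd64",)),
    "docling-serve-cu128": ("--no-group pypi --group cu128", ("linux/amd64",)),
}


def build_command(name: str, platform: str, tag: str = "main") -> List[str]:
    """Return the argv for a single-platform build of one image variant."""
    extra_args, platforms = IMAGE_VARIANTS[name]
    if platform not in platforms:
        raise ValueError(f"{name} is not built for {platform}")
    return [
        "docker", "build", "--load",
        "--build-arg", f"UV_SYNC_EXTRA_ARGS={extra_args}",
        "--platform", platform,
        "-f", "Containerfile",
        "-t", f"ghcr.io/docling-project/{name}:{tag}",
        ".",
    ]


if __name__ == "__main__":
    # Print a shell-ready command for the CUDA 12.6 variant.
    print(shlex.join(build_command("docling-serve-cu126", "linux/amd64")))
```

Building one platform per call matches how the Makefile builds the CUDA images with `--platform linux/amd64`.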
--- a/.github/workflows/dco-advisor.yml
+++ b/.github/workflows/dco-advisor.yml
@@ -0,0 +1,192 @@
+name: DCO Advisor Bot
+
+on:
+  pull_request_target:
+    types: [opened, reopened, synchronize]
+
+permissions:
+  pull-requests: write
+  issues: write
+
+jobs:
+  dco_advisor:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Handle DCO check result
+        uses: actions/github-script@v7
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          script: |
+            const pr = context.payload.pull_request || context.payload.check_run?.pull_requests?.[0];
+            if (!pr) return;
+
+            const prNumber = pr.number;
+            const baseRef = pr.base.ref;
+            const headSha =
+              context.payload.check_run?.head_sha ||
+              pr.head?.sha;
+            const username = pr.user.login;
+
+            console.log("HEAD SHA:", headSha);
+
+            const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
+
+            // Poll until DCO check has a conclusion (max 6 attempts, 30s)
+            let dcoCheck = null;
+            for (let attempt = 0; attempt < 6; attempt++) {
+              const { data: checks } = await github.rest.checks.listForRef({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                ref: headSha
+              });
+
+              console.log("All check runs:");
+              checks.check_runs.forEach(run => {
+                console.log(`- ${run.name} (${run.status}/${run.conclusion}) @ ${run.head_sha}`);
+              });
+
+              dcoCheck = checks.check_runs.find(run =>
+                run.name.toLowerCase().includes("dco") &&
+                !run.name.toLowerCase().includes("dco_advisor") &&
+                run.head_sha === headSha
+              );
+
+              if (dcoCheck?.conclusion) break;
+              console.log(`Waiting for DCO check... (${attempt + 1})`);
+              await sleep(5000); // wait 5 seconds
+            }
+
+            if (!dcoCheck || !dcoCheck.conclusion) {
+              console.log("DCO check did not complete in time.");
+              return;
+            }
+
+            const isFailure = ["failure", "action_required"].includes(dcoCheck.conclusion);
+            console.log(`DCO check conclusion for ${headSha}: ${dcoCheck.conclusion} (treated as ${isFailure ? "failure" : "success"})`);
+
+            // Parse DCO output for commit SHAs and author
+            let badCommits = [];
+            let authorName = "";
+            let authorEmail = "";
+            let moreInfo = `More info: [DCO check report](${dcoCheck?.html_url})`;
+
+            if (isFailure) {
+                const { data: commits } = await github.rest.pulls.listCommits({
+                    owner: context.repo.owner,
+                    repo: context.repo.repo,
+                    pull_number: prNumber,
+                });
+
+                for (const commit of commits) {
+                    const commitMessage = commit.commit.message;
+                    const signoffMatch = commitMessage.match(/^Signed-off-by:\s+.+<.+>$/m);
+                    if (!signoffMatch) {
+                        console.log(`Bad commit found ${commit.sha}`);
+                        badCommits.push({
+                            sha: commit.sha,
+                            authorName: commit.commit.author.name,
+                            authorEmail: commit.commit.author.email,
+                        });
+                    }
+                }
+            }
+
+            // If multiple authors are present, you could adapt the message accordingly
+            // For now, we'll just use the first one
+            if (badCommits.length > 0) {
+              authorName = badCommits[0].authorName;
+              authorEmail = badCommits[0].authorEmail;
+            }
+
+            // Generate remediation commit message if needed
+            let remediationSnippet = "";
+            if (badCommits.length && authorEmail) {
+              remediationSnippet = `git commit --allow-empty -s -m "DCO Remediation Commit for ${authorName} <${authorEmail}>\n\n` +
+                badCommits.map(c => `I, ${c.authorName} <${c.authorEmail}>, hereby add my Signed-off-by to this commit: ${c.sha}`).join('\n') +
+                `"`;
+            } else {
+              remediationSnippet = "# Unable to auto-generate remediation message. Please check the DCO check details.";
+            }
+
+            // Build comment
+            const commentHeader = '<!-- dco-advice-bot -->';
+            let body = "";
+
+            if (isFailure) {
+              body = [
+                commentHeader,
+                '❌ **DCO Check Failed**',
+                '',
+                `Hi @${username}, your pull request has failed the Developer Certificate of Origin (DCO) check.`,
+                '',
+                'This repository supports **remediation commits**, so you can fix this without rewriting history — but you must follow the required message format.',
+                '',
+                '---',
+                '',
+                '### 🛠 Quick Fix: Add a remediation commit',
+                'Run this command:',
+                '',
+                '```bash',
+                remediationSnippet,
+                'git push',
+                '```',
+                '',
+                '---',
+                '',
+                '<details>',
+                '<summary>🔧 Advanced: Sign off each commit directly</summary>',
+                '',
+                '**For the latest commit:**',
+                '```bash',
+                'git commit --amend --signoff',
+                'git push --force-with-lease',
+                '```',
+                '',
+                '**For multiple commits:**',
+                '```bash',
+                `git rebase --signoff origin/${baseRef}`,
+                'git push --force-with-lease',
+                '```',
+                '',
+                '</details>',
+                '',
+                moreInfo
+              ].join('\n');
+            } else {
+              body = [
+                commentHeader,
+                '✅ **DCO Check Passed**',
+                '',
+                `Thanks @${username}, all your commits are properly signed off. 🎉`
+              ].join('\n');
+            }
+
+            // Get existing comments on the PR
+            const { data: comments } = await github.rest.issues.listComments({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: prNumber
+            });
+
+            // Look for a previous bot comment
+            const existingComment = comments.find(c =>
+              c.body.includes("<!-- dco-advice-bot -->")
+            );
+
+            if (existingComment) {
+              await github.rest.issues.updateComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                comment_id: existingComment.id,
+                body: body
+              });
+            } else {
+              await github.rest.issues.createComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                issue_number: prNumber,
+                body: body
+              });
+            }
--- a/.github/workflows/images.yml
+++ b/.github/workflows/images.yml
@@ -19,15 +19,23 @@ jobs:
        spec:
          - name: docling-project/docling-serve
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra cpu
+              UV_SYNC_EXTRA_ARGS=--no-extra flash-attn
            platforms: linux/amd64, linux/arm64
          - name: docling-project/docling-serve-cpu
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra flash-attn
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cpu --no-extra flash-attn
            platforms: linux/amd64, linux/arm64
          - name: docling-project/docling-serve-cu124
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cpu
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu124
+            platforms: linux/amd64
+          - name: docling-project/docling-serve-cu126
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu126
+            platforms: linux/amd64
+          - name: docling-project/docling-serve-cu128
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu128
            platforms: linux/amd64

    permissions:
--- a/.github/workflows/job-build.yml
+++ b/.github/workflows/job-build.yml
@@ -17,7 +17,7 @@ jobs:
          python-version: ${{ matrix.python-version }}
          enable-cache: true
      - name: Install dependencies
-        run: uv sync --all-extras --no-extra cu124 --no-extra flash-attn
+        run: uv sync --all-extras --no-extra flash-attn
      - name: Build package
        run: uv build
      - name: Check content of wheel
--- a/.github/workflows/job-checks.yml
+++ b/.github/workflows/job-checks.yml
@@ -25,7 +25,7 @@ jobs:
          key: pre-commit|${{ env.PY }}|${{ hashFiles('.pre-commit-config.yaml') }}

      - name: Install dependencies
-        run: uv sync --frozen --all-extras --no-extra cu124 --no-extra flash-attn
+        run: uv sync --frozen --all-extras --no-extra flash-attn

      - name: Run styling check
        run: pre-commit run --all-files
--- a/.gitignore
+++ b/.gitignore
@@ -444,3 +444,5 @@ pip-selfcheck.json
 # Makefile
 .action-lint
 .markdown-lint
+
+cookies.txt
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -23,6 +23,6 @@ repos:
        files: '\.py$'
  - repo: https://github.com/astral-sh/uv-pre-commit
    # uv version.
-    rev: 0.6.1
+    rev: 0.7.13
    hooks:
      - id: uv-lock
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,86 @@
+## [v0.16.1](https://github.com/docling-project/docling-serve/releases/tag/v0.16.1) - 2025-07-07
+
+### Fix
+
+* Upgrade deps including docling v2.40.0 with locks in models init ([#264](https://github.com/docling-project/docling-serve/issues/264)) ([`bfde1a0`](https://github.com/docling-project/docling-serve/commit/bfde1a0991c2da53b72c4f131ff74fa10f6340de))
+* Missing tesseract osd ([#263](https://github.com/docling-project/docling-serve/issues/263)) ([`eb3892e`](https://github.com/docling-project/docling-serve/commit/eb3892ee141eb2c941d580b095d8a266f2d2610c))
+* Properly load models at boot ([#244](https://github.com/docling-project/docling-serve/issues/244)) ([`149a8cb`](https://github.com/docling-project/docling-serve/commit/149a8cb1c0a16c1e0b7d17f40b88b4d6e8f0109d))
+
+### Documentation
+
+* Fix typo ([#259](https://github.com/docling-project/docling-serve/issues/259)) ([`93b8471`](https://github.com/docling-project/docling-serve/commit/93b84712b2c6d180908a197847b52b217a7ff05f))
+* Change the doc example ([#258](https://github.com/docling-project/docling-serve/issues/258)) ([`c45b937`](https://github.com/docling-project/docling-serve/commit/c45b93706466a073ab4a5c75aa8a267110873e26))
+* Update typo ([#247](https://github.com/docling-project/docling-serve/issues/247)) ([`50e431f`](https://github.com/docling-project/docling-serve/commit/50e431f30fbffa33f43727417fe746d20cbb9d6b))
+
+## [v0.16.0](https://github.com/docling-project/docling-serve/releases/tag/v0.16.0) - 2025-06-25
+
+### Feature
+
+* Package updates and more cuda images ([#229](https://github.com/docling-project/docling-serve/issues/229)) ([`30aca92`](https://github.com/docling-project/docling-serve/commit/30aca92298ab0d86bb4debcfcacb2dd8b9040a27))
+
+### Documentation
+
+* Update example resources and improve README ([#231](https://github.com/docling-project/docling-serve/issues/231)) ([`80755a7`](https://github.com/docling-project/docling-serve/commit/80755a7d5955f7d0c53df8e558fdd852dd1f5b75))
+
+## [v0.15.0](https://github.com/docling-project/docling-serve/releases/tag/v0.15.0) - 2025-06-17
+
+### Feature
+
+* Use ReDoc and Scalar as API docs ([#228](https://github.com/docling-project/docling-serve/issues/228)) ([`873d05a`](https://github.com/docling-project/docling-serve/commit/873d05aefe141c63b9c1cf53b23b4fa8c96de05d))
+
+### Fix
+
+* "tesserocr" instead of "tesseract_cli" in usage docs ([#223](https://github.com/docling-project/docling-serve/issues/223)) ([`196c5ce`](https://github.com/docling-project/docling-serve/commit/196c5ce42a04d77234a4212c3d9b9772d2c2073e))
+
+## [v0.14.0](https://github.com/docling-project/docling-serve/releases/tag/v0.14.0) - 2025-06-17
+
+### Feature
+
+* Read supported file extensions from docling ([#214](https://github.com/docling-project/docling-serve/issues/214)) ([`524f6a8`](https://github.com/docling-project/docling-serve/commit/524f6a8997b86d2f869ca491ec8fb40585b42ca4))
+
+### Fix
+
+* Typo in Headline ([#220](https://github.com/docling-project/docling-serve/issues/220)) ([`d5455b7`](https://github.com/docling-project/docling-serve/commit/d5455b7f66de39ea1f8b8927b5968d2baa23ca88))
+
+## [v0.13.0](https://github.com/docling-project/docling-serve/releases/tag/v0.13.0) - 2025-06-04
+
+### Feature
+
+* Upgrade docling to 2.36 ([#212](https://github.com/docling-project/docling-serve/issues/212)) ([`ffea347`](https://github.com/docling-project/docling-serve/commit/ffea34732b24fdd438fabd6df02d3d9ce66b4534))
+
+## [v0.12.0](https://github.com/docling-project/docling-serve/releases/tag/v0.12.0) - 2025-06-03
+
+### Feature
+
+* Export annotations in markdown and html (Docling upgrade) ([#202](https://github.com/docling-project/docling-serve/issues/202)) ([`c4c41f1`](https://github.com/docling-project/docling-serve/commit/c4c41f16dff83c5d2a0b8a4c625b5de19b36b7c5))
+
+### Fix
+
+* Processing complex params in multipart-form ([#210](https://github.com/docling-project/docling-serve/issues/210)) ([`7066f35`](https://github.com/docling-project/docling-serve/commit/7066f3520a88c07df1c80a0cc6c4339eaac4d6a7))
+
+### Documentation
+
+* Add OpenShift ReplicaSets examples ([#209](https://github.com/docling-project/docling-serve/issues/209)) ([`6a8190c`](https://github.com/docling-project/docling-serve/commit/6a8190c315792bd1e0e2b0af310656baaa5551e5))
+
+## [v0.11.0](https://github.com/docling-project/docling-serve/releases/tag/v0.11.0) - 2025-05-23
+
+### Feature
+
+* Page break placeholder in markdown exports options ([#194](https://github.com/docling-project/docling-serve/issues/194)) ([`32b8a80`](https://github.com/docling-project/docling-serve/commit/32b8a809f348bf9fbde657f93589a56935d3749d))
+* Clear results registry ([#192](https://github.com/docling-project/docling-serve/issues/192)) ([`de002df`](https://github.com/docling-project/docling-serve/commit/de002dfcdc111c942a08b156c84b7fa22b3fbaf3))
+* Upgrade to Docling 2.33.0 ([#198](https://github.com/docling-project/docling-serve/issues/198)) ([`abe5aa0`](https://github.com/docling-project/docling-serve/commit/abe5aa03f54d44ecf5c6d76e3258028997a53e68))
+* API to trigger offloading the models ([#188](https://github.com/docling-project/docling-serve/issues/188)) ([`00be428`](https://github.com/docling-project/docling-serve/commit/00be4284904d55b78c75c5475578ef11c2ade94c))
+* Figure annotations @ docling components 0.0.7 ([#181](https://github.com/docling-project/docling-serve/issues/181)) ([`3ff1b2f`](https://github.com/docling-project/docling-serve/commit/3ff1b2f9834aca37472a895a0e3da47560457d77))
+
+### Fix
+
+* Usage of hashlib for FIPS ([#171](https://github.com/docling-project/docling-serve/issues/171)) ([`8406fb9`](https://github.com/docling-project/docling-serve/commit/8406fb9b59d83247b8379974cabed497703dfc4d))
+
+### Documentation
+
+* Example and instructions on how to load model weights to a persistent volume ([#197](https://github.com/docling-project/docling-serve/issues/197)) ([`3f090b7`](https://github.com/docling-project/docling-serve/commit/3f090b7d15eaf696611d89bbbba5b98569610828))
+* Async API usage and fixes ([#195](https://github.com/docling-project/docling-serve/issues/195)) ([`21c1791`](https://github.com/docling-project/docling-serve/commit/21c1791e427f5b1946ed46c68dfda03c957dca8f))
+
 ## [v0.10.1](https://github.com/docling-project/docling-serve/releases/tag/v0.10.1) - 2025-04-30

 ### Fix
--- a/Containerfile
+++ b/Containerfile
@@ -42,7 +42,7 @@ ENV \

 ARG UV_SYNC_EXTRA_ARGS=""

-RUN --mount=from=ghcr.io/astral-sh/uv:0.6.1,source=/uv,target=/bin/uv \
+RUN --mount=from=ghcr.io/astral-sh/uv:0.7.13,source=/uv,target=/bin/uv \
    --mount=type=cache,target=/opt/app-root/src/.cache/uv,uid=1001 \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
@@ -61,7 +61,7 @@ RUN echo "Downloading models..." && \
    chmod -R g=u ${DOCLING_SERVE_ARTIFACTS_PATH}

 COPY --chown=1001:0 ./docling_serve ./docling_serve
-RUN --mount=from=ghcr.io/astral-sh/uv:0.6.1,source=/uv,target=/bin/uv \
+RUN --mount=from=ghcr.io/astral-sh/uv:0.7.13,source=/uv,target=/bin/uv \
    --mount=type=cache,target=/opt/app-root/src/.cache/uv,uid=1001 \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
--- a/Makefile
+++ b/Makefile
@@ -26,26 +26,40 @@ md-lint-file:
 	$(CMD_PREFIX) touch .markdown-lint

 .PHONY: docling-serve-image
-docling-serve-image: Containerfile
+docling-serve-image: Containerfile ## Build docling-serve container image
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra cpu" -f Containerfile -t ghcr.io/docling-project/docling-serve:$(TAG) .
+	$(CMD_PREFIX) docker build --load -f Containerfile -t ghcr.io/docling-project/docling-serve:$(TAG) .
 	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve:$(TAG) ghcr.io/docling-project/docling-serve:$(BRANCH_TAG)
 	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve:$(TAG) quay.io/docling-project/docling-serve:$(BRANCH_TAG)

 .PHONY: docling-serve-cpu-image
 docling-serve-cpu-image: Containerfile ## Build docling-serve "cpu only" container image
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve CPU]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra flash-attn" -f Containerfile -t ghcr.io/docling-project/docling-serve-cpu:$(TAG) .
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cpu --no-extra flash-attn" -f Containerfile -t ghcr.io/docling-project/docling-serve-cpu:$(TAG) .
 	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cpu:$(TAG) ghcr.io/docling-project/docling-serve-cpu:$(BRANCH_TAG)
 	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cpu:$(TAG) quay.io/docling-project/docling-serve-cpu:$(BRANCH_TAG)

 .PHONY: docling-serve-cu124-image
-docling-serve-cu124-image: Containerfile ## Build docling-serve container image with GPU support
+docling-serve-cu124-image: Containerfile ## Build docling-serve container image with CUDA 12.4 support
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with Cuda 12.4]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cpu" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu124:$(TAG) .
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu124" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu124:$(TAG) .
 	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu124:$(TAG) ghcr.io/docling-project/docling-serve-cu124:$(BRANCH_TAG)
 	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu124:$(TAG) quay.io/docling-project/docling-serve-cu124:$(BRANCH_TAG)

+.PHONY: docling-serve-cu126-image
+docling-serve-cu126-image: Containerfile ## Build docling-serve container image with CUDA 12.6 support
+	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with Cuda 12.6]"
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu126" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu126:$(TAG) .
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu126:$(TAG) ghcr.io/docling-project/docling-serve-cu126:$(BRANCH_TAG)
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu126:$(TAG) quay.io/docling-project/docling-serve-cu126:$(BRANCH_TAG)
+
+.PHONY: docling-serve-cu128-image
+docling-serve-cu128-image: Containerfile ## Build docling-serve container image with CUDA 12.8 support
+	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with Cuda 12.8]"
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu128" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu128:$(TAG) .
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu128:$(TAG) ghcr.io/docling-project/docling-serve-cu128:$(BRANCH_TAG)
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu128:$(TAG) quay.io/docling-project/docling-serve-cu128:$(BRANCH_TAG)
+
 .PHONY: action-lint
 action-lint: .action-lint ##      Lint GitHub Action workflows
 .action-lint: $(shell find .github -type f) | action-lint-file
@@ -87,9 +101,9 @@ run-docling-cpu: ## Run the docling-serve container with CPU support and assign
 	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with CPU support on port 5001...\n" "[RUN CPU]"
 	$(CMD_PREFIX) docker run -it --name docling-serve-cpu -p 5001:5001 ghcr.io/docling-project/docling-serve-cpu:main

-.PHONY: run-docling-gpu
-run-docling-gpu: ## Run the docling-serve container with GPU support and assign a container name
+.PHONY: run-docling-cu124
+run-docling-cu124: ## Run the docling-serve container with CUDA 12.4 GPU support and assign a container name
 	$(ECHO_PREFIX) printf "  %-12s Removing existing container if it exists...\n" "[CLEANUP]"
-	$(CMD_PREFIX) docker rm -f docling-serve-gpu 2>/dev/null || true
-	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with GPU support on port 5001...\n" "[RUN GPU]"
-	$(CMD_PREFIX) docker run -it --name docling-serve-gpu -p 5001:5001 ghcr.io/docling-project/docling-serve:main
+	$(CMD_PREFIX) docker rm -f docling-serve-cu124 2>/dev/null || true
+	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with GPU support on port 5001...\n" "[RUN CUDA 12.4]"
+	$(CMD_PREFIX) docker run -it --name docling-serve-cu124 -p 5001:5001 ghcr.io/docling-project/docling-serve-cu124:main
--- a/README.md
+++ b/README.md
@@ -8,23 +8,31 @@

 Running [Docling](https://github.com/docling-project/docling) as an API service.

+📚 [Docling Serve documentation](./docs/README.md)
+
+- Learn how to [configure the webserver](./docs/configuration.md)
+- Get to know all [runtime options](./docs/usage.md) of the API
+- Explore useful [deployment examples](./docs/deployment.md)
+- And more
+
 ## Getting started

 Install the `docling-serve` package and run the server.

 ```bash
 # Using the python package
-pip install "docling-serve"
-docling-serve run
+pip install "docling-serve[ui]"
+docling-serve run --enable-ui

 # Using container images, e.g. with Podman
-podman run -p 5001:5001 quay.io/docling-project/docling-serve
+podman run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=1 quay.io/docling-project/docling-serve
 ```

 The server is available at

 - API <http://127.0.0.1:5001>
 - API documentation <http://127.0.0.1:5001/docs>
+- UI playground <http://127.0.0.1:5001/ui>
  ![swagger.png](img/swagger.png)

 Try it out with a simple conversion:
@@ -45,33 +53,22 @@ Available container images:

 | Name | Description | Arch | Size |
 | -----|-------------|------|------|
-| [`ghcr.io/docling-project/docling-serve`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve) <br /> [`quay.io/docling-project/docling-serve`](https://quay.io/repository/docling-project/docling-serve) | Simple image for Docling Serve, installing all packages from the official pypi.org index. | `linux/amd64`, `linux/arm64` | 3.6 GB |
+| [`ghcr.io/docling-project/docling-serve`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve) <br /> [`quay.io/docling-project/docling-serve`](https://quay.io/repository/docling-project/docling-serve) | Simple image for Docling Serve, installing all packages from the official pypi.org index. | `linux/amd64`, `linux/arm64` | 3.6 GB (arm64) <br /> 8.7 GB (amd64) |
 | [`ghcr.io/docling-project/docling-serve-cpu`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve-cpu) <br /> [`quay.io/docling-project/docling-serve-cpu`](https://quay.io/repository/docling-project/docling-serve-cpu) | Cpu-only image which installs `torch` from the pytorch cpu index. | `linux/amd64`, `linux/arm64` | 3.6 GB |
 | [`ghcr.io/docling-project/docling-serve-cu124`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve-cu124) <br /> [`quay.io/docling-project/docling-serve-cu124`](https://quay.io/repository/docling-project/docling-serve-cu124) | Cuda 12.4 image which installs `torch` from the pytorch cu124 index. | `linux/amd64` | 8.7 GB |
+| [`ghcr.io/docling-project/docling-serve-cu126`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve-cu126) <br /> [`quay.io/docling-project/docling-serve-cu126`](https://quay.io/repository/docling-project/docling-serve-cu126) | Cuda 12.6 image which installs `torch` from the pytorch cu126 index. | `linux/amd64` | 8.7 GB |
+| [`ghcr.io/docling-project/docling-serve-cu128`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve-cu128) <br /> [`quay.io/docling-project/docling-serve-cu128`](https://quay.io/repository/docling-project/docling-serve-cu128) | Cuda 12.8 image which installs `torch` from the pytorch cu128 index. | `linux/amd64` | 8.7 GB |

 Coming soon: `docling-serve-slim` images will reduce the size by skipping the model weights download.

 ### Demonstration UI

-```bash
-# Install the Python package with the extra dependencies
-pip install "docling-serve[ui]"
-docling-serve run --enable-ui
-
-# Run the container image with the extra env parameters
-podman run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=true quay.io/docling-project/docling-serve
-```
-
 An easy to use UI is available at the `/ui` endpoint.

 ![ui-input.png](img/ui-input.png)

 ![ui-output.png](img/ui-output.png)

-## Documentation and advance usages
-
-Visit the [Docling Serve documentation](./docs/README.md) for learning how to [configure the webserver](./docs/configuration.md), use all the [runtime options](./docs/usage.md) of the API and [deployment examples](./docs/deployment.md).
-
 ## Get help and support

 Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling/discussions).
--- a/docling_serve/main.py
+++ b/docling_serve/main.py
@@ -113,11 +113,13 @@ def _run(
    protocol = "https" if run_ssl else "http"
    url = f"{protocol}://{uvicorn_settings.host}:{uvicorn_settings.port}"
    url_docs = f"{url}/docs"
+    url_scalar = f"{url}/scalar"
    url_ui = f"{url}/ui"

    console.print("")
    console.print(f"Server started at [link={url}]{url}[/]")
    console.print(f"Documentation at [link={url_docs}]{url_docs}[/]")
+    console.print(f"Scalar docs at [link={url_scalar}]{url_scalar}[/]")
    if docling_serve_settings.enable_ui:
        console.print(f"UI at [link={url_ui}]{url_ui}[/]")

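The startup banner pairs each Rich `[link=...]` target with the URL it displays, and the Scalar line must point at `/scalar` rather than `/docs`. A small Python sketch, using a hypothetical `startup_lines` helper that mirrors the `_run` logic above, builds every line from a single URL variable so the target and the label cannot drift apart:

```python
from typing import List


def startup_lines(host: str, port: int, ssl: bool = False, enable_ui: bool = False) -> List[str]:
    """Return Rich-markup banner lines; each link target equals its label."""
    protocol = "https" if ssl else "http"
    url = f"{protocol}://{host}:{port}"
    entries = [
        ("Server started at", url),
        ("Documentation at", f"{url}/docs"),
        ("Scalar docs at", f"{url}/scalar"),
    ]
    if enable_ui:
        entries.append(("UI at", f"{url}/ui"))
    # One variable feeds both the link target and the visible text.
    return [f"{label} [link={target}]{target}[/]" for label, target in entries]


if __name__ == "__main__":
    for line in startup_lines("127.0.0.1", 5001, enable_ui=True):
        print(line)
```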
--- a/docling_serve/app.py
+++ b/docling_serve/app.py
@@ -25,6 +25,7 @@ from fastapi.openapi.docs import (
 )
 from fastapi.responses import RedirectResponse
 from fastapi.staticfiles import StaticFiles
+from scalar_fastapi import get_scalar_api_reference

 from docling.datamodel.base_models import DocumentStream

@@ -39,6 +40,7 @@ from docling_serve.datamodel.requests import (
    ConvertDocumentsRequest,
 )
 from docling_serve.datamodel.responses import (
+    ClearResponse,
    ConvertDocumentResponse,
    HealthCheckResponse,
    MessageKind,
@@ -46,6 +48,7 @@ from docling_serve.datamodel.responses import (
    WebsocketMessage,
 )
 from docling_serve.datamodel.task import Task, TaskSource
+from docling_serve.docling_conversion import _get_converter_from_hash
 from docling_serve.engines.async_orchestrator import (
    BaseAsyncOrchestrator,
    ProgressInvalid,
@@ -96,7 +99,8 @@ async def lifespan(app: FastAPI):
    scratch_dir = get_scratch()

    # Warm up processing cache
-    await orchestrator.warm_up_caches()
+    if docling_serve_settings.load_models_at_boot:
+        await orchestrator.warm_up_caches()

    # Start the background queue processor
    queue_task = asyncio.create_task(orchestrator.process_queue())
@@ -138,8 +142,8 @@ def create_app():  # noqa: C901

    app = FastAPI(
        title="Docling Serve",
-        docs_url=None if offline_docs_assets else "/docs",
-        redoc_url=None if offline_docs_assets else "/redocs",
+        docs_url=None if offline_docs_assets else "/swagger",
+        redoc_url=None if offline_docs_assets else "/docs",
        lifespan=lifespan,
        version=version,
    )
@@ -190,7 +194,7 @@ def create_app():  # noqa: C901
            name="static",
        )

-        @app.get("/docs", include_in_schema=False)
+        @app.get("/swagger", include_in_schema=False)
        async def custom_swagger_ui_html():
            return get_swagger_ui_html(
                openapi_url=app.openapi_url,
@@ -204,7 +208,7 @@ def create_app():  # noqa: C901
        async def swagger_ui_redirect():
            return get_swagger_ui_oauth2_redirect_html()

-        @app.get("/redoc", include_in_schema=False)
+        @app.get("/docs", include_in_schema=False)
        async def redoc_html():
            return get_redoc_html(
                openapi_url=app.openapi_url,
@@ -212,6 +216,15 @@ def create_app():  # noqa: C901
                redoc_js_url="/static/redoc.standalone.js",
            )

+    @app.get("/scalar", include_in_schema=False)
+    async def scalar_html():
+        return get_scalar_api_reference(
+            openapi_url=app.openapi_url,
+            title=app.title,
+            scalar_favicon_url="https://raw.githubusercontent.com/docling-project/docling/refs/heads/main/docs/assets/logo.svg",
+            # hide_client_button=True,  # not yet released but in main
+        )
+
    ########################
    # Async / Sync helpers #
    ########################
@@ -544,4 +557,27 @@ def create_app():  # noqa: C901
                status_code=400, detail=f"Invalid progress payload: {err}"
            )

+    #### Clear requests
+
+    # Offload models
+    @app.get(
+        "/v1alpha/clear/converters",
+        response_model=ClearResponse,
+    )
+    async def clear_converters():
+        _get_converter_from_hash.cache_clear()
+        return ClearResponse()
+
+    # Clean results
+    @app.get(
+        "/v1alpha/clear/results",
+        response_model=ClearResponse,
+    )
+    async def clear_results(
+        orchestrator: Annotated[BaseAsyncOrchestrator, Depends(get_async_orchestrator)],
+        older_than: float = 3600,
+    ):
+        await orchestrator.clear_results(older_than=older_than)
+        return ClearResponse()
+
    return app
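`clear_converters` calls `cache_clear()` on `_get_converter_from_hash`, which implies the converter factory is memoised with `functools.lru_cache`. The decorator is not part of this diff, and sizing it with `options_cache_size` is an assumption. Below is a minimal sketch of that pattern. The hypothetical `get_converter_from_hash` returns a placeholder object instead of a real `DocumentConverter`:

```python
from functools import lru_cache

# Records every cache miss so the effect of cache_clear() is observable.
builds: list[bytes] = []


@lru_cache(maxsize=2)  # assumption: mirrors DOCLING_SERVE_OPTIONS_CACHE_SIZE=2
def get_converter_from_hash(options_hash: bytes) -> object:
    """Hypothetical stand-in: builds an expensive object once per options hash."""
    builds.append(options_hash)
    return object()  # placeholder for a DocumentConverter with loaded models


first = get_converter_from_hash(b"default")
again = get_converter_from_hash(b"default")  # served from the cache
get_converter_from_hash.cache_clear()  # what /v1alpha/clear/converters triggers
rebuilt = get_converter_from_hash(b"default")  # cache miss: built again
```

`cache_clear()` only drops the cache's own references. A converter that a running task still uses stays alive until that task releases it.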
--- a/docling_serve/datamodel/convert.py
+++ b/docling_serve/datamodel/convert.py
@@ -8,8 +8,8 @@ from docling.datamodel.base_models import InputFormat, OutputFormat
 from docling.datamodel.pipeline_options import (
    EasyOcrOptions,
    PdfBackend,
-    PdfPipeline,
    PictureDescriptionBaseOptions,
+    ProcessingPipeline,
    TableFormerMode,
    TableStructureOptions,
 )
@@ -132,7 +132,11 @@ class ConvertDocumentsOptions(BaseModel):
                f"Allowed values: {', '.join([v.value for v in OutputFormat])}. "
                "Optional, defaults to Markdown."
            ),
-            examples=[[OutputFormat.MARKDOWN]],
+            examples=[
+                [OutputFormat.MARKDOWN],
+                [OutputFormat.MARKDOWN, OutputFormat.JSON],
+                [v.value for v in OutputFormat],
+            ],
        ),
    ] = [OutputFormat.MARKDOWN]

@@ -223,15 +227,15 @@ class ConvertDocumentsOptions(BaseModel):
    ] = TableStructureOptions().mode

    pipeline: Annotated[
-        PdfPipeline,
+        ProcessingPipeline,
        Field(description="Choose the pipeline to process PDF or image files."),
-    ] = PdfPipeline.STANDARD
+    ] = ProcessingPipeline.STANDARD

    page_range: Annotated[
        PageRange,
        Field(
            description="Only convert a range of pages. The page number starts at 1.",
-            examples=[(1, 4)],
+            examples=[DEFAULT_PAGE_RANGE, (1, 4)],
        ),
    ] = DEFAULT_PAGE_RANGE

@@ -296,6 +300,14 @@ class ConvertDocumentsOptions(BaseModel):
        ),
    ] = 2.0

+    md_page_break_placeholder: Annotated[
+        str,
+        Field(
+            description="Add this placeholder between pages in the markdown output.",
+            examples=["<!-- page-break -->", ""],
+        ),
+    ] = ""
+
    do_code_enrichment: Annotated[
        bool,
        Field(
@@ -351,14 +363,24 @@ class ConvertDocumentsOptions(BaseModel):
    picture_description_local: Annotated[
        Optional[PictureDescriptionLocal],
        Field(
-            description="Options for running a local vision-language model in the picture description. The parameters refer to a model hosted on Hugging Face. This parameter is mutually exclusive with picture_description_api."
+            description="Options for running a local vision-language model in the picture description. The parameters refer to a model hosted on Hugging Face. This parameter is mutually exclusive with picture_description_api.",
+            examples=[
+                PictureDescriptionLocal(repo_id="ibm-granite/granite-vision-3.2-2b"),
+                PictureDescriptionLocal(repo_id="HuggingFaceTB/SmolVLM-256M-Instruct"),
+            ],
        ),
    ] = None

    picture_description_api: Annotated[
        Optional[PictureDescriptionApi],
        Field(
-            description="API details for using a vision-language model in the picture description. This parameter is mutually exclusive with picture_description_local."
+            description="API details for using a vision-language model in the picture description. This parameter is mutually exclusive with picture_description_local.",
+            examples=[
+                PictureDescriptionApi(
+                    url="http://localhost:11434/v1/chat/completions",
+                    params={"model": "granite3.2-vision:2b"},
+                )
+            ],
        ),
    ] = None

--- a/docling_serve/datamodel/responses.py
+++ b/docling_serve/datamodel/responses.py
@@ -15,6 +15,10 @@ class HealthCheckResponse(BaseModel):
    status: str = "ok"


+class ClearResponse(BaseModel):
+    status: str = "ok"
+
+
 class DocumentResponse(BaseModel):
    filename: str
    md_content: Optional[str] = None
--- a/docling_serve/datamodel/task.py
+++ b/docling_serve/datamodel/task.py
@@ -1,8 +1,10 @@
+import datetime
+from functools import partial
 from pathlib import Path
 from typing import Optional, Union

 from fastapi.responses import FileResponse
-from pydantic import BaseModel, ConfigDict
+from pydantic import BaseModel, ConfigDict, Field

 from docling.datamodel.base_models import DocumentStream

@@ -25,6 +27,27 @@ class Task(BaseModel):
    result: Optional[Union[ConvertDocumentResponse, FileResponse]] = None
    scratch_dir: Optional[Path] = None
    processing_meta: Optional[TaskProcessingMeta] = None
+    created_at: datetime.datetime = Field(
+        default_factory=partial(datetime.datetime.now, datetime.timezone.utc)
+    )
+    started_at: Optional[datetime.datetime] = None
+    finished_at: Optional[datetime.datetime] = None
+    last_update_at: datetime.datetime = Field(
+        default_factory=partial(datetime.datetime.now, datetime.timezone.utc)
+    )
+
+    def set_status(self, status: TaskStatus):
+        now = datetime.datetime.now(datetime.timezone.utc)
+        if status == TaskStatus.STARTED and self.started_at is None:
+            self.started_at = now
+        if (
+            status in [TaskStatus.SUCCESS, TaskStatus.FAILURE]
+            and self.finished_at is None
+        ):
+            self.finished_at = now
+
+        self.last_update_at = now
+        self.task_status = status

    def is_completed(self) -> bool:
        if self.task_status in [TaskStatus.SUCCESS, TaskStatus.FAILURE]:
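`set_status` records only the first transition into `STARTED` and the first transition into a terminal state. This matters because the KFP orchestrator calls it on every status refresh. The sketch below copies that logic into a standalone dataclass so it can run outside docling-serve. The `TaskStatus` values are assumed for illustration:

```python
import datetime
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TaskStatus(str, Enum):  # assumption: simplified copy of the real enum
    PENDING = "pending"
    STARTED = "started"
    SUCCESS = "success"
    FAILURE = "failure"


def _utcnow() -> datetime.datetime:
    # Timezone-aware, like partial(datetime.datetime.now, datetime.timezone.utc)
    return datetime.datetime.now(datetime.timezone.utc)


@dataclass
class TaskTimes:
    task_status: TaskStatus = TaskStatus.PENDING
    created_at: datetime.datetime = field(default_factory=_utcnow)
    started_at: Optional[datetime.datetime] = None
    finished_at: Optional[datetime.datetime] = None
    last_update_at: datetime.datetime = field(default_factory=_utcnow)

    def set_status(self, status: TaskStatus) -> None:
        now = _utcnow()
        # Repeated polls reporting the same state do not move these timestamps.
        if status == TaskStatus.STARTED and self.started_at is None:
            self.started_at = now
        terminal = status in (TaskStatus.SUCCESS, TaskStatus.FAILURE)
        if terminal and self.finished_at is None:
            self.finished_at = now
        self.last_update_at = now
        self.task_status = status
```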
--- a/docling_serve/docling_conversion.py
+++ b/docling_serve/docling_conversion.py
@@ -19,10 +19,10 @@ from docling.datamodel.document import ConversionResult
 from docling.datamodel.pipeline_options import (
    OcrOptions,
    PdfBackend,
-    PdfPipeline,
    PdfPipelineOptions,
    PictureDescriptionApiOptions,
    PictureDescriptionVlmOptions,
+    ProcessingPipeline,
    TableFormerMode,
    VlmPipelineOptions,
    smoldocling_vlm_conversion_options,
@@ -58,7 +58,9 @@ def _hash_pdf_format_option(pdf_format_option: PdfFormatOption) -> bytes:

    # Serialize the dictionary to JSON with sorted keys to have consistent hashes
    serialized_data = json.dumps(data, sort_keys=True)
-    options_hash = hashlib.sha1(serialized_data.encode()).digest()
+    options_hash = hashlib.sha1(
+        serialized_data.encode(), usedforsecurity=False
+    ).digest()
    return options_hash


@@ -215,7 +217,7 @@ def get_pdf_pipeline_opts(
        )

    pipeline_options: Union[PdfPipelineOptions, VlmPipelineOptions]
-    if request.pipeline == PdfPipeline.STANDARD:
+    if request.pipeline == ProcessingPipeline.STANDARD:
        pipeline_options = _parse_standard_pdf_opts(request, artifacts_path)
        backend = _parse_backend(request)
        pdf_format_option = PdfFormatOption(
@@ -223,7 +225,7 @@ def get_pdf_pipeline_opts(
            backend=backend,
        )

-    elif request.pipeline == PdfPipeline.VLM:
+    elif request.pipeline == ProcessingPipeline.VLM:
        pipeline_options = _parse_vlm_pdf_opts(request, artifacts_path)
        pdf_format_option = PdfFormatOption(
            pipeline_cls=VlmPipeline, pipeline_options=pipeline_options
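The converter cache is keyed by a digest of the serialised options, so identical options must always produce identical bytes. `sort_keys=True` makes the JSON independent of dict insertion order. `usedforsecurity=False` has been accepted by `hashlib` constructors since Python 3.9. It marks the SHA-1 digest as a non-security fingerprint, which lets the call work in restricted (for example FIPS-enabled) environments. Here is a standalone sketch that uses a plain dict in place of the real `PdfFormatOption` data:

```python
import hashlib
import json


def options_hash(options: dict) -> bytes:
    """Stable fingerprint of a JSON-serialisable options dict."""
    serialized = json.dumps(options, sort_keys=True)
    # Not a security use: the digest is only a cache key.
    return hashlib.sha1(serialized.encode(), usedforsecurity=False).digest()


a = options_hash({"do_ocr": True, "table_mode": "accurate"})
b = options_hash({"table_mode": "accurate", "do_ocr": True})
```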
--- a/docling_serve/engines/async_kfp/orchestrator.py
+++ b/docling_serve/engines/async_kfp/orchestrator.py
@@ -130,13 +130,13 @@ class AsyncKfpOrchestrator(BaseAsyncOrchestrator):
        # CANCELED = "CANCELED"
        # PAUSED = "PAUSED"
        if run_info.state == V2beta1RuntimeState.SUCCEEDED:
-            task.task_status = TaskStatus.SUCCESS
+            task.set_status(TaskStatus.SUCCESS)
        elif run_info.state == V2beta1RuntimeState.PENDING:
-            task.task_status = TaskStatus.PENDING
+            task.set_status(TaskStatus.PENDING)
        elif run_info.state == V2beta1RuntimeState.RUNNING:
-            task.task_status = TaskStatus.STARTED
+            task.set_status(TaskStatus.STARTED)
        else:
-            task.task_status = TaskStatus.FAILURE
+            task.set_status(TaskStatus.FAILURE)

    async def task_status(self, task_id: str, wait: float = 0.0) -> Task:
        await self._update_task_from_run(task_id=task_id, wait=wait)
--- a/docling_serve/engines/async_local/orchestrator.py
+++ b/docling_serve/engines/async_local/orchestrator.py
@@ -3,6 +3,8 @@ import logging
 import uuid
 from typing import Optional

+from docling.datamodel.base_models import InputFormat
+
 from docling_serve.datamodel.convert import ConvertDocumentsOptions
 from docling_serve.datamodel.task import Task, TaskSource
 from docling_serve.docling_conversion import get_converter, get_pdf_pipeline_opts
@@ -54,4 +56,5 @@ class AsyncLocalOrchestrator(BaseAsyncOrchestrator):
    async def warm_up_caches(self):
        # Converter with default options
        pdf_format_option = get_pdf_pipeline_opts(ConvertDocumentsOptions())
-        get_converter(pdf_format_option)
+        converter = get_converter(pdf_format_option)
+        converter.initialize_pipeline(InputFormat.PDF)
--- a/docling_serve/engines/async_local/worker.py
+++ b/docling_serve/engines/async_local/worker.py
@@ -36,7 +36,7 @@ class AsyncLocalWorker:
            task = self.orchestrator.tasks[task_id]

            try:
-                task.task_status = TaskStatus.STARTED
+                task.set_status(TaskStatus.STARTED)
                _log.info(f"Worker {self.worker_id} processing task {task_id}")

                # Notify clients about task updates
@@ -106,7 +106,7 @@ class AsyncLocalWorker:
                task.sources = []
                task.options = None

-                task.task_status = TaskStatus.SUCCESS
+                task.set_status(TaskStatus.SUCCESS)
                _log.info(
                    f"Worker {self.worker_id} completed job {task_id} "
                    f"in {processing_time:.2f} seconds"
@@ -116,7 +116,7 @@ class AsyncLocalWorker:
                _log.error(
                    f"Worker {self.worker_id} failed to process job {task_id}: {e}"
                )
-                task.task_status = TaskStatus.FAILURE
+                task.set_status(TaskStatus.FAILURE)

            finally:
                await self.orchestrator.notify_task_subscribers(task_id)
--- a/docling_serve/engines/async_orchestrator.py
+++ b/docling_serve/engines/async_orchestrator.py
@@ -1,3 +1,6 @@
+import asyncio
+import datetime
+import logging
 import shutil
 from typing import Union

@@ -20,6 +23,8 @@ from docling_serve.engines.base_orchestrator import (
 )
 from docling_serve.settings import docling_serve_settings

+_log = logging.getLogger(__name__)
+

 class ProgressInvalid(OrchestratorError):
    pass
@@ -46,13 +51,50 @@ class BaseAsyncOrchestrator(BaseOrchestrator):
    async def task_result(
        self, task_id: str, background_tasks: BackgroundTasks
    ) -> Union[ConvertDocumentResponse, FileResponse, None]:
-        task = await self.get_raw_task(task_id=task_id)
-        if task.is_completed() and task.scratch_dir is not None:
-            if docling_serve_settings.single_use_results:
-                background_tasks.add_task(
-                    shutil.rmtree, task.scratch_dir, ignore_errors=True
-                )
-        return task.result
+        try:
+            task = await self.get_raw_task(task_id=task_id)
+            if task.is_completed() and docling_serve_settings.single_use_results:
+                if task.scratch_dir is not None:
+                    background_tasks.add_task(
+                        shutil.rmtree, task.scratch_dir, ignore_errors=True
+                    )
+
+                async def _remove_task_impl():
+                    await asyncio.sleep(docling_serve_settings.result_removal_delay)
+                    await self.delete_task(task_id=task.task_id)
+
+                async def _remove_task():
+                    asyncio.create_task(_remove_task_impl())  # noqa: RUF006
+
+                background_tasks.add_task(_remove_task)
+
+            return task.result
+        except TaskNotFoundError:
+            return None
+
+    async def delete_task(self, task_id: str):
+        _log.info(f"Deleting {task_id=}")
+        if task_id in self.task_subscribers:
+            for websocket in self.task_subscribers[task_id]:
+                await websocket.close()
+
+            del self.task_subscribers[task_id]
+
+        if task_id in self.tasks:
+            del self.tasks[task_id]
+
+    async def clear_results(self, older_than: float = 0.0):
+        cutoff_time = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(
+            seconds=older_than
+        )
+
+        tasks_to_delete = [
+            task_id
+            for task_id, task in self.tasks.items()
+            if task.finished_at is not None and task.finished_at < cutoff_time
+        ]
+        for task_id in tasks_to_delete:
+            await self.delete_task(task_id=task_id)

    async def notify_task_subscribers(self, task_id: str):
        if task_id not in self.task_subscribers:
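`clear_results` deletes every task whose `finished_at` lies before `now - older_than`. Tasks that are still running have `finished_at is None` and are never eligible. The HTTP endpoint defaults `older_than` to 3600 seconds, while the orchestrator method defaults to `0.0`. The sketch below isolates the selection step over a hypothetical `{task_id: finished_at}` mapping:

```python
import datetime
from typing import Optional


def expired_task_ids(
    finished_at: dict[str, Optional[datetime.datetime]],
    older_than: float,
    now: Optional[datetime.datetime] = None,
) -> list[str]:
    """Ids of tasks that finished more than `older_than` seconds before `now`."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    cutoff = now - datetime.timedelta(seconds=older_than)
    # Build the list first: deleting from the dict while iterating over it
    # would raise "RuntimeError: dictionary changed size during iteration".
    return [
        task_id
        for task_id, finished in finished_at.items()
        if finished is not None and finished < cutoff
    ]


utc = datetime.timezone.utc
now = datetime.datetime(2025, 7, 7, 12, 0, tzinfo=utc)
registry = {
    "old": now - datetime.timedelta(hours=2),
    "recent": now - datetime.timedelta(minutes=5),
    "running": None,  # not finished yet, never eligible
}
```

All timestamps must be timezone-aware. Comparing an aware `finished_at` with a naive cutoff raises `TypeError`.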
--- a/docling_serve/engines/base_orchestrator.py
+++ b/docling_serve/engines/base_orchestrator.py
@@ -42,6 +42,10 @@ class BaseOrchestrator(ABC):
    ) -> Union[ConvertDocumentResponse, FileResponse, None]:
        pass

+    @abstractmethod
+    async def clear_results(self, older_than: float = 0.0):
+        pass
+
    @abstractmethod
    async def process_queue(self):
        pass
--- a/docling_serve/gradio_ui.py
+++ b/docling_serve/gradio_ui.py
@@ -1,5 +1,6 @@
 import base64
 import importlib
+import itertools
 import json
 import logging
 import ssl
@@ -12,9 +13,10 @@ import certifi
 import gradio as gr
 import httpx

+from docling.datamodel.base_models import FormatToExtensions
 from docling.datamodel.pipeline_options import (
    PdfBackend,
-    PdfPipeline,
+    ProcessingPipeline,
    TableFormerMode,
    TableStructureOptions,
 )
@@ -29,7 +31,7 @@ logger = logging.getLogger(__name__)
 ############################

 logo_path = "https://raw.githubusercontent.com/docling-project/docling/refs/heads/main/docs/assets/logo.svg"
-js_components_url = "https://unpkg.com/@docling/docling-components@0.0.6"
+js_components_url = "https://unpkg.com/@docling/docling-components@0.0.7"
 if (
    docling_serve_settings.static_path is not None
    and docling_serve_settings.static_path.is_dir()
@@ -83,7 +85,7 @@ css = """
    height: 140px;
 }

-docling-img::part(pages) {
+docling-img {
    gap: 1rem;
 }

@@ -443,7 +445,7 @@ def response_to_output(response, return_as_file):
        )
        # Embed document JSON and trigger load at client via an image.
        json_rendered_content = f"""
-            <docling-img id="dclimg" pagenumbers tooltip="parsed"></docling-img>
+            <docling-img id="dclimg" pagenumbers><docling-tooltip></docling-tooltip></docling-img>
            <script id="dcljson" type="application/json" onload="document.getElementById('dclimg').src = JSON.parse(document.getElementById('dcljson').textContent);">{json_content}</script>
            <img src onerror="document.getElementById('dclimg').src = JSON.parse(document.getElementById('dcljson').textContent);" />
            """
@@ -545,19 +547,10 @@ with gr.Blocks(
                    elem_id="file_input_zone",
                    label="Upload File",
                    file_types=[
-                        ".pdf",
-                        ".docx",
-                        ".pptx",
-                        ".html",
-                        ".xlsx",
-                        ".json",
-                        ".asciidoc",
-                        ".txt",
-                        ".md",
-                        ".jpg",
-                        ".jpeg",
-                        ".png",
-                        ".gif",
+                        f".{v}"
+                        for v in itertools.chain.from_iterable(
+                            FormatToExtensions.values()
+                        )
                    ],
                    file_count="multiple",
                    scale=4,
@@ -594,9 +587,9 @@ with gr.Blocks(
        with gr.Row():
            with gr.Column(scale=1, min_width=200):
                pipeline = gr.Radio(
-                    [(v.value.capitalize(), v.value) for v in PdfPipeline],
+                    [(v.value.capitalize(), v.value) for v in ProcessingPipeline],
                    label="Pipeline type",
-                    value=PdfPipeline.STANDARD.value,
+                    value=ProcessingPipeline.STANDARD.value,
                )
        with gr.Row():
            with gr.Column(scale=1, min_width=200):
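The upload widget now derives its `file_types` from `FormatToExtensions` instead of a hard-coded list. An extension may be listed under more than one input format, and if so the flattened list will contain it twice. The sketch below flattens and deduplicates while keeping order. The mapping contents are hypothetical and only mimic the shape of docling's format-to-extension map:

```python
import itertools

# Hypothetical stand-in for docling's FormatToExtensions mapping
# (input format -> extensions without the leading dot).
format_to_extensions = {
    "pdf": ["pdf"],
    "image": ["jpg", "jpeg", "png"],
    "format_a": ["xml"],
    "format_b": ["xml", "nxml"],
}


def gradio_file_types(mapping: dict[str, list[str]]) -> list[str]:
    """Flatten the mapping into Gradio-style ".ext" filters without duplicates."""
    extensions = itertools.chain.from_iterable(mapping.values())
    # dict.fromkeys keeps the first occurrence of each extension, in order.
    return [f".{ext}" for ext in dict.fromkeys(extensions)]
```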
--- a/docling_serve/helper_functions.py
+++ b/docling_serve/helper_functions.py
@@ -1,9 +1,30 @@
 import inspect
+import json
 import re
-from typing import Union
+from typing import Union, get_args, get_origin

 from fastapi import Depends, Form
-from pydantic import BaseModel
+from pydantic import BaseModel, TypeAdapter
+
+
+def is_pydantic_model(type_):
+    try:
+        if inspect.isclass(type_) and issubclass(type_, BaseModel):
+            return True
+
+        origin = get_origin(type_)
+        if origin is Union:
+            args = get_args(type_)
+            return any(
+                inspect.isclass(arg) and issubclass(arg, BaseModel)
+                for arg in args
+                if arg is not type(None)
+            )
+
+    except Exception:
+        pass
+
+    return False


 # Adapted from
@@ -12,25 +33,62 @@ def FormDepends(cls: type[BaseModel]):
    new_parameters = []

    for field_name, model_field in cls.model_fields.items():
+        annotation = model_field.annotation
+        description = model_field.description
+        default = (
+            Form(..., description=description, examples=model_field.examples)
+            if model_field.is_required()
+            else Form(
+                model_field.default,
+                examples=model_field.examples,
+                description=description,
+            )
+        )
+
+        # Flatten nested Pydantic models by accepting them as JSON strings
+        if is_pydantic_model(annotation):
+            annotation = str
+            default = Form(
+                None
+                if model_field.default is None
+                else json.dumps(model_field.default.model_dump(mode="json")),
+                description=description,
+                examples=None
+                if not model_field.examples
+                else [
+                    json.dumps(ex.model_dump(mode="json"))
+                    for ex in model_field.examples
+                ],
+            )
+
        new_parameters.append(
            inspect.Parameter(
                name=field_name,
                kind=inspect.Parameter.POSITIONAL_ONLY,
-                default=(
-                    Form(...)
-                    if model_field.is_required()
-                    else Form(model_field.default)
-                ),
-                annotation=model_field.annotation,
+                default=default,
+                annotation=annotation,
            )
        )

    async def as_form_func(**data):
+        for field_name, model_field in cls.model_fields.items():
+            value = data.get(field_name)
+            annotation = model_field.annotation
+
+            # Parse nested models from JSON string
+            if value is not None and is_pydantic_model(annotation):
+                try:
+                    validator = TypeAdapter(annotation)
+                    data[field_name] = validator.validate_json(value)
+                except Exception as e:
+                    raise ValueError(
+                        f"Invalid JSON for field '{field_name}': {e}"
+                    ) from e
+
        return cls(**data)

    sig = inspect.signature(as_form_func)
    sig = sig.replace(parameters=new_parameters)
    as_form_func.__signature__ = sig  # type: ignore
+
    return Depends(as_form_func)


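`is_pydantic_model` recognises `Optional[Model]` because its origin is `typing.Union`. On Python 3.10+, the `Model | None` syntax produces `types.UnionType`, which that check does not match. The fields in this diff use `Optional[...]`, so they are unaffected. The sketch below is a variant that accepts both union forms. It uses a stand-in `Base` class in place of `pydantic.BaseModel` so that it needs only the standard library:

```python
import inspect
import types
from typing import Optional, Union, get_args, get_origin

# typing.Union covers Optional[X] / Union[X, Y]; types.UnionType (3.10+)
# covers the `X | None` syntax.
_UNION_ORIGINS: tuple = (Union,)
if hasattr(types, "UnionType"):
    _UNION_ORIGINS += (types.UnionType,)


def is_model_annotation(type_, base: type) -> bool:
    """True if type_ is a subclass of `base` or a union containing one."""
    if inspect.isclass(type_) and issubclass(type_, base):
        return True
    if get_origin(type_) in _UNION_ORIGINS:
        return any(
            inspect.isclass(arg) and issubclass(arg, base)
            for arg in get_args(type_)
            if arg is not type(None)
        )
    return False


class Base:  # stand-in for pydantic.BaseModel
    pass


class Options(Base):  # stand-in for e.g. PictureDescriptionLocal
    pass


optional_options = Optional[Options]
```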
--- a/docling_serve/response_preparation.py
+++ b/docling_serve/response_preparation.py
@@ -27,6 +27,7 @@ def _export_document_as_content(
    export_txt: bool,
    export_doctags: bool,
    image_mode: ImageRefMode,
+    md_page_break_placeholder: str,
 ):
    document = DocumentResponse(filename=conv_res.input.file.name)

@@ -40,10 +41,14 @@ def _export_document_as_content(
            document.html_content = new_doc.export_to_html(image_mode=image_mode)
        if export_txt:
            document.text_content = new_doc.export_to_markdown(
-                strict_text=True, image_mode=image_mode
+                strict_text=True,
+                image_mode=image_mode,
            )
        if export_md:
-            document.md_content = new_doc.export_to_markdown(image_mode=image_mode)
+            document.md_content = new_doc.export_to_markdown(
+                image_mode=image_mode,
+                page_break_placeholder=md_page_break_placeholder or None,
+            )
        if export_doctags:
            document.doctags_content = new_doc.export_to_doctags()
    elif conv_res.status == ConversionStatus.SKIPPED:
@@ -63,6 +68,7 @@ def _export_documents_as_files(
    export_txt: bool,
    export_doctags: bool,
    image_export_mode: ImageRefMode,
+    md_page_break_placeholder: str,
 ):
    success_count = 0
    failure_count = 0
@@ -103,7 +109,9 @@ def _export_documents_as_files(
                fname = output_dir / f"{doc_filename}.md"
                _log.info(f"writing Markdown output to {fname}")
                conv_res.document.save_as_markdown(
-                    filename=fname, image_mode=image_export_mode
+                    filename=fname,
+                    image_mode=image_export_mode,
+                    page_break_placeholder=md_page_break_placeholder or None,
                )

            # Export Document Tags format:
@@ -170,6 +178,7 @@ def process_results(
            export_txt=export_txt,
            export_doctags=export_doctags,
            image_mode=conversion_options.image_export_mode,
+            md_page_break_placeholder=conversion_options.md_page_break_placeholder,
        )

        response = ConvertDocumentResponse(
@@ -198,6 +207,7 @@ def process_results(
            export_txt=export_txt,
            export_doctags=export_doctags,
            image_export_mode=conversion_options.image_export_mode,
+            md_page_break_placeholder=conversion_options.md_page_break_placeholder,
        )

        files = os.listdir(output_dir)
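`md_page_break_placeholder` defaults to `""`. Both export paths pass `md_page_break_placeholder or None`, so an empty string reaches docling as `None`, meaning no placeholder. The sketch below shows that mapping. It also includes a hypothetical `join_pages` helper that only models the visible effect of the option; it is not docling's Markdown serializer:

```python
from typing import Optional


def to_export_arg(md_page_break_placeholder: str) -> Optional[str]:
    # "" is falsy, so it is normalised to None ("no page-break marker").
    return md_page_break_placeholder or None


def join_pages(pages: list[str], page_break_placeholder: Optional[str]) -> str:
    """Hypothetical model of the option: put the marker between page bodies."""
    if page_break_placeholder is None:
        return "\n\n".join(pages)
    return f"\n\n{page_break_placeholder}\n\n".join(pages)
```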
--- a/docling_serve/settings.py
+++ b/docling_serve/settings.py
@@ -40,6 +40,8 @@ class DoclingServeSettings(BaseSettings):
    static_path: Optional[Path] = None
    scratch_path: Optional[Path] = None
    single_use_results: bool = True
+    result_removal_delay: float = 300  # 5 minutes
+    load_models_at_boot: bool = True
    options_cache_size: int = 2
    enable_remote_services: bool = False
    allow_external_plugins: bool = False
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,4 +1,4 @@
-# Dolcing Serve documentation
+# Docling Serve documentation

 This documentation pages explore the webserver configurations, runtime options, deployment examples as well as development best practices.

--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -7,7 +7,7 @@ server and the actual app-specific configurations.

 > [!WARNING]
 > When the server is running with `reload` or with multiple `workers`, uvicorn
-> will spawn multiple subprocessed. This invalides all the values configured
+> will spawn multiple subprocesses. This invalidates all the values configured
 > via the CLI command line options. Please use environment variables in this
 > type of deployments.

@@ -42,10 +42,12 @@ THe following table describes the options to configure the Docling Serve app.
 |  | `DOCLING_SERVE_ENABLE_REMOTE_SERVICES` | `false` | Allow pipeline components making remote connections. For example, this is needed when using a vision-language model via APIs. |
 |  | `DOCLING_SERVE_ALLOW_EXTERNAL_PLUGINS` | `false` | Allow the selection of third-party plugins. |
 |  | `DOCLING_SERVE_SINGLE_USE_RESULTS` | `true` | If true, results can be accessed only once. If false, the results accumulate in the scratch directory. |
+|  | `DOCLING_SERVE_RESULT_REMOVAL_DELAY` | `300` | When `DOCLING_SERVE_SINGLE_USE_RESULTS` is active, the delay in seconds before a retrieved result is removed from the task registry. |
 |  | `DOCLING_SERVE_MAX_DOCUMENT_TIMEOUT` | `604800` (7 days) | The maximum time for processing a document. |
 |  | `DOCLING_SERVE_MAX_NUM_PAGES` |  | The maximum number of pages for a document to be processed. |
 |  | `DOCLING_SERVE_MAX_FILE_SIZE` |  | The maximum file size for a document to be processed. |
 |  | `DOCLING_SERVE_MAX_SYNC_WAIT` | `120` | Max number of seconds a synchronous endpoint is waiting for the task completion. |
+|  | `DOCLING_SERVE_LOAD_MODELS_AT_BOOT` | `true` | If enabled, the models for the default conversion options are loaded at startup. |
 |  | `DOCLING_SERVE_OPTIONS_CACHE_SIZE` | `2` | How many DocumentConveter objects (including their loaded models) to keep in the cache. |
 |  | `DOCLING_SERVE_CORS_ORIGINS` | `["*"]` | A list of origins that should be permitted to make cross-origin requests. |
 |  | `DOCLING_SERVE_CORS_METHODS` | `["*"]` | A list of HTTP methods that should be allowed for cross-origin requests. |
@@ -59,7 +61,7 @@ The selected compute engine will be running all the async jobs.

 #### Local engine

-The following table describes the options to configure the Docling Serve KFP engine.
+The following table describes the options to configure the Docling Serve local engine.

 | ENV | Default | Description |
 |-----|---------|-------------|
--- a/docs/deploy-examples/docling-model-cache-deployment.yaml
+++ b/docs/deploy-examples/docling-model-cache-deployment.yaml
@@ -0,0 +1,47 @@
+kind: Deployment
+apiVersion: apps/v1
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: docling-serve
+      component: docling-serve-api
+  template:
+    metadata:
+      labels:
+        app: docling-serve
+        component: docling-serve-api
+    spec:
+      restartPolicy: Always
+      containers:
+        - name: api
+          resources:
+            limits:
+              cpu: 2
+              memory: 4Gi
+            requests:
+              cpu: 250m
+              memory: 1Gi
+          env:
+            - name: DOCLING_SERVE_ENABLE_UI
+              value: 'true'
+            - name: DOCLING_SERVE_ARTIFACTS_PATH
+              value: '/modelcache'
+          ports:
+            - name: http
+              containerPort: 5001
+              protocol: TCP
+          imagePullPolicy: Always
+          image: 'ghcr.io/docling-project/docling-serve-cpu'
+          volumeMounts:
+            - name: docling-model-cache
+              mountPath: /modelcache
+      volumes:
+        - name: docling-model-cache
+          persistentVolumeClaim:
+            claimName: docling-model-cache-pvc
--- a/docs/deploy-examples/docling-model-cache-job.yaml
+++ b/docs/deploy-examples/docling-model-cache-job.yaml
@@ -0,0 +1,33 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: docling-model-cache-load
+spec:
+  selector: {}
+  template:
+    metadata:
+      name: docling-model-load
+    spec:
+      containers:
+        - name: loader
+          image: ghcr.io/docling-project/docling-serve-cpu:main
+          command:
+            - docling-tools
+            - models
+            - download
+            - '--output-dir=/modelcache'
+            - 'layout'
+            - 'tableformer'
+            - 'code_formula'
+            - 'picture_classifier'
+            - 'smolvlm'
+            - 'granite_vision'
+            - 'easyocr'
+          volumeMounts:
+            - name: docling-model-cache
+              mountPath: /modelcache
+      volumes:
+        - name: docling-model-cache
+          persistentVolumeClaim:
+            claimName: docling-model-cache-pvc
+      restartPolicy: Never
--- a/docs/deploy-examples/docling-model-cache-pvc.yaml
+++ b/docs/deploy-examples/docling-model-cache-pvc.yaml
@@ -0,0 +1,11 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: docling-model-cache-pvc
+spec:
+  accessModes:
+    - ReadWriteOnce
+  volumeMode: Filesystem
+  resources:
+    requests:
+      storage: 10Gi
--- a/docs/deploy-examples/docling-serve-oauth.yaml
+++ b/docs/deploy-examples/docling-serve-oauth.yaml
@@ -85,7 +85,7 @@ spec:
          resources:
            limits:
              cpu: 2000m
-              memory: 2Gi
+              memory: 4Gi
            requests:
              cpu: 800m
              memory: 1Gi
--- a/docs/deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml
+++ b/docs/deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml
@@ -0,0 +1,76 @@
+# This example deployment configures Docling Serve with a Route + Sticky sessions, a Service and cpu image
+---
+kind: Route
+apiVersion: route.openshift.io/v1
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+  annotations:
+    haproxy.router.openshift.io/disable_cookies: "false" # this annotation enables the sticky sessions
+spec:
+  path: /
+  to:
+    kind: Service
+    name: docling-serve
+  port:
+    targetPort: http
+  tls:
+    termination: edge
+    insecureEdgeTerminationPolicy: Redirect
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  ports:
+  - name: http
+    port: 5001
+    targetPort: http
+  selector:
+    app: docling-serve
+    component: docling-serve-api
+---
+kind: Deployment
+apiVersion: apps/v1
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: docling-serve
+      component: docling-serve-api
+  template:
+    metadata:
+      labels:
+        app: docling-serve
+        component: docling-serve-api
+    spec:
+      restartPolicy: Always
+      containers:
+        - name: api
+          resources:
+            limits:
+              cpu: 1
+              memory: 4Gi
+            requests:
+              cpu: 250m
+              memory: 1Gi
+          env:
+            - name: DOCLING_SERVE_ENABLE_UI
+              value: 'true'
+          ports:
+            - name: http
+              containerPort: 5001
+              protocol: TCP
+          imagePullPolicy: Always
+          image: 'ghcr.io/docling-project/docling-serve'
--- a/docs/deploy-examples/docling-serve-simple.yaml
+++ b/docs/deploy-examples/docling-serve-simple.yaml
@@ -40,8 +40,8 @@ spec:
        - name: api
          resources:
            limits:
-              cpu: 500m
-              memory: 2Gi
+              cpu: 1
+              memory: 4Gi
              nvidia.com/gpu: 1  # Limit to one GPU
            requests:
              cpu: 250m
--- a/docs/deployment.md
+++ b/docs/deployment.md
@@ -192,3 +192,45 @@ curl -X 'POST' \
    "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}]
  }'
 ```
+
+### ReplicaSets with `sticky sessions`
+
+Manifest example: [docling-serve-replicas-w-sticky-sessions.yaml](./deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml)
+
+This deployment has the following features:
+
+- Deployment configuration with 3 replicas
+- Service configuration
+- OpenShift `Route` exposing the service, with sticky sessions enabled
+
+Install the app with:
+
+```sh
+oc apply -f docs/deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml
+```
+
+To use the API:
+
+```sh
+# Retrieve the endpoint
+DOCLING_NAME=docling-serve
+DOCLING_ROUTE="https://$(oc get routes $DOCLING_NAME --template={{.spec.host}})"
+
+# Submit a test conversion, storing the session cookie and the task_id
+task_id=$(curl -s -X 'POST' \
+    "${DOCLING_ROUTE}/v1alpha/convert/source/async" \
+    -H "accept: application/json" \
+    -H "Content-Type: application/json" \
+    -d '{
+    "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}]
+    }' \
+    -c cookies.txt | grep -oP '"task_id":"\K[^"]+')
+```
+
+```sh
+# Reuse the task_id and the stored cookie to check the task status
+curl -v -X 'GET' \
+  "${DOCLING_ROUTE}/v1alpha/status/poll/$task_id?wait=0" \
+  -H "accept: application/json" \
+  -b "cookies.txt"
+```
--- a/docs/pre-loading-models.md
+++ b/docs/pre-loading-models.md
@@ -0,0 +1,103 @@
+# Pre-loading models for docling
+
+This document provides examples for pre-loading docling models to a persistent volume and re-using it for docling-serve deployments.
+
+1. First, we create a persistent volume claim to store the model weights:
+
+    ```yaml
+    apiVersion: v1
+    kind: PersistentVolumeClaim
+    metadata:
+      name: docling-model-cache-pvc
+    spec:
+      accessModes:
+        - ReadWriteOnce
+      volumeMode: Filesystem
+      resources:
+        requests:
+          storage: 10Gi
+    ```
+
+    If you don't want to use the default storage class, set a custom one with the following:
+
+    ```yaml
+    spec:
+      ...
+      storageClassName: <Storage Class Name>
+    ```
+
+    Manifest example: [docling-model-cache-pvc.yaml](./deploy-examples/docling-model-cache-pvc.yaml)
+
+2. To load the model weights, we can download them with the `docling-tools` CLI. As this is a one-time operation, a Kubernetes Job is a good fit:
+
+    ```yaml
+    apiVersion: batch/v1
+    kind: Job
+    metadata:
+      name: docling-model-cache-load
+    spec:
+      selector: {}
+      template:
+        metadata:
+          name: docling-model-load
+        spec:
+          containers:
+            - name: loader
+              image: ghcr.io/docling-project/docling-serve-cpu:main
+              command:
+                - docling-tools
+                - models
+                - download
+                - '--output-dir=/modelcache'
+                - 'layout'
+                - 'tableformer'
+                - 'code_formula'
+                - 'picture_classifier'
+                - 'smolvlm'
+                - 'granite_vision'
+                - 'easyocr'
+              volumeMounts:
+                - name: docling-model-cache
+                  mountPath: /modelcache
+          volumes:
+            - name: docling-model-cache
+              persistentVolumeClaim:
+                claimName: docling-model-cache-pvc
+          restartPolicy: Never
+    ```
+
+    The job mounts the previously created persistent volume and runs the same command we would use to download the models locally:
+    `docling-tools models download --output-dir <MOUNT-PATH> [LIST_OF_MODELS]`
+
+    In the manifest, we list the desired models individually; alternatively, the `--all` parameter downloads all models.
+
+    Manifest example: [docling-model-cache-job.yaml](./deploy-examples/docling-model-cache-job.yaml)
+
+3. Now we can mount the volume in the docling-serve deployment and set the `DOCLING_SERVE_ARTIFACTS_PATH` environment variable to point to it.
+    The following additions to the deployment are needed:
+
+    ```yaml
+    spec:
+      template:
+        spec:
+          containers:
+            - name: api
+              env:
+              ...
+                - name: DOCLING_SERVE_ARTIFACTS_PATH
+                  value: '/modelcache'
+              volumeMounts:
+                - name: docling-model-cache
+                  mountPath: /modelcache
+          ...
+          volumes:
+            - name: docling-model-cache
+              persistentVolumeClaim:
+                claimName: docling-model-cache-pvc
+    ```
+
+    Make sure that the value of `DOCLING_SERVE_ARTIFACTS_PATH` matches both the directory the models were downloaded to and the path where the volume is mounted.
+
+    Now, when docling-serve executes tasks, the underlying docling installation loads the model weights from the mounted volume.
+
+    Manifest example: [docling-model-cache-deployment.yaml](./deploy-examples/docling-model-cache-deployment.yaml)
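The requirement that `DOCLING_SERVE_ARTIFACTS_PATH` matches the volume mount path can be checked before applying a manifest. A minimal sketch, assuming the Deployment has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`) and that the volume is named `docling-model-cache` as in the examples above; `artifacts_path_problems` is a hypothetical helper, not part of docling or docling-serve.

```python
from typing import Any


def artifacts_path_problems(
    deployment: dict[str, Any], volume_name: str = "docling-model-cache"
) -> list[str]:
    """Return a list of problems found in a docling-serve Deployment dict.

    Checks that the pod declares `volume_name`, and that every container
    setting DOCLING_SERVE_ARTIFACTS_PATH mounts that volume at exactly
    the same path.
    """
    pod_spec = deployment["spec"]["template"]["spec"]
    problems: list[str] = []

    declared = {v["name"] for v in pod_spec.get("volumes", [])}
    if volume_name not in declared:
        problems.append(f"volume {volume_name!r} is not declared in spec.template.spec.volumes")

    for container in pod_spec.get("containers", []):
        env = {e["name"]: e.get("value") for e in container.get("env", [])}
        path = env.get("DOCLING_SERVE_ARTIFACTS_PATH")
        if path is None:
            continue  # this container does not use the model cache
        mounts = {m["name"]: m["mountPath"] for m in container.get("volumeMounts", [])}
        if mounts.get(volume_name) != path:
            problems.append(
                f"container {container['name']!r}: DOCLING_SERVE_ARTIFACTS_PATH={path!r} "
                f"but {volume_name!r} is mounted at {mounts.get(volume_name)!r}"
            )
    return problems
```

An empty list means the environment variable and the mount agree; each string otherwise describes one mismatch.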
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -9,15 +9,17 @@ On top of the source of file (see below), both endpoints support the same parame
 - `from_formats` (List[str]): Input format(s) to convert from. Allowed values: `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md`. Defaults to all formats.
 - `to_formats` (List[str]): Output format(s) to convert to. Allowed values: `md`, `json`, `html`, `text`, `doctags`. Defaults to `md`.
 - `pipeline` (str). The choice of which pipeline to use. Allowed values are `standard` and `vlm`. Defaults to `standard`.
+- `page_range` (tuple): If specified, only convert a range of pages. Page numbers start at 1.
 - `do_ocr` (bool): If enabled, the bitmap content will be processed using OCR. Defaults to `True`.
 - `image_export_mode`: Image export mode for the document (only in case of JSON, Markdown or HTML). Allowed values: embedded, placeholder, referenced. Optional, defaults to `embedded`.
 - `force_ocr` (bool): If enabled, replace any existing text with OCR-generated text over the full content. Defaults to `False`.
- `ocr_engine` (str): OCR engine to use. Allowed values: `easyocr`, `tesseract_cli`, `tesseract`, `rapidocr`, `ocrmac`. Defaults to `easyocr`.
+- `ocr_engine` (str): OCR engine to use. Allowed values: `easyocr`, `tesserocr`, `tesseract`, `rapidocr`, `ocrmac`. Defaults to `easyocr`. To use the `tesserocr` engine, `tesserocr` must be installed in the environment where docling-serve is running: `pip install tesserocr`.
 - `ocr_lang` (List[str]): List of languages used by the OCR engine. Note that each OCR engine has different values for the language names. Defaults to empty.
 - `pdf_backend` (str): PDF backend to use. Allowed values: `pypdfium2`, `dlparse_v1`, `dlparse_v2`, `dlparse_v4`. Defaults to `dlparse_v4`.
 - `table_mode` (str): Table mode to use. Allowed values: `fast`, `accurate`. Defaults to `fast`.
 - `abort_on_error` (bool): If enabled, abort on error. Defaults to false.
 - `return_as_file` (bool): If enabled, return the output as a file. Defaults to false.
+- `md_page_break_placeholder` (str): Add this placeholder between pages in the markdown output.
 - `do_table_structure` (bool): If enabled, the table structure will be extracted. Defaults to true.
 - `do_code_enrichment` (bool): If enabled, perform OCR code enrichment. Defaults to false.
 - `do_formula_enrichment` (bool): If enabled, perform formula OCR, return LaTeX code. Defaults to false.
@@ -244,7 +246,7 @@ files = {
    'files': ('2206.01062v1.pdf', open(file_path, 'rb'), 'application/pdf'),
 }

-response = await async_client.post(url, files=files, data={"parameters": json.dumps(parameters)})
+response = await async_client.post(url, files=files, data=parameters)
 assert response.status_code == 200, "Response should be 200 OK"

 data = response.json()
@@ -286,33 +288,42 @@ The api option is specified with:

 Example URLs are:

- `http://localhost:8000/v1/chat/completions` for the local vllm api, with example `params`:
+- `http://localhost:8000/v1/chat/completions` for the local vllm api, with example `picture_description_api`:
  - the `HuggingFaceTB/SmolVLM-256M-Instruct` model

    ```json
    {
+      "url": "http://localhost:8000/v1/chat/completions",
+      "params": {
         "model": "HuggingFaceTB/SmolVLM-256M-Instruct",
         "max_completion_tokens": 200
+      }
    }
    ```
-  
+
  - the `ibm-granite/granite-vision-3.2-2b` model

    ```json
    {
+      "url": "http://localhost:8000/v1/chat/completions",
+      "params": {
         "model": "ibm-granite/granite-vision-3.2-2b",
         "max_completion_tokens": 200
+      }
    }
    ```

- `http://localhost:11434/v1/chat/completions` for the local ollama api, with example `params`:
+- `http://localhost:11434/v1/chat/completions` for the local ollama api, with example `picture_description_api`:
  - the `granite3.2-vision:2b` model

    ```json
    {
+      "url": "http://localhost:11434/v1/chat/completions",
+      "params": {
         "model": "granite3.2-vision:2b"
+      }
    }
-    ```  
+    ```

 Note that when using `picture_description_api`, the server must be launched with `DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true`.

@@ -348,4 +359,92 @@ The response can be a JSON Document or a File.

 ## Asynchronous API

-TBA
+Both the `/v1alpha/convert/source` and `/v1alpha/convert/file` endpoints are also available as asynchronous variants.
+The advantage of the asynchronous endpoints is that the client can drop the connection, check on the progress, and fetch the result when it is ready.
+This approach is more resilient against network instabilities and lets the client application easily interleave conversions with other tasks.
+
+Launch an asynchronous conversion with:
+
+- `POST /v1alpha/convert/source/async` when providing the input as sources.
+- `POST /v1alpha/convert/file/async` when providing the input as multipart-form files.
+
+The response format is a task detail:
+
+```jsonc
+{
+  "task_id": "<task_id>",  // the task_id which can be used for the next operations
+  "task_status": "pending|started|success|failure",  // the task status
+  "task_position": 1,  // the position in the queue
+  "task_meta": null,  // metadata e.g. how many documents are in the total job and how many have been converted
+}
+```
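On the client side, the task detail can be wrapped in a small typed structure. A minimal sketch: `TaskDetail` is a hypothetical client-side class, not a docling-serve type, and its field types are inferred from the example response above.

```python
from dataclasses import dataclass
from typing import Any, Optional

TERMINAL_STATUSES = ("success", "failure")


@dataclass
class TaskDetail:
    """Client-side view of the task detail returned by the async endpoints."""

    task_id: str
    task_status: str  # "pending", "started", "success" or "failure"
    task_position: Optional[int] = None
    task_meta: Optional[dict[str, Any]] = None

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "TaskDetail":
        return cls(
            task_id=data["task_id"],
            task_status=data["task_status"],
            task_position=data.get("task_position"),
            task_meta=data.get("task_meta"),
        )

    @property
    def done(self) -> bool:
        """True once the task has reached a terminal status."""
        return self.task_status in TERMINAL_STATUSES
```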
+
+### Polling status
+
+To check the progress of the conversion task and wait for its completion, use the endpoint:
+
+- `GET /v1alpha/status/poll/{task_id}`
+
+<details>
+<summary>Example waiting loop:</summary>
+
+```python
+import time
+
+import httpx
+
+base_url = "http://localhost:5001/v1alpha"  # adjust to your deployment
+
+# ...
+# response from the async task submission
+task = response.json()
+
+while task["task_status"] not in ("success", "failure"):
+    time.sleep(5)  # wait before polling again
+    response = httpx.get(f"{base_url}/status/poll/{task['task_id']}")
+    response.raise_for_status()
+    task = response.json()
+```
+
+</details>
+
+### Subscribe with websockets
+
+Using a websocket, the client application can be notified about updates of the conversion task.
+To open the websocket connection, use the endpoint:
+
+- `/v1alpha/status/ws/{task_id}`
+
+Websocket messages are JSON objects with the following structure:
+
+```jsonc
+{
+  "message": "connection|update|error",  // type of message being sent
+  "task": {},  // same content as the task detail
+  "error": "",  // description of the error
+}
+```
+
+<details>
+<summary>Example websocket usage:</summary>
+
+```python
+import json
+
+from websockets.sync.client import connect
+
+ws_base_url = "ws://localhost:5001/v1alpha"  # adjust to your deployment
+
+uri = f"{ws_base_url}/status/ws/{task['task_id']}"
+with connect(uri) as websocket:
+    for message in websocket:
+        try:
+            payload = json.loads(message)
+        except json.JSONDecodeError:
+            break
+        if payload["message"] == "error":
+            break
+        # stop once the task has reached a terminal status
+        if payload["task"]["task_status"] in ("success", "failure"):
+            break
+```
+
+</details>
+
+### Fetch results
+
+When the task is completed, the result can be fetched with the endpoint:
+
+- `GET /v1alpha/result/{task_id}`
--- a/os-packages.txt
+++ b/os-packages.txt
@@ -1,6 +1,7 @@
 tesseract
 tesseract-devel
 tesseract-langpack-eng
+tesseract-osd
 leptonica-devel
 libglvnd-glx
 glib2
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "docling-serve"
-version = "0.10.1"  # DO NOT EDIT, updated automatically
+version = "0.16.1"  # DO NOT EDIT, updated automatically
 description = "Running Docling as a service"
 license = {text = "MIT"}
 authors = [
@@ -26,11 +26,16 @@ classifiers = [
    # "Development Status :: 5 - Production/Stable",
    "Intended Audience :: Developers",
    "Typing :: Typed",
-    "Programming Language :: Python :: 3"
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
 ]
 requires-python = ">=3.10"
 dependencies = [
-    "docling[vlm]~=2.28",
+    "docling[vlm]~=2.38",
+    "docling-core>=2.32.0",
    "mlx-vlm~=0.1.12; sys_platform == 'darwin' and platform_machine == 'arm64'",
    "fastapi[standard]~=0.115",
    "httpx~=0.28",
@@ -41,6 +46,7 @@ dependencies = [
    "typer~=0.12",
    "uvicorn[standard]>=0.29.0,<1.0.0",
    "websockets~=14.0",
+    "scalar-fastapi>=1.0.3",
 ]

 [project.optional-dependencies]
@@ -55,14 +61,6 @@ rapidocr = [
    "rapidocr-onnxruntime~=1.4; python_version<'3.13'",
    "onnxruntime~=1.7",
 ]
-cpu = [
-  "torch>=2.6.0",
-  "torchvision>=0.21.0",
-]
-cu124 = [
-  "torch>=2.6.0",
-  "torchvision>=0.21.0",
-]
 flash-attn = [
  "flash-attn~=2.7.0; sys_platform == 'linux' and platform_machine == 'x86_64'"
 ]
@@ -78,18 +76,39 @@ dev = [
    "python-semantic-release~=7.32",
    "ruff>=0.9.6",
 ]
+pypi = [
+  "torch>=2.6.0",
+  "torchvision>=0.21.0",
+]
+cpu = [
+  "torch>=2.6.0",
+  "torchvision>=0.21.0",
+]
+cu124 = [
+  "torch>=2.6.0",
+  "torchvision>=0.21.0",
+]
+cu126 = [
+  "torch>=2.6.0",
+  "torchvision>=0.21.0",
+]
+cu128 = [
+  "torch>=2.7.0",
+  "torchvision>=0.22.0",
+]

 [tool.uv]
 package = true
+default-groups = ["dev", "pypi"]
 conflicts = [
  [
-    { extra = "cpu" },
-    { extra = "cu124" },
+    { group = "pypi" },
+    { group = "cpu" },
+    { group = "cu124" },
+    { group = "cu126" },
+    { group = "cu128" },
  ],
-  [
-    { extra = "cpu" },
-    { extra = "flash-attn" },
-  ],]
+]
 environments = ["sys_platform != 'darwin' or platform_machine != 'x86_64'"]
 override-dependencies = [
  "urllib3~=2.0"
@@ -97,14 +116,25 @@ override-dependencies = [

 [tool.uv.sources]
 torch = [
-  { index = "pytorch-cpu", extra = "cpu" },
-  { index = "pytorch-cu124", extra = "cu124" },
+  { index = "pytorch-pypi", group = "pypi" },
+  { index = "pytorch-cpu", group = "cpu" },
+  { index = "pytorch-cu124", group = "cu124" },
+  { index = "pytorch-cu126", group = "cu126" },
+  { index = "pytorch-cu128", group = "cu128" },
 ]
 torchvision = [
-  { index = "pytorch-cpu", extra = "cpu" },
-  { index = "pytorch-cu124", extra = "cu124" },
+  { index = "pytorch-pypi", group = "pypi" },
+  { index = "pytorch-cpu", group = "cpu" },
+  { index = "pytorch-cu124", group = "cu124" },
+  { index = "pytorch-cu126", group = "cu126" },
+  { index = "pytorch-cu128", group = "cu128" },
 ]

+[[tool.uv.index]]
+name = "pytorch-pypi"
+url = "https://pypi.org/simple"
+explicit = true
+
 [[tool.uv.index]]
 name = "pytorch-cpu"
 url = "https://download.pytorch.org/whl/cpu"
@@ -115,6 +145,16 @@ name = "pytorch-cu124"
 url = "https://download.pytorch.org/whl/cu124"
 explicit = true

+[[tool.uv.index]]
+name = "pytorch-cu126"
+url = "https://download.pytorch.org/whl/cu126"
+explicit = true
+
+[[tool.uv.index]]
+name = "pytorch-cu128"
+url = "https://download.pytorch.org/whl/cu128"
+explicit = true
+
 [tool.setuptools.packages.find]
 include = ["docling_serve*"]
 namespaces = true
@@ -212,6 +252,7 @@ module = [
    "kfp.*",
    "kfp_server_api.*",
    "mlx_vlm.*",
+    "scalar_fastapi.*",
 ]
 ignore_missing_imports = true

--- a/tests/test_1-file-async.py
+++ b/tests/test_1-file-async.py
@@ -51,10 +51,12 @@ async def test_convert_url(async_client):
        time.sleep(2)

    assert task["task_status"] == "success"
+    print(f"Task completed with status {task['task_status']}")

    result_resp = await async_client.get(f"{base_url}/result/{task['task_id']}")
    assert result_resp.status_code == 200, "Response should be 200 OK"
    result = result_resp.json()
+    print("Got result.")

    assert "md_content" in result["document"]
    assert result["document"]["md_content"] is not None
--- a/tests/test_file_opts.py
+++ b/tests/test_file_opts.py
@@ -0,0 +1,77 @@
+import asyncio
+import json
+import os
+
+import pytest
+import pytest_asyncio
+from asgi_lifespan import LifespanManager
+from httpx import ASGITransport, AsyncClient
+
+from docling_core.types import DoclingDocument
+from docling_core.types.doc.document import PictureDescriptionData
+
+from docling_serve.app import create_app
+
+
+@pytest.fixture(scope="session")
+def event_loop():
+    return asyncio.get_event_loop()
+
+
+@pytest_asyncio.fixture(scope="session")
+async def app():
+    app = create_app()
+
+    async with LifespanManager(app) as manager:
+        print("Launching lifespan of app.")
+        yield manager.app
+
+
+@pytest_asyncio.fixture(scope="session")
+async def client(app):
+    async with AsyncClient(
+        transport=ASGITransport(app=app), base_url="http://app.io"
+    ) as client:
+        print("Client is ready")
+        yield client
+
+
+@pytest.mark.asyncio
+async def test_convert_file(client: AsyncClient):
+    """Test convert single file to all outputs"""
+
+    endpoint = "/v1alpha/convert/file"
+    options = {
+        "to_formats": ["md", "json"],
+        "image_export_mode": "placeholder",
+        "do_ocr": False,
+        "do_picture_description": True,
+        "picture_description_api": json.dumps(
+            {
+                "url": "http://localhost:11434/v1/chat/completions",  # ollama
+                "params": {"model": "granite3.2-vision:2b"},
+                "timeout": 60,
+                "prompt": "Describe this image in a few sentences. ",
+            }
+        ),
+    }
+
+    current_dir = os.path.dirname(__file__)
+    file_path = os.path.join(current_dir, "2206.01062v1.pdf")
+
+    files = {
+        "files": ("2206.01062v1.pdf", open(file_path, "rb"), "application/pdf"),
+    }
+
+    response = await client.post(endpoint, files=files, data=options)
+    assert response.status_code == 200, "Response should be 200 OK"
+
+    data = response.json()
+
+    doc = DoclingDocument.model_validate(data["document"]["json_content"])
+
+    for pic in doc.pictures:
+        for ann in pic.annotations:
+            if isinstance(ann, PictureDescriptionData):
+                print(f"{pic.self_ref}")
+                print(ann.text)
--- a/tests/test_results_clear.py
+++ b/tests/test_results_clear.py
@@ -0,0 +1,127 @@
+import asyncio
+import base64
+import json
+from pathlib import Path
+
+import pytest
+import pytest_asyncio
+from asgi_lifespan import LifespanManager
+from httpx import ASGITransport, AsyncClient
+
+from docling_serve.app import create_app
+from docling_serve.settings import docling_serve_settings
+
+
+@pytest.fixture(scope="session")
+def event_loop():
+    return asyncio.get_event_loop()
+
+
+@pytest_asyncio.fixture(scope="session")
+async def app():
+    app = create_app()
+
+    async with LifespanManager(app) as manager:
+        print("Launching lifespan of app.")
+        yield manager.app
+
+
+@pytest_asyncio.fixture(scope="session")
+async def client(app):
+    async with AsyncClient(
+        transport=ASGITransport(app=app), base_url="http://app.io"
+    ) as client:
+        print("Client is ready")
+        yield client
+
+
+async def convert_file(client: AsyncClient):
+    doc_filename = Path("tests/2408.09869v5.pdf")
+    encoded_doc = base64.b64encode(doc_filename.read_bytes()).decode()
+
+    payload = {
+        "options": {
+            "to_formats": ["json"],
+        },
+        "file_sources": [{"base64_string": encoded_doc, "filename": doc_filename.name}],
+    }
+
+    response = await client.post("/v1alpha/convert/source/async", json=payload)
+    assert response.status_code == 200, "Response should be 200 OK"
+
+    task = response.json()
+
+    print(json.dumps(task, indent=2))
+
+    while task["task_status"] not in ("success", "failure"):
+        response = await client.get(f"/v1alpha/status/poll/{task['task_id']}")
+        assert response.status_code == 200, "Response should be 200 OK"
+        task = response.json()
+        print(f"{task['task_status']=}")
+        print(f"{task['task_position']=}")
+
+        await asyncio.sleep(2)
+
+    assert task["task_status"] == "success"
+
+    return task
+
+
+@pytest.mark.asyncio
+async def test_clear_results(client: AsyncClient):
+    """Test removal of task."""
+
+    # Set long delay deletion
+    docling_serve_settings.result_removal_delay = 100
+
+    # Convert and wait for completion
+    task = await convert_file(client)
+
+    # Get result once
+    result_response = await client.get(f"/v1alpha/result/{task['task_id']}")
+    assert result_response.status_code == 200, "Response should be 200 OK"
+    print("Result 1 ok.")
+    result = result_response.json()
+    assert result["document"]["json_content"]["schema_name"] == "DoclingDocument"
+
+    # Get result twice
+    result_response = await client.get(f"/v1alpha/result/{task['task_id']}")
+    assert result_response.status_code == 200, "Response should be 200 OK"
+    print("Result 2 ok.")
+    result = result_response.json()
+    assert result["document"]["json_content"]["schema_name"] == "DoclingDocument"
+
+    # Clear
+    clear_response = await client.get("/v1alpha/clear/results?older_then=0")
+    assert clear_response.status_code == 200, "Response should be 200 OK"
+    print("Clear ok.")
+
+    # Get deleted result
+    result_response = await client.get(f"/v1alpha/result/{task['task_id']}")
+    assert result_response.status_code == 404, "Response should be removed"
+    print("Result was no longer found.")
+
+
+@pytest.mark.asyncio
+async def test_delay_remove(client: AsyncClient):
+    """Test automatic removal of task with delay."""
+
+    # Set short delay deletion
+    docling_serve_settings.result_removal_delay = 5
+
+    # Convert and wait for completion
+    task = await convert_file(client)
+
+    # Get result once
+    result_response = await client.get(f"/v1alpha/result/{task['task_id']}")
+    assert result_response.status_code == 200, "Response should be 200 OK"
+    print("Result ok.")
+    result = result_response.json()
+    assert result["document"]["json_content"]["schema_name"] == "DoclingDocument"
+
+    print("Sleeping to wait for the automatic task deletion.")
+    await asyncio.sleep(10)
+
+    # Get deleted result
+    result_response = await client.get(f"/v1alpha/result/{task['task_id']}")
+    assert result_response.status_code == 404, "Response should be removed"
--- a/uv.lock
+++ b/uv.lock
Author	SHA1	Message	Date
github-actions[bot]	767ce0982b	chore: bump version to 0.16.1 [skip ci]	2025-07-07 16:17:50 +00:00
Michele Dolfi	bfde1a0991	fix: upgrade deps including, docling v2.40.0 with locks in models init (#264 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-07 17:13:45 +02:00
VIktor Kuropiantnyk	eb3892ee14	fix: missing tesseract osd (#263 ) Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>	2025-07-07 16:36:43 +02:00
tassadarliu	93b84712b2	docs: fix typo (#259 ) Signed-off-by: tassadarliu <rhapsodyn@gmail.com>	2025-07-07 08:47:34 +02:00
Yishen Miao	c45b937064	docs: change the doc example (#258 ) Signed-off-by: Yishen Miao <mys721tx@gmail.com>	2025-07-07 08:47:21 +02:00
Francisco Arceo	50e431f30f	docs: Update typo (#247 ) Signed-off-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-06-27 16:58:37 +02:00
Michele Dolfi	149a8cb1c0	fix: properly load models at boot (#244 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-27 12:20:38 +02:00
github-actions[bot]	5f9c20a985	chore: bump version to 0.16.0 [skip ci]	2025-06-25 09:52:08 +00:00
Michele Dolfi	80755a7d59	docs: Update example resources and improve README (#231 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-25 07:56:14 +02:00
Michele Dolfi	30aca92298	feat: package updates and more cuda images (#229 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-24 16:59:05 +02:00
github-actions[bot]	717fb3a8d8	chore: bump version to 0.15.0 [skip ci]	2025-06-17 15:00:38 +00:00
Michele Dolfi	873d05aefe	feat: use redocs and scalar as api docs (#228 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-17 16:54:00 +02:00
Ryan Fernandes	196c5ce42a	fix: "tesserocr" instead of "tesseract_cli" in usage docs (#223 ) Signed-off-by: Ryan Fernandes <ryan@fernandes.us>	2025-06-17 16:53:51 +02:00
github-actions[bot]	b5c5f47892	chore: bump version to 0.14.0 [skip ci]	2025-06-17 13:10:27 +00:00
23Ro	d5455b7f66	fix: Typo in Headline (#220 ) Signed-off-by: 23Ro <m.n@23ro.de>	2025-06-17 14:55:27 +02:00
Michele Dolfi	7a682494d6	chore: dco advisor (#224 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-17 09:38:56 +02:00
Eugene	524f6a8997	feat: Read supported file extensions from docling (#214 ) Signed-off-by: Eugene <fogaprod@gmail.com>	2025-06-05 09:38:28 +02:00
github-actions[bot]	9ccf8e3b5e	chore: bump version to 0.13.0 [skip ci]	2025-06-04 12:24:40 +00:00
Michele Dolfi	ffea34732b	feat: upgrade docling to 2.36 (#212 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-04 14:20:34 +02:00
github-actions[bot]	b299af002b	chore: bump version to 0.12.0 [skip ci]	2025-06-03 16:30:28 +00:00
Michele Dolfi	c4c41f16df	feat: Export annotations in markdown and html (Docling upgrade) (#202 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-03 18:24:27 +02:00
Michele Dolfi	7066f3520a	fix: processing complex params in multipart-form (#210 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-03 18:24:05 +02:00
Rui Dias Gomes	6a8190c315	docs: add openshift replicasets examples (#209 ) Signed-off-by: Rui-Dias-Gomes <rui.dias.gomes@ibm.com> Co-authored-by: Rui-Dias-Gomes <rui.dias.gomes@ibm.com>	2025-06-03 17:43:41 +02:00
github-actions[bot]	060ecd8b0e	chore: bump version to 0.11.0 [skip ci]	2025-05-23 13:45:54 +00:00
Michele Dolfi	32b8a809f3	feat: page break placeholder in markdown exports options (#194 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-23 15:26:27 +02:00
Michele Dolfi	de002dfcdc	feat: clear results registry (#192 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-23 14:30:57 +02:00
Michele Dolfi	abe5aa03f5	feat: Upgrade to Docling 2.33.0 (#198 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-22 17:00:29 +02:00
VIktor Kuropiantnyk	3f090b7d15	docs: Example and instructions on how to load model weights to persistent volume (#197 ) Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>	2025-05-21 13:04:46 +02:00
Michele Dolfi	21c1791e42	docs: async api usage and fixes (#195 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-19 13:57:35 +02:00
Michele Dolfi	00be428490	feat: api to trigger offloading the models (#188 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-14 15:02:18 +02:00
Kasper Dinkla	3ff1b2f983	feat: Figure annotations @ docling components 0.0.7 (#181 ) Signed-off-by: DKL <dkl@zurich.ibm.com>	2025-05-08 16:31:10 +02:00
Michele Dolfi	8406fb9b59	fix: usage of hashlib for FIPS (#171 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-02 15:00:10 +02:00