chore: bump version to 1.1.0 [skip ci]

feat: Add docling-mcp in the distribution (#290)
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2025-11-29 16:43:24 +00:00 · 2025-07-30 15:53:01 +00:00 · 2025-07-30 15:39:11 +02:00 · 2025-07-30 14:49:26 +02:00 · 2025-07-30 14:08:59 +02:00 · 2025-07-29 14:44:49 +02:00
68 changed files with 9315 additions and 3813 deletions
--- a/.github/SECURITY.md
+++ b/.github/SECURITY.md
@@ -20,4 +20,4 @@ After the initial reply to your report, the security team will keep you informed

 ## Security Alerts

-We will send announcements of security vulnerabilities and steps to remediate on the [Docling announcements](https://github.com/DS4SD/docling/discussions/categories/announcements).
+We will send announcements of security vulnerabilities and steps to remediate on the [Docling announcements](https://github.com/docling-project/docling/discussions/categories/announcements).
--- a/.github/dco.yml
+++ b/.github/dco.yml
@@ -0,0 +1,2 @@
+allowRemediationCommits:
+  individual: true
--- a/.github/styles/config/vocabularies/Docling/accept.txt
+++ b/.github/styles/config/vocabularies/Docling/accept.txt
@@ -0,0 +1,35 @@
+[Dd]ocling
+precommit
+asgi
+async
+(?i)urls
+uvicorn
+[Ww]ebserver
+keyfile
+[Ww]ebsocket(s?)
+[Kk]ubernetes
+UI
+(?i)vllm
+APIs
+[Ss]ubprocesses
+(?i)api
+Kubeflow
+(?i)Jobkit
+(?i)cpu
+(?i)PyTorch
+(?i)CUDA
+(?i)NVIDIA
+(?i)env
+Gradio
+bool
+Ollama
+inbody
+LGTMs
+Dolfi
+Lysak
+Nikos
+Nassar
+Panos
+Vagenas
+Staar
+Livathinos
--- a/.github/vale.ini
+++ b/.github/vale.ini
@@ -0,0 +1,11 @@
+StylesPath = styles
+MinAlertLevel = suggestion
+; Packages = write-good, proselint
+
+Vocab = Docling
+
+[*.md]
+BasedOnStyles = Vale
+
+[CHANGELOG.md]
+BasedOnStyles = 
--- a/.github/workflows/ci-images-dryrun.yml
+++ b/.github/workflows/ci-images-dryrun.yml
@@ -13,17 +13,25 @@ jobs:
    strategy:
      matrix:
        spec:
-          - name: ds4sd/docling-serve
+          - name: docling-project/docling-serve
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra cpu
+              UV_SYNC_EXTRA_ARGS=--no-extra flash-attn
            platforms: linux/amd64, linux/arm64
-          - name: ds4sd/docling-serve-cpu
+          - name: docling-project/docling-serve-cpu
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cu124
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cpu --no-extra flash-attn
            platforms: linux/amd64, linux/arm64
-          - name: ds4sd/docling-serve-cu124
+          - name: docling-project/docling-serve-cu124
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cpu
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu124
+            platforms: linux/amd64
+          - name: docling-project/docling-serve-cu126
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu126
+            platforms: linux/amd64
+          - name: docling-project/docling-serve-cu128
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu128
            platforms: linux/amd64

    permissions:
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -8,7 +8,7 @@ on:

 jobs:
  code-checks:
-    # if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling-serve' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling-serve') }}
+    # if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name != 'docling-project/docling-serve' }}
    uses: ./.github/workflows/job-checks.yml
    permissions:
      packages: write
--- a/.github/workflows/dco-advisor.yml
+++ b/.github/workflows/dco-advisor.yml
@@ -0,0 +1,192 @@
+name: DCO Advisor Bot
+
+on:
+  pull_request_target:
+    types: [opened, reopened, synchronize]
+
+permissions:
+  pull-requests: write
+  issues: write
+
+jobs:
+  dco_advisor:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Handle DCO check result
+        uses: actions/github-script@v7
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          script: |
+            const pr = context.payload.pull_request || context.payload.check_run?.pull_requests?.[0];
+            if (!pr) return;
+
+            const prNumber = pr.number;
+            const baseRef = pr.base.ref;
+            const headSha =
+              context.payload.check_run?.head_sha ||
+              pr.head?.sha;
+            const username = pr.user.login;
+
+            console.log("HEAD SHA:", headSha);
+
+            const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
+
+            // Poll until DCO check has a conclusion (max 6 attempts, 30s)
+            let dcoCheck = null;
+            for (let attempt = 0; attempt < 6; attempt++) {
+              const { data: checks } = await github.rest.checks.listForRef({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                ref: headSha
+              });
+
+              
+              console.log("All check runs:");
+              checks.check_runs.forEach(run => {
+                console.log(`- ${run.name} (${run.status}/${run.conclusion}) @ ${run.head_sha}`);
+              });
+
+              dcoCheck = checks.check_runs.find(run =>
+                run.name.toLowerCase().includes("dco") &&
+                !run.name.toLowerCase().includes("dco_advisor") &&
+                run.head_sha === headSha
+              );
+
+
+              if (dcoCheck?.conclusion) break;
+              console.log(`Waiting for DCO check... (${attempt + 1})`);
+              await sleep(5000); // wait 5 seconds
+            }
+
+            if (!dcoCheck || !dcoCheck.conclusion) {
+              console.log("DCO check did not complete in time.");
+              return;
+            }
+
+            const isFailure = ["failure", "action_required"].includes(dcoCheck.conclusion);
+            console.log(`DCO check conclusion for ${headSha}: ${dcoCheck.conclusion} (treated as ${isFailure ? "failure" : "success"})`);
+
+            // Collect commits that lack a Signed-off-by trailer, together with their authors
+            let badCommits = [];
+            let authorName = "";
+            let authorEmail = "";
+            let moreInfo = `More info: [DCO check report](${dcoCheck?.html_url})`;
+
+            if (isFailure) {
+              const { data: commits } = await github.rest.pulls.listCommits({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                pull_number: prNumber,
+              });
+
+              for (const commit of commits) {
+                const commitMessage = commit.commit.message;
+                const signoffMatch = commitMessage.match(/^Signed-off-by:\s+.+<.+>$/m);
+                if (!signoffMatch) {
+                  console.log(`Bad commit found ${commit.sha}`);
+                  badCommits.push({
+                    sha: commit.sha,
+                    authorName: commit.commit.author.name,
+                    authorEmail: commit.commit.author.email,
+                  });
+                }
+              }
+            }
+
+            // If multiple authors are present, you could adapt the message accordingly
+            // For now, we'll just use the first one
+            if (badCommits.length > 0) {
+              authorName = badCommits[0].authorName;
+              authorEmail = badCommits[0].authorEmail;
+            }
+
+            // Generate remediation commit message if needed
+            let remediationSnippet = "";
+            if (badCommits.length && authorEmail) {
+              remediationSnippet = `git commit --allow-empty -s -m "DCO Remediation Commit for ${authorName} <${authorEmail}>\n\n` +
+                badCommits.map(c => `I, ${c.authorName} <${c.authorEmail}>, hereby add my Signed-off-by to this commit: ${c.sha}`).join('\n') +
+                `"`;
+            } else {
+              remediationSnippet = "# Unable to auto-generate remediation message. Please check the DCO check details.";
+            }
+
+            // Build comment
+            const commentHeader = '<!-- dco-advice-bot -->';
+            let body = "";
+
+            if (isFailure) {
+              body = [
+                commentHeader,
+                '❌ **DCO Check Failed**',
+                '',
+                `Hi @${username}, your pull request has failed the Developer Certificate of Origin (DCO) check.`,
+                '',
+                'This repository supports **remediation commits**, so you can fix this without rewriting history — but you must follow the required message format.',
+                '',
+                '---',
+                '',
+                '### 🛠 Quick Fix: Add a remediation commit',
+                'Run this command:',
+                '',
+                '```bash',
+                remediationSnippet,
+                'git push',
+                '```',
+                '',
+                '---',
+                '',
+                '<details>',
+                '<summary>🔧 Advanced: Sign off each commit directly</summary>',
+                '',
+                '**For the latest commit:**',
+                '```bash',
+                'git commit --amend --signoff',
+                'git push --force-with-lease',
+                '```',
+                '',
+                '**For multiple commits:**',
+                '```bash',
+                `git rebase --signoff origin/${baseRef}`,
+                'git push --force-with-lease',
+                '```',
+                '',
+                '</details>',
+                '',
+                moreInfo
+              ].join('\n');
+            } else {
+              body = [
+                commentHeader,
+                '✅ **DCO Check Passed**',
+                '',
+                `Thanks @${username}, all your commits are properly signed off. 🎉`
+              ].join('\n');
+            }
+
+            // Get existing comments on the PR
+            const { data: comments } = await github.rest.issues.listComments({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: prNumber
+            });
+
+            // Look for a previous bot comment
+            const existingComment = comments.find(c =>
+              c.body.includes("<!-- dco-advice-bot -->")
+            );
+
+            if (existingComment) {
+              await github.rest.issues.updateComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                comment_id: existingComment.id,
+                body: body
+              });
+            } else {
+              await github.rest.issues.createComment({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                issue_number: prNumber,
+                body: body
+              });
+            }
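The sign-off detection and remediation message in the workflow above can be tried outside GitHub Actions. The following is a minimal sketch, not the workflow itself: the regular expression and the message wording are copied from the diff, while the helper names `findUnsignedCommits` and `buildRemediationCommand`, the flattened commit shape, and the sample commits are assumptions made for illustration.

```javascript
// Same pattern as in the workflow: a line starting with "Signed-off-by:",
// followed by a name and an address in angle brackets.
const SIGNOFF_RE = /^Signed-off-by:\s+.+<.+>$/m;

// Hypothetical helper: keep only commits whose message has no sign-off line.
// The flat { sha, message, authorName, authorEmail } shape is an assumption;
// the GitHub API nests these fields under commit.commit.
function findUnsignedCommits(commits) {
  return commits
    .filter(c => !SIGNOFF_RE.test(c.message))
    .map(c => ({ sha: c.sha, authorName: c.authorName, authorEmail: c.authorEmail }));
}

// Hypothetical helper: build the remediation command the same way the
// workflow does, using the first offending author in the subject line.
function buildRemediationCommand(badCommits) {
  if (badCommits.length === 0) return "";
  const { authorName, authorEmail } = badCommits[0];
  const lines = badCommits.map(
    c => `I, ${c.authorName} <${c.authorEmail}>, hereby add my Signed-off-by to this commit: ${c.sha}`
  );
  return `git commit --allow-empty -s -m "DCO Remediation Commit for ${authorName} <${authorEmail}>\n\n${lines.join("\n")}"`;
}

// Made-up commits for illustration only.
const sampleCommits = [
  {
    sha: "aaa1111",
    message: "feat: add endpoint\n\nSigned-off-by: Jane Doe <jane@example.com>",
    authorName: "Jane Doe",
    authorEmail: "jane@example.com",
  },
  {
    sha: "bbb2222",
    message: "fix: typo in docs",
    authorName: "Jane Doe",
    authorEmail: "jane@example.com",
  },
];

const unsignedCommits = findUnsignedCommits(sampleCommits);
console.log(buildRemediationCommand(unsignedCommits));
```

Because the pattern uses the `m` flag, the sign-off line may appear anywhere in the message as long as it occupies a full line. Note that the workflow calls `listCommits` without pagination parameters, so this sketch, like the workflow, only considers the commits it is handed.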
--- a/.github/workflows/images.yml
+++ b/.github/workflows/images.yml
@@ -17,17 +17,25 @@ jobs:
    strategy:
      matrix:
        spec:
-          - name: ds4sd/docling-serve
+          - name: docling-project/docling-serve
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra cpu
+              UV_SYNC_EXTRA_ARGS=--no-extra flash-attn
            platforms: linux/amd64, linux/arm64
-          - name: ds4sd/docling-serve-cpu
+          - name: docling-project/docling-serve-cpu
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cu124
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cpu --no-extra flash-attn
            platforms: linux/amd64, linux/arm64
-          - name: ds4sd/docling-serve-cu124
+          - name: docling-project/docling-serve-cu124
            build_args: |
-              UV_SYNC_EXTRA_ARGS=--no-extra cpu
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu124
+            platforms: linux/amd64
+          - name: docling-project/docling-serve-cu126
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu126
+            platforms: linux/amd64
+          - name: docling-project/docling-serve-cu128
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu128
            platforms: linux/amd64

    permissions:
--- a/.github/workflows/job-build.yml
+++ b/.github/workflows/job-build.yml
@@ -17,7 +17,7 @@ jobs:
          python-version: ${{ matrix.python-version }}
          enable-cache: true
      - name: Install dependencies
-        run: uv sync --all-extras --no-extra cu124
+        run: uv sync --all-extras --no-extra flash-attn
      - name: Build package
        run: uv build
      - name: Check content of wheel
--- a/.github/workflows/job-checks.yml
+++ b/.github/workflows/job-checks.yml
@@ -25,7 +25,7 @@ jobs:
          key: pre-commit|${{ env.PY }}|${{ hashFiles('.pre-commit-config.yaml') }}

      - name: Install dependencies
-        run: uv sync --frozen --all-extras --no-extra cu124
+        run: uv sync --frozen --all-extras --no-extra flash-attn

      - name: Run styling check
        run: pre-commit run --all-files
--- a/.gitignore
+++ b/.gitignore
@@ -444,3 +444,5 @@ pip-selfcheck.json
 # Makefile
 .action-lint
 .markdown-lint
+
+cookies.txt
--- a/.markdownlint-cli2.yaml
+++ b/.markdownlint-cli2.yaml
@@ -3,7 +3,7 @@ config:
  no-emphasis-as-header: false
  first-line-heading: false
  MD033:
-    allowed_elements: ["details", "summary"]
+    allowed_elements: ["details", "summary", "br", "a", "b", "p", "img"]
  MD024:
    siblings_only: true
 globs:
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -5,10 +5,14 @@ repos:
    hooks:
      # Run the Ruff formatter.
      - id: ruff-format
+        name: "Ruff formatter"
        args: [--config=pyproject.toml]
+        files: '^(docling_serve|tests).*\.(py|ipynb)$'
      # Run the Ruff linter.
      - id: ruff
+        name: "Ruff linter"
        args: [--exit-non-zero-on-fix, --fix, --config=pyproject.toml]
+        files: '^(docling_serve|tests).*\.(py|ipynb)$'
  - repo: local
    hooks:
      - id: system
@@ -17,8 +21,19 @@ repos:
        pass_filenames: false
        language: system
        files: '\.py$'
+  - repo: https://github.com/errata-ai/vale
+    rev: v3.12.0  # Use latest stable version
+    hooks:
+      - id: vale
+        name: vale sync
+        pass_filenames: false
+        args: [sync, "--config=.github/vale.ini"]
+      - id: vale
+        name: Spell and Style Check with Vale
+        args: ["--config=.github/vale.ini"]
+        files: \.md$
  - repo: https://github.com/astral-sh/uv-pre-commit
    # uv version.
-    rev: 0.6.1
+    rev: 0.7.13
    hooks:
      - id: uv-lock
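The new `files:` pattern on the Ruff hooks limits which paths are passed to the formatter and linter. A minimal sketch of its effect follows. It assumes the pattern behaves the same in JavaScript as in pre-commit's own matching, which should hold for an expression this simple; the helper name `selectedByRuff` and the sample paths are hypothetical.

```javascript
// Pattern copied from the `files:` key of the Ruff hooks in the diff.
const RUFF_FILES = /^(docling_serve|tests).*\.(py|ipynb)$/;

// Hypothetical helper: true when a path would be handed to the Ruff hooks.
function selectedByRuff(path) {
  return RUFF_FILES.test(path);
}

// Hypothetical repository paths, used only for illustration.
const samplePaths = [
  "docling_serve/app.py",
  "tests/test_convert.py",
  "tests_extra.py",
  "docs/notebook.ipynb",
  "docling_serve/README.md",
  "scripts/docling_serve/tool.py",
];

for (const p of samplePaths) {
  console.log(`${selectedByRuff(p) ? "checked" : "skipped"}  ${p}`);
}
```

The pattern is anchored at the start of the path but does not require a directory separator after the prefix, so a top-level file such as `tests_extra.py` would also be selected, while anything under `docs/` or `scripts/` is skipped.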
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,36 +1,238 @@
-## [v0.5.1](https://github.com/DS4SD/docling-serve/releases/tag/v0.5.1) - 2025-03-10
-
-### Fix
-
-* Submodules in wheels ([#85](https://github.com/DS4SD/docling-serve/issues/85)) ([`a92ad48`](https://github.com/DS4SD/docling-serve/commit/a92ad48b287bfcb134011dc0fc3f91ee04e067ee))
-
-## [v0.5.0](https://github.com/DS4SD/docling-serve/releases/tag/v0.5.0) - 2025-03-07
+## [v1.1.0](https://github.com/docling-project/docling-serve/releases/tag/v1.1.0) - 2025-07-30

 ### Feature

-* Async api ([#60](https://github.com/DS4SD/docling-serve/issues/60)) ([`82f8900`](https://github.com/DS4SD/docling-serve/commit/82f890019745859699c1b01f9ccfb64cb7e37906))
-* Display version in fastapi docs ([#78](https://github.com/DS4SD/docling-serve/issues/78)) ([`ed851c9`](https://github.com/DS4SD/docling-serve/commit/ed851c95fee5f59305ddc3dcd5c09efce618470b))
+* Add docling-mcp in the distribution ([#290](https://github.com/docling-project/docling-serve/issues/290)) ([`ecb1874`](https://github.com/docling-project/docling-serve/commit/ecb1874a507bef83d102e0e031e49fed34298637))
+* Add 3.0 openapi endpoint ([#287](https://github.com/docling-project/docling-serve/issues/287)) ([`ec594d8`](https://github.com/docling-project/docling-serve/commit/ec594d84fe36df23e7d010a2fcf769856c43600b))
+* Add new source and target ([#270](https://github.com/docling-project/docling-serve/issues/270)) ([`3771c1b`](https://github.com/docling-project/docling-serve/commit/3771c1b55403bd51966d07d8f760d5c4fbcc1760))

 ### Fix

-* Remove uv from image, merge ARG and ENV declarations ([#57](https://github.com/DS4SD/docling-serve/issues/57)) ([`c95db36`](https://github.com/DS4SD/docling-serve/commit/c95db3643807a4dfb96d93c8e10d6eb486c49a30))
-* **docs:** Remove comma in convert/source curl example ([#73](https://github.com/DS4SD/docling-serve/issues/73)) ([`05df073`](https://github.com/DS4SD/docling-serve/commit/05df0735d35a589bdc2a11fcdd764a10f700cb6f))
+* Referenced paths relative to zip root ([#289](https://github.com/docling-project/docling-serve/issues/289)) ([`1333f71`](https://github.com/docling-project/docling-serve/commit/1333f71c9c6495342b2169d574e921f828446f15))

-## [v0.4.0](https://github.com/DS4SD/docling-serve/releases/tag/v0.4.0) - 2025-02-26
-
-### Feature
-
-* New container images ([#68](https://github.com/DS4SD/docling-serve/issues/68)) ([`7e6d9cd`](https://github.com/DS4SD/docling-serve/commit/7e6d9cdef398df70a5b4d626aeb523c428c10d56))
-* Render DoclingDocument with npm docling-components in the example UI ([#65](https://github.com/DS4SD/docling-serve/issues/65)) ([`c430d9b`](https://github.com/DS4SD/docling-serve/commit/c430d9b1a162ab29104d86ebaa1ac5a5488b1f09))
-
-## [v0.3.0](https://github.com/DS4SD/docling-serve/releases/tag/v0.3.0) - 2025-02-19
-
-### Feature
-
-* Add new docling-serve cli ([#50](https://github.com/DS4SD/docling-serve/issues/50)) ([`ec33a61`](https://github.com/DS4SD/docling-serve/commit/ec33a61faa7846b9b7998fbf557ebe39a3b800f6))
+## [v1.0.1](https://github.com/docling-project/docling-serve/releases/tag/v1.0.1) - 2025-07-21

 ### Fix

-* Set DOCLING_SERVE_ARTIFACTS_PATH in images ([#53](https://github.com/DS4SD/docling-serve/issues/53)) ([`4877248`](https://github.com/DS4SD/docling-serve/commit/487724836896576ca4f98e84abf15fd1c383bec8))
-* Set root UI path when behind proxy ([#38](https://github.com/DS4SD/docling-serve/issues/38)) ([`c64a450`](https://github.com/DS4SD/docling-serve/commit/c64a450bf9ba9947ab180e92bef2763ff710b210))
-* Support python 3.13 and docling updates and switch to uv ([#48](https://github.com/DS4SD/docling-serve/issues/48)) ([`ae3b490`](https://github.com/DS4SD/docling-serve/commit/ae3b4906f1c0829b1331ea491f3518741cabff71))
+* Docling update v2.42.0 ([#277](https://github.com/docling-project/docling-serve/issues/277)) ([`8706706`](https://github.com/docling-project/docling-serve/commit/8706706e8797b0a06ec4baa7cf87988311be68b6))
+
+### Documentation
+
+* Typo in README ([#276](https://github.com/docling-project/docling-serve/issues/276)) ([`766adb2`](https://github.com/docling-project/docling-serve/commit/766adb248113c7bd5144d14b3c82929a2ad29f8e))
+
+## [v1.0.0](https://github.com/docling-project/docling-serve/releases/tag/v1.0.0) - 2025-07-14
+
+### Feature
+
+* V1 api with list of sources and target ([#249](https://github.com/docling-project/docling-serve/issues/249)) ([`56e328b`](https://github.com/docling-project/docling-serve/commit/56e328baf76b4bb0476fc6ca820b52034e4f97bf))
+* Use orchestrators from jobkit ([#248](https://github.com/docling-project/docling-serve/issues/248)) ([`daa924a`](https://github.com/docling-project/docling-serve/commit/daa924a77e56d063ef17347dfd8a838872a70529))
+
+### Breaking
+
+* v1 api with list of sources and target ([#249](https://github.com/docling-project/docling-serve/issues/249)) ([`56e328b`](https://github.com/docling-project/docling-serve/commit/56e328baf76b4bb0476fc6ca820b52034e4f97bf))
+* use orchestrators from jobkit ([#248](https://github.com/docling-project/docling-serve/issues/248)) ([`daa924a`](https://github.com/docling-project/docling-serve/commit/daa924a77e56d063ef17347dfd8a838872a70529))
+
+## [v0.16.1](https://github.com/docling-project/docling-serve/releases/tag/v0.16.1) - 2025-07-07
+
+### Fix
+
+* Upgrade deps including, docling v2.40.0 with locks in models init ([#264](https://github.com/docling-project/docling-serve/issues/264)) ([`bfde1a0`](https://github.com/docling-project/docling-serve/commit/bfde1a0991c2da53b72c4f131ff74fa10f6340de))
+* Missing tesseract osd ([#263](https://github.com/docling-project/docling-serve/issues/263)) ([`eb3892e`](https://github.com/docling-project/docling-serve/commit/eb3892ee141eb2c941d580b095d8a266f2d2610c))
+* Properly load models at boot ([#244](https://github.com/docling-project/docling-serve/issues/244)) ([`149a8cb`](https://github.com/docling-project/docling-serve/commit/149a8cb1c0a16c1e0b7d17f40b88b4d6e8f0109d))
+
+### Documentation
+
+* Fix typo ([#259](https://github.com/docling-project/docling-serve/issues/259)) ([`93b8471`](https://github.com/docling-project/docling-serve/commit/93b84712b2c6d180908a197847b52b217a7ff05f))
+* Change the doc example ([#258](https://github.com/docling-project/docling-serve/issues/258)) ([`c45b937`](https://github.com/docling-project/docling-serve/commit/c45b93706466a073ab4a5c75aa8a267110873e26))
+* Update typo ([#247](https://github.com/docling-project/docling-serve/issues/247)) ([`50e431f`](https://github.com/docling-project/docling-serve/commit/50e431f30fbffa33f43727417fe746d20cbb9d6b))
+
+## [v0.16.0](https://github.com/docling-project/docling-serve/releases/tag/v0.16.0) - 2025-06-25
+
+### Feature
+
+* Package updates and more cuda images ([#229](https://github.com/docling-project/docling-serve/issues/229)) ([`30aca92`](https://github.com/docling-project/docling-serve/commit/30aca92298ab0d86bb4debcfcacb2dd8b9040a27))
+
+### Documentation
+
+* Update example resources and improve README ([#231](https://github.com/docling-project/docling-serve/issues/231)) ([`80755a7`](https://github.com/docling-project/docling-serve/commit/80755a7d5955f7d0c53df8e558fdd852dd1f5b75))
+
+## [v0.15.0](https://github.com/docling-project/docling-serve/releases/tag/v0.15.0) - 2025-06-17
+
+### Feature
+
+* Use redocs and scalar as api docs ([#228](https://github.com/docling-project/docling-serve/issues/228)) ([`873d05a`](https://github.com/docling-project/docling-serve/commit/873d05aefe141c63b9c1cf53b23b4fa8c96de05d))
+
+### Fix
+
+* "tesserocr" instead of "tesseract_cli" in usage docs ([#223](https://github.com/docling-project/docling-serve/issues/223)) ([`196c5ce`](https://github.com/docling-project/docling-serve/commit/196c5ce42a04d77234a4212c3d9b9772d2c2073e))
+
+## [v0.14.0](https://github.com/docling-project/docling-serve/releases/tag/v0.14.0) - 2025-06-17
+
+### Feature
+
+* Read supported file extensions from docling ([#214](https://github.com/docling-project/docling-serve/issues/214)) ([`524f6a8`](https://github.com/docling-project/docling-serve/commit/524f6a8997b86d2f869ca491ec8fb40585b42ca4))
+
+### Fix
+
+* Typo in Headline ([#220](https://github.com/docling-project/docling-serve/issues/220)) ([`d5455b7`](https://github.com/docling-project/docling-serve/commit/d5455b7f66de39ea1f8b8927b5968d2baa23ca88))
+
+## [v0.13.0](https://github.com/docling-project/docling-serve/releases/tag/v0.13.0) - 2025-06-04
+
+### Feature
+
+* Upgrade docling to 2.36 ([#212](https://github.com/docling-project/docling-serve/issues/212)) ([`ffea347`](https://github.com/docling-project/docling-serve/commit/ffea34732b24fdd438fabd6df02d3d9ce66b4534))
+
+## [v0.12.0](https://github.com/docling-project/docling-serve/releases/tag/v0.12.0) - 2025-06-03
+
+### Feature
+
+* Export annotations in markdown and html (Docling upgrade) ([#202](https://github.com/docling-project/docling-serve/issues/202)) ([`c4c41f1`](https://github.com/docling-project/docling-serve/commit/c4c41f16dff83c5d2a0b8a4c625b5de19b36b7c5))
+
+### Fix
+
+* Processing complex params in multipart-form ([#210](https://github.com/docling-project/docling-serve/issues/210)) ([`7066f35`](https://github.com/docling-project/docling-serve/commit/7066f3520a88c07df1c80a0cc6c4339eaac4d6a7))
+
+### Documentation
+
+* Add openshift replicasets examples ([#209](https://github.com/docling-project/docling-serve/issues/209)) ([`6a8190c`](https://github.com/docling-project/docling-serve/commit/6a8190c315792bd1e0e2b0af310656baaa5551e5))
+
+## [v0.11.0](https://github.com/docling-project/docling-serve/releases/tag/v0.11.0) - 2025-05-23
+
+### Feature
+
+* Page break placeholder in markdown exports options ([#194](https://github.com/docling-project/docling-serve/issues/194)) ([`32b8a80`](https://github.com/docling-project/docling-serve/commit/32b8a809f348bf9fbde657f93589a56935d3749d))
+* Clear results registry ([#192](https://github.com/docling-project/docling-serve/issues/192)) ([`de002df`](https://github.com/docling-project/docling-serve/commit/de002dfcdc111c942a08b156c84b7fa22b3fbaf3))
+* Upgrade to Docling 2.33.0 ([#198](https://github.com/docling-project/docling-serve/issues/198)) ([`abe5aa0`](https://github.com/docling-project/docling-serve/commit/abe5aa03f54d44ecf5c6d76e3258028997a53e68))
+* Api to trigger offloading the models ([#188](https://github.com/docling-project/docling-serve/issues/188)) ([`00be428`](https://github.com/docling-project/docling-serve/commit/00be4284904d55b78c75c5475578ef11c2ade94c))
+* Figure annotations @ docling components 0.0.7 ([#181](https://github.com/docling-project/docling-serve/issues/181)) ([`3ff1b2f`](https://github.com/docling-project/docling-serve/commit/3ff1b2f9834aca37472a895a0e3da47560457d77))
+
+### Fix
+
+* Usage of hashlib for FIPS ([#171](https://github.com/docling-project/docling-serve/issues/171)) ([`8406fb9`](https://github.com/docling-project/docling-serve/commit/8406fb9b59d83247b8379974cabed497703dfc4d))
+
+### Documentation
+
+* Example and instructions on how to load model weights to persistent volume ([#197](https://github.com/docling-project/docling-serve/issues/197)) ([`3f090b7`](https://github.com/docling-project/docling-serve/commit/3f090b7d15eaf696611d89bbbba5b98569610828))
+* Async api usage and fixes ([#195](https://github.com/docling-project/docling-serve/issues/195)) ([`21c1791`](https://github.com/docling-project/docling-serve/commit/21c1791e427f5b1946ed46c68dfda03c957dca8f))
+
+## [v0.10.1](https://github.com/docling-project/docling-serve/releases/tag/v0.10.1) - 2025-04-30
+
+### Fix
+
+* Avoid missing specialized keys in the options hash ([#166](https://github.com/docling-project/docling-serve/issues/166)) ([`36787bc`](https://github.com/docling-project/docling-serve/commit/36787bc0616356a6199da618d8646de51636b34e))
+* Allow users to set the area threshold for picture descriptions ([#165](https://github.com/docling-project/docling-serve/issues/165)) ([`509f488`](https://github.com/docling-project/docling-serve/commit/509f4889f8ed4c0f0ce25bec4126ef1f1199797c))
+* Expose max wait time in sync endpoints ([#164](https://github.com/docling-project/docling-serve/issues/164)) ([`919cf5c`](https://github.com/docling-project/docling-serve/commit/919cf5c0414f2f11eb8012f451fed7a8f582b7ad))
+* Add flash-attn for cuda images ([#161](https://github.com/docling-project/docling-serve/issues/161)) ([`35c2630`](https://github.com/docling-project/docling-serve/commit/35c2630c613cf229393fc67b6938152b063ff498))
+
+## [v0.10.0](https://github.com/docling-project/docling-serve/releases/tag/v0.10.0) - 2025-04-28
+
+### Feature
+
+* Add support for file upload and return as file in async endpoints ([#152](https://github.com/docling-project/docling-serve/issues/152)) ([`c65f3c6`](https://github.com/docling-project/docling-serve/commit/c65f3c654c76c6b64b6aada1f0a153d74789d629))
+
+### Documentation
+
+* Fix new default pdf_backend ([#158](https://github.com/docling-project/docling-serve/issues/158)) ([`829effe`](https://github.com/docling-project/docling-serve/commit/829effec1a1b80320ccaf2c501be8015169b6fa3))
+* Fixing small typo in docs ([#155](https://github.com/docling-project/docling-serve/issues/155)) ([`14bafb2`](https://github.com/docling-project/docling-serve/commit/14bafb26286b94f80b56846c50d6e9a6d99a9763))
+
+## [v0.9.0](https://github.com/docling-project/docling-serve/releases/tag/v0.9.0) - 2025-04-25
+
+### Feature
+
+* Expose picture description options ([#148](https://github.com/docling-project/docling-serve/issues/148)) ([`4c9571a`](https://github.com/docling-project/docling-serve/commit/4c9571a052d5ec0044e49225bc5615e13cdb0a56))
+* Add parameters for Kubeflow pipeline engine (WIP) ([#107](https://github.com/docling-project/docling-serve/issues/107)) ([`26bef5b`](https://github.com/docling-project/docling-serve/commit/26bef5bec060f0afd8d358816b68c3f2c0dd4bc2))
+
+### Fix
+
+* Produce image artifacts in referenced mode ([#151](https://github.com/docling-project/docling-serve/issues/151)) ([`71c5fae`](https://github.com/docling-project/docling-serve/commit/71c5fae505366459fd481d2ecdabc5ebed94d49c))
+
+### Documentation
+
+* Vlm and picture description options ([#149](https://github.com/docling-project/docling-serve/issues/149)) ([`91956cb`](https://github.com/docling-project/docling-serve/commit/91956cbf4e91cf82bb4d54ace397cdbbfaf594ba))
+
+## [v0.8.0](https://github.com/docling-project/docling-serve/releases/tag/v0.8.0) - 2025-04-22
+
+### Feature
+
+* Add option for vlm pipeline ([#143](https://github.com/docling-project/docling-serve/issues/143)) ([`ee89ee4`](https://github.com/docling-project/docling-serve/commit/ee89ee4daee5e916bd6a3bdb452f78934cd03f60))
+* Expose more conversion options ([#142](https://github.com/docling-project/docling-serve/issues/142)) ([`6b3d281`](https://github.com/docling-project/docling-serve/commit/6b3d281f02905c195ab75f25bb39f5c4d4e7b680))
+* **UI:** Change UI to use async endpoints ([#131](https://github.com/docling-project/docling-serve/issues/131)) ([`b598872`](https://github.com/docling-project/docling-serve/commit/b598872e5c48928ac44417a11bb7acc0e5c3f0c6))
+
+### Fix
+
+* **UI:** Use https when calling the api ([#139](https://github.com/docling-project/docling-serve/issues/139)) ([`57f9073`](https://github.com/docling-project/docling-serve/commit/57f9073bc0daf72428b068ea28e2bec7cd76c37b))
+* Fix permissions in docker image ([#136](https://github.com/docling-project/docling-serve/issues/136)) ([`c1ce471`](https://github.com/docling-project/docling-serve/commit/c1ce4719c933179ba3c59d73d0584853bbd6fa6a))
+* Picture caption visuals ([#129](https://github.com/docling-project/docling-serve/issues/129)) ([`5dfb75d`](https://github.com/docling-project/docling-serve/commit/5dfb75d3b9a7022d1daad12edbb8ec7bbf9aa264))
+
+### Documentation
+
+* Fix required permissions for oauth2-proxy requests ([#141](https://github.com/docling-project/docling-serve/issues/141)) ([`087417e`](https://github.com/docling-project/docling-serve/commit/087417e5c2387d4ed95500222058f34d8a8702aa))
+* Update deployment examples ([#135](https://github.com/docling-project/docling-serve/issues/135)) ([`525a43f`](https://github.com/docling-project/docling-serve/commit/525a43ff6f04b7cc80f9dd6a0e653a8d8c4ab317))
+* Fix image tag ([#124](https://github.com/docling-project/docling-serve/issues/124)) ([`420162e`](https://github.com/docling-project/docling-serve/commit/420162e674cc38b4c3c13673ffbee4c20a1b15f1))
+
+## [v0.7.0](https://github.com/docling-project/docling-serve/releases/tag/v0.7.0) - 2025-03-31
+
+### Feature
+
+* Expose TLS settings and example deploy with oauth-proxy ([#112](https://github.com/docling-project/docling-serve/issues/112)) ([`7a0faba`](https://github.com/docling-project/docling-serve/commit/7a0fabae07020c2659dbb22c3b0359909051a74c))
+* Offline static files ([#109](https://github.com/docling-project/docling-serve/issues/109)) ([`68772bb`](https://github.com/docling-project/docling-serve/commit/68772bb6f0a87b71094a08ff851f5754c6ca6163))
+* Update to Docling 2.28 ([#106](https://github.com/docling-project/docling-serve/issues/106)) ([`20ec87a`](https://github.com/docling-project/docling-serve/commit/20ec87a63a99145bc0ad7931549af8a0c30db641))
+
+### Fix
+
+* Move ARGs to prevent cache invalidation ([#104](https://github.com/docling-project/docling-serve/issues/104)) ([`e30f458`](https://github.com/docling-project/docling-serve/commit/e30f458923d34c169db7d5a5c296848716e8cac4))
+
+## [v0.6.0](https://github.com/docling-project/docling-serve/releases/tag/v0.6.0) - 2025-03-17
+
+### Feature
+
+* Expose options for new features ([#92](https://github.com/docling-project/docling-serve/issues/92)) ([`ec57b52`](https://github.com/docling-project/docling-serve/commit/ec57b528ed3f8e7b9604ff4cdf06da3d52c714dd))
+
+### Fix
+
+* Allow changes in CORS settings ([#100](https://github.com/docling-project/docling-serve/issues/100)) ([`422c402`](https://github.com/docling-project/docling-serve/commit/422c402bab7f05e46274ede11f234a19a62e093e))
+* Avoid exploding options cache using lru and expose size parameter ([#101](https://github.com/docling-project/docling-serve/issues/101)) ([`ea09028`](https://github.com/docling-project/docling-serve/commit/ea090288d3eec4ea8fbdcd32a6a497a99c89189d))
+* Increase timeout_keep_alive and allow parameter changes ([#98](https://github.com/docling-project/docling-serve/issues/98)) ([`07c48ed`](https://github.com/docling-project/docling-serve/commit/07c48edd5d9437219d9623e3d05bc5166c5bb85a))
+* Add warning when using incompatible parameters ([#99](https://github.com/docling-project/docling-serve/issues/99)) ([`a212547`](https://github.com/docling-project/docling-serve/commit/a212547d28d6588c65e52000dc7bc04f3f77e69e))
+* **ui:** Use --port parameter and avoid failing when image is not found ([#97](https://github.com/docling-project/docling-serve/issues/97)) ([`c76daac`](https://github.com/docling-project/docling-serve/commit/c76daac70c87da412f791666881e48b74688b060))
+
+### Documentation
+
+* Simplify README and move details to docs ([#102](https://github.com/docling-project/docling-serve/issues/102)) ([`fd8e40a`](https://github.com/docling-project/docling-serve/commit/fd8e40a00849771263d9b75b9a56f6caeccb8517))
+
+## [v0.5.1](https://github.com/docling-project/docling-serve/releases/tag/v0.5.1) - 2025-03-10
+
+### Fix
+
+* Submodules in wheels ([#85](https://github.com/docling-project/docling-serve/issues/85)) ([`a92ad48`](https://github.com/docling-project/docling-serve/commit/a92ad48b287bfcb134011dc0fc3f91ee04e067ee))
+
+## [v0.5.0](https://github.com/docling-project/docling-serve/releases/tag/v0.5.0) - 2025-03-07
+
+### Feature
+
+* Async api ([#60](https://github.com/docling-project/docling-serve/issues/60)) ([`82f8900`](https://github.com/docling-project/docling-serve/commit/82f890019745859699c1b01f9ccfb64cb7e37906))
+* Display version in fastapi docs ([#78](https://github.com/docling-project/docling-serve/issues/78)) ([`ed851c9`](https://github.com/docling-project/docling-serve/commit/ed851c95fee5f59305ddc3dcd5c09efce618470b))
+
+### Fix
+
+* Remove uv from image, merge ARG and ENV declarations ([#57](https://github.com/docling-project/docling-serve/issues/57)) ([`c95db36`](https://github.com/docling-project/docling-serve/commit/c95db3643807a4dfb96d93c8e10d6eb486c49a30))
+* **docs:** Remove comma in convert/source curl example ([#73](https://github.com/docling-project/docling-serve/issues/73)) ([`05df073`](https://github.com/docling-project/docling-serve/commit/05df0735d35a589bdc2a11fcdd764a10f700cb6f))
+
+## [v0.4.0](https://github.com/docling-project/docling-serve/releases/tag/v0.4.0) - 2025-02-26
+
+### Feature
+
+* New container images ([#68](https://github.com/docling-project/docling-serve/issues/68)) ([`7e6d9cd`](https://github.com/docling-project/docling-serve/commit/7e6d9cdef398df70a5b4d626aeb523c428c10d56))
+* Render DoclingDocument with npm docling-components in the example UI ([#65](https://github.com/docling-project/docling-serve/issues/65)) ([`c430d9b`](https://github.com/docling-project/docling-serve/commit/c430d9b1a162ab29104d86ebaa1ac5a5488b1f09))
+
+## [v0.3.0](https://github.com/docling-project/docling-serve/releases/tag/v0.3.0) - 2025-02-19
+
+### Feature
+
+* Add new docling-serve cli ([#50](https://github.com/docling-project/docling-serve/issues/50)) ([`ec33a61`](https://github.com/docling-project/docling-serve/commit/ec33a61faa7846b9b7998fbf557ebe39a3b800f6))
+
+### Fix
+
+* Set DOCLING_SERVE_ARTIFACTS_PATH in images ([#53](https://github.com/docling-project/docling-serve/issues/53)) ([`4877248`](https://github.com/docling-project/docling-serve/commit/487724836896576ca4f98e84abf15fd1c383bec8))
+* Set root UI path when behind proxy ([#38](https://github.com/docling-project/docling-serve/issues/38)) ([`c64a450`](https://github.com/docling-project/docling-serve/commit/c64a450bf9ba9947ab180e92bef2763ff710b210))
+* Support python 3.13 and docling updates and switch to uv ([#48](https://github.com/docling-project/docling-serve/issues/48)) ([`ae3b490`](https://github.com/docling-project/docling-serve/commit/ae3b4906f1c0829b1331ea491f3518741cabff71))
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -3,13 +3,13 @@
 Our project welcomes external contributions. If you have an itch, please feel
 free to scratch it.

-To contribute code or documentation, please submit a [pull request](https://github.com/DS4SD/docling-serve/pulls).
+To contribute code or documentation, please submit a [pull request](https://github.com/docling-project/docling-serve/pulls).

 A good way to familiarize yourself with the codebase and contribution process is
-to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/DS4SD/docling-serve/issues).
+to look for and tackle low-hanging fruit in the [issue tracker](https://github.com/docling-project/docling-serve/issues).
 Before embarking on a more ambitious contribution, please quickly [get in touch](#communication) with us.

-For general questions or support requests, please refer to the [discussion section](https://github.com/DS4SD/docling-serve/discussions).
+For general questions or support requests, please refer to the [discussion section](https://github.com/docling-project/docling-serve/discussions).

 **Note: We appreciate your effort, and want to avoid a situation where a contribution
 requires extensive rework (by you or by us), sits in backlog for a long time, or
@@ -17,14 +17,14 @@ cannot be accepted at all!**

 ### Proposing new features

-If you would like to implement a new feature, please [raise an issue](https://github.com/DS4SD/docling-serve/issues)
+If you would like to implement a new feature, please [raise an issue](https://github.com/docling-project/docling-serve/issues)
 before sending a pull request so the feature can be discussed. This is to avoid
 you wasting your valuable time working on a feature that the project developers
 are not interested in accepting into the code base.

 ### Fixing bugs

-If you would like to fix a bug, please [raise an issue](https://github.com/DS4SD/docling-serve/issues) before sending a
+If you would like to fix a bug, please [raise an issue](https://github.com/docling-project/docling-serve/issues) before sending a
 pull request so it can be tracked.

 ### Merge approval
@@ -73,7 +73,7 @@ git commit -s

 ## Communication

-Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling-serve/discussions).
+Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling-serve/discussions).

 ## Developing

--- a/Containerfile
+++ b/Containerfile
@@ -2,9 +2,6 @@ ARG BASE_IMAGE=quay.io/sclorg/python-312-c9s:c9s

 FROM ${BASE_IMAGE}

-ARG MODELS_LIST="layout tableformer picture_classifier easyocr" \
-    UV_SYNC_EXTRA_ARGS=""
-
 USER 0

 ###################################################################################################
@@ -20,6 +17,8 @@ RUN --mount=type=bind,source=os-packages.txt,target=/tmp/os-packages.txt \
    dnf -y clean all && \
    rm -rf /var/cache/dnf

+RUN /usr/bin/fix-permissions /opt/app-root/src/.cache
+
 ENV TESSDATA_PREFIX=/usr/share/tesseract/tessdata/

 ###################################################################################################
@@ -41,25 +40,32 @@ ENV \
    UV_PROJECT_ENVIRONMENT=/opt/app-root \
    DOCLING_SERVE_ARTIFACTS_PATH=/opt/app-root/src/.cache/docling/models

-RUN --mount=from=ghcr.io/astral-sh/uv:0.6.1,source=/uv,target=/bin/uv \
+ARG UV_SYNC_EXTRA_ARGS=""
+
+RUN --mount=from=ghcr.io/astral-sh/uv:0.7.19,source=/uv,target=/bin/uv \
    --mount=type=cache,target=/opt/app-root/src/.cache/uv,uid=1001 \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
-    uv sync --frozen --no-install-project --no-dev --all-extras ${UV_SYNC_EXTRA_ARGS}
+    umask 002 && \
+    UV_SYNC_ARGS="--frozen --no-install-project --no-dev --all-extras" && \
+    uv sync ${UV_SYNC_ARGS} ${UV_SYNC_EXTRA_ARGS} --no-extra flash-attn && \
+    FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE uv sync ${UV_SYNC_ARGS} ${UV_SYNC_EXTRA_ARGS} --no-build-isolation-package=flash-attn
+
+ARG MODELS_LIST="layout tableformer picture_classifier easyocr"

 RUN echo "Downloading models..." && \
    HF_HUB_DOWNLOAD_TIMEOUT="90" \
    HF_HUB_ETAG_TIMEOUT="90" \
    docling-tools models download -o "${DOCLING_SERVE_ARTIFACTS_PATH}" ${MODELS_LIST} && \
-    chown -R 1001:0 /opt/app-root/src/.cache && \
-    chmod -R g=u /opt/app-root/src/.cache
+    chown -R 1001:0 ${DOCLING_SERVE_ARTIFACTS_PATH} && \
+    chmod -R g=u ${DOCLING_SERVE_ARTIFACTS_PATH}

 COPY --chown=1001:0 ./docling_serve ./docling_serve
-RUN --mount=from=ghcr.io/astral-sh/uv:0.6.1,source=/uv,target=/bin/uv \
+RUN --mount=from=ghcr.io/astral-sh/uv:0.7.19,source=/uv,target=/bin/uv \
    --mount=type=cache,target=/opt/app-root/src/.cache/uv,uid=1001 \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
-    uv sync --frozen --no-dev --all-extras ${UV_SYNC_EXTRA_ARGS}
+    umask 002 && uv sync --frozen --no-dev --all-extras ${UV_SYNC_EXTRA_ARGS}

 EXPOSE 5001

--- a/MAINTAINERS.md
+++ b/MAINTAINERS.md
@@ -1,11 +1,11 @@
 # MAINTAINERS

- Christoph Auer - [@cau-git](https://github.com/cau-git)
- Michele Dolfi - [@dolfim-ibm](https://github.com/dolfim-ibm)
- Maxim Lysak - [@maxmnemonic](https://github.com/maxmnemonic)
- Nikos Livathinos - [@nikos-livathinos](https://github.com/nikos-livathinos)
- Ahmed Nassar - [@nassarofficial](https://github.com/nassarofficial)
- Panos Vagenas - [@vagenas](https://github.com/vagenas)
- Peter Staar - [@PeterStaar-IBM](https://github.com/PeterStaar-IBM)
+- Christoph Auer - [`@cau-git`](https://github.com/cau-git)
+- Michele Dolfi - [`@dolfim-ibm`](https://github.com/dolfim-ibm)
+- Maxim Lysak - [`@maxmnemonic`](https://github.com/maxmnemonic)
+- Nikos Livathinos - [`@nikos-livathinos`](https://github.com/nikos-livathinos)
+- Ahmed Nassar - [`@nassarofficial`](https://github.com/nassarofficial)
+- Panos Vagenas - [`@vagenas`](https://github.com/vagenas)
+- Peter Staar - [`@PeterStaar-IBM`](https://github.com/PeterStaar-IBM)

 Maintainers can be contacted at [deepsearch-core@zurich.ibm.com](mailto:deepsearch-core@zurich.ibm.com).
--- a/Makefile
+++ b/Makefile
@@ -17,6 +17,7 @@ else
 endif

 TAG=$(shell git rev-parse HEAD)
+BRANCH_TAG=$(shell git rev-parse --abbrev-ref HEAD)

 action-lint-file:
 	$(CMD_PREFIX) touch .action-lint
@@ -25,25 +26,39 @@ md-lint-file:
 	$(CMD_PREFIX) touch .markdown-lint

 .PHONY: docling-serve-image
-docling-serve-image: Containerfile
+docling-serve-image: Containerfile ## Build docling-serve container image
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra cpu" -f Containerfile -t ghcr.io/ds4sd/docling-serve:$(TAG) .
-	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve:$(TAG) ghcr.io/ds4sd/docling-serve:main
-	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve:$(TAG) quay.io/ds4sd/docling-serve:main
+	$(CMD_PREFIX) docker build --load -f Containerfile -t ghcr.io/docling-project/docling-serve:$(TAG) .
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve:$(TAG) ghcr.io/docling-project/docling-serve:$(BRANCH_TAG)
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve:$(TAG) quay.io/docling-project/docling-serve:$(BRANCH_TAG)

 .PHONY: docling-serve-cpu-image
 docling-serve-cpu-image: Containerfile ## Build docling-serve "cpu only" container image
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve CPU]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cu124" -f Containerfile -t ghcr.io/ds4sd/docling-serve-cpu:$(TAG) .
-	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve-cpu:$(TAG) ghcr.io/ds4sd/docling-serve-cpu:main
-	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve-cpu:$(TAG) quay.io/ds4sd/docling-serve-cpu:main
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cpu --no-extra flash-attn" -f Containerfile -t ghcr.io/docling-project/docling-serve-cpu:$(TAG) .
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cpu:$(TAG) ghcr.io/docling-project/docling-serve-cpu:$(BRANCH_TAG)
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cpu:$(TAG) quay.io/docling-project/docling-serve-cpu:$(BRANCH_TAG)

 .PHONY: docling-serve-cu124-image
-docling-serve-cu124-image: Containerfile ## Build docling-serve container image with GPU support
+docling-serve-cu124-image: Containerfile ## Build docling-serve container image with CUDA 12.4 support
 	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with Cuda 12.4]"
-	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cpu" -f Containerfile --platform linux/amd64 -t ghcr.io/ds4sd/docling-serve-cu124:$(TAG) .
-	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve-cu124:$(TAG) ghcr.io/ds4sd/docling-serve-cu124:main
-	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve-cu124:$(TAG) quay.io/ds4sd/docling-serve-cu124:main
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu124" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu124:$(TAG) .
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu124:$(TAG) ghcr.io/docling-project/docling-serve-cu124:$(BRANCH_TAG)
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu124:$(TAG) quay.io/docling-project/docling-serve-cu124:$(BRANCH_TAG)
+
+.PHONY: docling-serve-cu126-image
+docling-serve-cu126-image: Containerfile ## Build docling-serve container image with CUDA 12.6 support
+	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with Cuda 12.6]"
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu126" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu126:$(TAG) .
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu126:$(TAG) ghcr.io/docling-project/docling-serve-cu126:$(BRANCH_TAG)
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu126:$(TAG) quay.io/docling-project/docling-serve-cu126:$(BRANCH_TAG)
+
+.PHONY: docling-serve-cu128-image
+docling-serve-cu128-image: Containerfile ## Build docling-serve container image with CUDA 12.8 support
+	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with Cuda 12.8]"
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-group pypi --group cu128" -f Containerfile --platform linux/amd64 -t ghcr.io/docling-project/docling-serve-cu128:$(TAG) .
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu128:$(TAG) ghcr.io/docling-project/docling-serve-cu128:$(BRANCH_TAG)
+	$(CMD_PREFIX) docker tag ghcr.io/docling-project/docling-serve-cu128:$(TAG) quay.io/docling-project/docling-serve-cu128:$(BRANCH_TAG)

 .PHONY: action-lint
 action-lint: .action-lint ##      Lint GitHub Action workflows
@@ -66,7 +81,7 @@ action-lint: .action-lint ##      Lint GitHub Action workflows
 md-lint: .md-lint ##      Lint markdown files
 .md-lint: $(wildcard */**/*.md) | md-lint-file
 	$(ECHO_PREFIX) printf "  %-12s ./...\n" "[MD LINT]"
-	$(CMD_PREFIX) docker run --rm -v $$(pwd):/workdir davidanson/markdownlint-cli2:v0.14.0 "**/*.md"
+	$(CMD_PREFIX) docker run --rm -v $$(pwd):/workdir davidanson/markdownlint-cli2:v0.16.0 "**/*.md" "#.venv"
 	$(CMD_PREFIX) touch $@

 .PHONY: py-Lint
@@ -84,11 +99,11 @@ run-docling-cpu: ## Run the docling-serve container with CPU support and assign
 	$(ECHO_PREFIX) printf "  %-12s Removing existing container if it exists...\n" "[CLEANUP]"
 	$(CMD_PREFIX) docker rm -f docling-serve-cpu 2>/dev/null || true
 	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with CPU support on port 5001...\n" "[RUN CPU]"
-	$(CMD_PREFIX) docker run -it --name docling-serve-cpu -p 5001:5001 ghcr.io/ds4sd/docling-serve-cpu:main
+	$(CMD_PREFIX) docker run -it --name docling-serve-cpu -p 5001:5001 ghcr.io/docling-project/docling-serve-cpu:main

-.PHONY: run-docling-gpu
-run-docling-gpu: ## Run the docling-serve container with GPU support and assign a container name
+.PHONY: run-docling-cu124
+run-docling-cu124: ## Run the docling-serve container with GPU support and assign a container name
 	$(ECHO_PREFIX) printf "  %-12s Removing existing container if it exists...\n" "[CLEANUP]"
-	$(CMD_PREFIX) docker rm -f docling-serve-gpu 2>/dev/null || true
-	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with GPU support on port 5001...\n" "[RUN GPU]"
-	$(CMD_PREFIX) docker run -it --name docling-serve-gpu -p 5001:5001 ghcr.io/ds4sd/docling-serve:main
+	$(CMD_PREFIX) docker rm -f docling-serve-cu124 2>/dev/null || true
+	$(ECHO_PREFIX) printf "  %-12s Running docling-serve container with GPU support on port 5001...\n" "[RUN CUDA 12.4]"
+	$(CMD_PREFIX) docker run -it --name docling-serve-cu124 -p 5001:5001 ghcr.io/docling-project/docling-serve-cu124:main
--- a/README.md
+++ b/README.md
@@ -1,431 +1,84 @@
+<p align="center">
+  <a href="https://github.com/docling-project/docling-serve">
+    <img loading="lazy" alt="Docling" src="https://github.com/docling-project/docling-serve/raw/main/docs/assets/docling-serve-pic.png" width="30%"/>
+  </a>
+</p>
+
 # Docling Serve

- Running [Docling](https://github.com/DS4SD/docling) as an API service.
+Running [Docling](https://github.com/docling-project/docling) as an API service.

-## Usage
+📚 [Docling Serve documentation](./docs/README.md)

-The API provides two endpoints: one for urls, one for files. This is necessary to send files directly in binary format instead of base64-encoded strings.
+- Learn how to [configure the webserver](./docs/configuration.md)
+- Get to know all [runtime options](./docs/usage.md) of the API
+- Explore useful [deployment examples](./docs/deployment.md)
+- And more

-### Common parameters
+> [!NOTE]
+> **Migration to the `v1` API.** Docling Serve now has a stable v1 API. Read more on the [migration to v1](./docs/v1_migration.md).

-On top of the source of file (see below), both endpoints support the same parameters, which are almost the same as the Docling CLI.
+## Getting started

- `from_format` (List[str]): Input format(s) to convert from. Allowed values: `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md`. Defaults to all formats.
- `to_formats` (List[str]): Output format(s) to convert to. Allowed values: `md`, `json`, `html`, `text`, `doctags`. Defaults to `md`.
- `do_ocr` (bool): If enabled, the bitmap content will be processed using OCR. Defaults to `True`.
- `image_export_mode`: Image export mode for the document (only in case of JSON, Markdown or HTML). Allowed values: embedded, placeholder, referenced. Optional, defaults to `embedded`.
- `force_ocr` (bool): If enabled, replace any existing text with OCR-generated text over the full content. Defaults to `False`.
- `ocr_engine` (str): OCR engine to use. Allowed values: `easyocr`, `tesseract_cli`, `tesseract`, `rapidocr`, `ocrmac`. Defaults to `easyocr`.
- `ocr_lang` (List[str]): List of languages used by the OCR engine. Note that each OCR engine has different values for the language names. Defaults to empty.
- `pdf_backend` (str): PDF backend to use. Allowed values: `pypdfium2`, `dlparse_v1`, `dlparse_v2`. Defaults to `dlparse_v2`.
- `table_mode` (str): Table mode to use. Allowed values: `fast`, `accurate`. Defaults to `fast`.
- `abort_on_error` (bool): If enabled, abort on error. Defaults to false.
- `return_as_file` (boo): If enabled, return the output as a file. Defaults to false.
- `do_table_structure` (bool): If enabled, the table structure will be extracted. Defaults to true.
- `include_images` (bool): If enabled, images will be extracted from the document. Defaults to true.
- `images_scale` (float): Scale factor for images. Defaults to 2.0.
+Install the `docling-serve` package and run the server.

-### URL endpoint
+```bash
+# Using the python package
+pip install "docling-serve[ui]"
+docling-serve run --enable-ui

-The endpoint is `/v1alpha/convert/source`, listening for POST requests of JSON payloads.
-
-On top of the above parameters, you must send the URL(s) of the document you want process with either the `http_sources` or `file_sources` fields.
-The first is fetching URL(s) (optionally using with extra headers), the second allows to provide documents as base64-encoded strings.
-No `options` is required, they can be partially or completely omitted.
-
-Simple payload example:
-
-```json
-{
-  "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}]
-}
+# Using container images, e.g. with Podman
+podman run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=1 quay.io/docling-project/docling-serve
 ```

-<details>
+The server is available at

-<summary>Complete payload example:</summary>
+- API <http://127.0.0.1:5001>
+- API documentation <http://127.0.0.1:5001/docs>
+- UI playground <http://127.0.0.1:5001/ui>
+  ![swagger.png](img/swagger.png)

-```json
-{
-  "options": {
-    "from_formats": ["docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx"],
-    "to_formats": ["md", "json", "html", "text", "doctags"],
-    "image_export_mode": "placeholder",
-    "do_ocr": true,
-    "force_ocr": false,
-    "ocr_engine": "easyocr",
-    "ocr_lang": ["en"],
-    "pdf_backend": "dlparse_v2",
-    "table_mode": "fast",
-    "abort_on_error": false,
-    "return_as_file": false,
-  },
-  "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}]
-}
-```
+Try it out with a simple conversion:

-</details>
-
-<details>
-
-<summary>CURL example:</summary>
-
-```sh
+```bash
 curl -X 'POST' \
-  'http://localhost:5001/v1alpha/convert/source' \
+  'http://localhost:5001/v1/convert/source' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
-  "options": {
-    "from_formats": [
-      "docx",
-      "pptx",
-      "html",
-      "image",
-      "pdf",
-      "asciidoc",
-      "md",
-      "xlsx"
-    ],
-    "to_formats": ["md", "json", "html", "text", "doctags"],
-    "image_export_mode": "placeholder",
-    "do_ocr": true,
-    "force_ocr": false,
-    "ocr_engine": "easyocr",
-    "ocr_lang": [
-      "fr",
-      "de",
-      "es",
-      "en"
-    ],
-    "pdf_backend": "dlparse_v2",
-    "table_mode": "fast",
-    "abort_on_error": false,
-    "return_as_file": false,
-    "do_table_structure": true,
-    "include_images": true,
-    "images_scale": 2
-  },
-  "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}]
-}'
+    "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}]
+  }'
 ```

-</details>
+### Container images

-<details>
-<summary>Python example:</summary>
+Available container images:

-```python
-import httpx
+| Name | Description | Arch | Size |
+| -----|-------------|------|------|
+| [`ghcr.io/docling-project/docling-serve`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve) <br /> [`quay.io/docling-project/docling-serve`](https://quay.io/repository/docling-project/docling-serve) | Simple image for Docling Serve, installing all packages from the official pypi.org index. | `linux/amd64`, `linux/arm64` | 3.6 GB (arm64) <br /> 8.7 GB (amd64) |
+| [`ghcr.io/docling-project/docling-serve-cpu`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve-cpu) <br /> [`quay.io/docling-project/docling-serve-cpu`](https://quay.io/repository/docling-project/docling-serve-cpu) | CPU-only image that installs `torch` from the PyTorch CPU index. | `linux/amd64`, `linux/arm64` | 3.6 GB |
+| [`ghcr.io/docling-project/docling-serve-cu124`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve-cu124) <br /> [`quay.io/docling-project/docling-serve-cu124`](https://quay.io/repository/docling-project/docling-serve-cu124) | CUDA 12.4 image that installs `torch` from the PyTorch cu124 index. | `linux/amd64` | 8.7 GB |
+| [`ghcr.io/docling-project/docling-serve-cu126`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve-cu126) <br /> [`quay.io/docling-project/docling-serve-cu126`](https://quay.io/repository/docling-project/docling-serve-cu126) | CUDA 12.6 image that installs `torch` from the PyTorch cu126 index. | `linux/amd64` | 8.7 GB |
+| [`ghcr.io/docling-project/docling-serve-cu128`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve-cu128) <br /> [`quay.io/docling-project/docling-serve-cu128`](https://quay.io/repository/docling-project/docling-serve-cu128) | CUDA 12.8 image that installs `torch` from the PyTorch cu128 index. | `linux/amd64` | 8.7 GB |

-async_client = httpx.AsyncClient(timeout=60.0)
-url = "http://localhost:5001/v1alpha/convert/source"
-payload = {
-  "options": {
-    "from_formats": ["docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx"],
-    "to_formats": ["md", "json", "html", "text", "doctags"],
-    "image_export_mode": "placeholder",
-    "do_ocr": True,
-    "force_ocr": False,
-    "ocr_engine": "easyocr",
-    "ocr_lang": "en",
-    "pdf_backend": "dlparse_v2",
-    "table_mode": "fast",
-    "abort_on_error": False,
-    "return_as_file": False,
-  },
-  "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}]
-}
+Coming soon: `docling-serve-slim` images will reduce the size by skipping the model weights download.

-response = await async_client_client.post(url, json=payload)
+### Demonstration UI

-data = response.json()
-```
+An easy-to-use UI is available at the `/ui` endpoint.

-</details>
+![Input controllers in the UI](img/ui-input.png)

-#### File as base64
-
-The `file_sources` argument in the endpoint allows to send files as base64-encoded strings.
-When your PDF or other file type is too large, encoding it and passing it inline to curl
-can lead to an “Argument list too long” error on some systems. To avoid this, we write
-the JSON request body to a file and have curl read from that file.
-
-<details>
-<summary>CURL steps:</summary>
-
-```sh
-# 1. Base64-encode the file
-B64_DATA=$(base64 -w 0 /path/to/file/pdf-to-convert.pdf)
-
-# 2. Build the JSON with your options
-cat <<EOF > /tmp/request_body.json
-{
-  "options": {
-  },
-  "file_sources": [{
-    "base64_string": "${B64_DATA}",
-    "filename": "pdf-to-convert.pdf"
-  }]
-}
-EOF
-
-# 3. POST the request to the docling service
-curl -X POST "localhost:5001/v1alpha/convert/source" \
-     -H "Content-Type: application/json" \
-     -d @/tmp/request_body.json
-```
-
-</details>
-
-### File endpoint
-
-The endpoint is: `/v1alpha/convert/file`, listening for POST requests of Form payloads (necessary as the files are sent as multipart/form data). You can send one or multiple files.
-
-<details>
-<summary>CURL example:</summary>
-
-```sh
-curl -X 'POST' \
-  'http://127.0.0.1:5001/v1alpha/convert/file' \
-  -H 'accept: application/json' \
-  -H 'Content-Type: multipart/form-data' \
-  -F 'ocr_engine=easyocr' \
-  -F 'pdf_backend=dlparse_v2' \
-  -F 'from_formats=pdf' \
-  -F 'from_formats=docx' \
-  -F 'force_ocr=false' \
-  -F 'image_export_mode=embedded' \
-  -F 'ocr_lang=en' \
-  -F 'ocr_lang=pl' \
-  -F 'table_mode=fast' \
-  -F 'files=@2206.01062v1.pdf;type=application/pdf' \
-  -F 'abort_on_error=false' \
-  -F 'to_formats=md' \
-  -F 'to_formats=text' \
-  -F 'return_as_file=false' \
-  -F 'do_ocr=true'
-```
-
-</details>
-
-<details>
-<summary>Python example:</summary>
-
-```python
-import httpx
-
-async_client = httpx.AsyncClient(timeout=60.0)
-url = "http://localhost:5001/v1alpha/convert/file"
-parameters = {
-"from_formats": ["docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx"],
-"to_formats": ["md", "json", "html", "text", "doctags"],
-"image_export_mode": "placeholder",
-"do_ocr": True,
-"force_ocr": False,
-"ocr_engine": "easyocr",
-"ocr_lang": ["en"],
-"pdf_backend": "dlparse_v2",
-"table_mode": "fast",
-"abort_on_error": False,
-"return_as_file": False
-}
-
-current_dir = os.path.dirname(__file__)
-file_path = os.path.join(current_dir, '2206.01062v1.pdf')
-
-files = {
-    'files': ('2206.01062v1.pdf', open(file_path, 'rb'), 'application/pdf'),
-}
-
-response = await async_client.post(url, files=files, data={"parameters": json.dumps(parameters)})
-assert response.status_code == 200, "Response should be 200 OK"
-
-data = response.json()
-```
-
-</details>
-
-### Response format
-
-The response can be a JSON Document or a File.
-
- If you process only one file, the response will be a JSON document with the following format:
-
-  ```jsonc
-  {
-    "document": {
-      "md_content": "",
-      "json_content": {},
-      "html_content": "",
-      "text_content": "",
-      "doctags_content": ""
-      },
-    "status": "<success|partial_success|skipped|failure>",
-    "processing_time": 0.0,
-    "timings": {},
-    "errors": []
-  }
-  ```
-
-  Depending on the value you set in `output_formats`, the different items will be populated with their respective results or empty.
-
-  `processing_time` is the Docling processing time in seconds, and `timings` (when enabled in the backend) provides the detailed
-  timing of all the internal Docling components.
-
- If you set the parameter `return_as_file` to True, the response will be a zip file.
- If multiple files are generated (multiple inputs, or one input but multiple outputs with `return_as_file` True), the response will be a zip file.
-
-## Run docling-serve
-
-Clone the repository and run the following from within the cloned directory root.
-
-```bash
-python -m venv venv
-source venv/bin/activate
-pip install "docling-serve[ui]"
-docling-serve run --enable-ui
-```
-
-## Helpers
-
- A full Swagger UI is available at the `/docs` endpoint.
-
-![swagger.png](img/swagger.png)
-
- An easy to use UI is available at the `/ui` endpoint.
-
-![ui-input.png](img/ui-input.png)
-
-![ui-output.png](img/ui-output.png)
-
-## Development
-
-### CPU only
-
-```sh
-# Install uv if not already available
-curl -LsSf https://astral.sh/uv/install.sh | sh
-
-# Install dependencies
-uv sync --extra cpu
-```
-
-### Cuda GPU
-
-For GPU support use the following command:
-
-```sh
-# Install dependencies
-uv sync
-```
-
-### Gradio UI and different OCR backends
-
-`/ui` endpoint using `gradio` and different OCR backends can be enabled via package extras:
-
-```sh
-# Enable ui and rapidocr
-uv sync --extra ui --extra rapidocr
-```
-
-```sh
-# Enable tesserocr
-uv sync --extra tesserocr
-```
-
-See `[project.optional-dependencies]` section in `pyproject.toml` for full list of options and runtime options with `uv run docling-serve --help`.
-
-### Run the server
-
-The `docling-serve` executable is a convenient script for launching the webserver both in
-development and production mode.
-
-```sh
-# Run the server in development mode
-# - reload is enabled by default
-# - listening on the 127.0.0.1 address
-# - ui is enabled by default
-docling-serve dev
-
-# Run the server in production mode
-# - reload is disabled by default
-# - listening on the 0.0.0.0 address
-# - ui is disabled by default
-docling-serve run
-```
-
-### Options
-
-The `docling-serve` executable allows is controlled with both command line
-options and environment variables.
-
-<details>
-<summary>`docling-serve` help message</summary>
-
-```sh
-$ docling-serve dev --help
-                                                                                                              
- Usage: docling-serve dev [OPTIONS]                                                                           
-                                                                                                              
- Run a Docling Serve app in development mode. 🧪                                                              
- This is equivalent to docling-serve run but with reload                                                      
- enabled and listening on the 127.0.0.1 address.                                                              
-                                                                                                              
- Options can be set also with the corresponding ENV variable, with the exception                              
- of --enable-ui, --host and --reload.                                                                         
-                                                                                                              
-╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────╮
-│ --host                                   TEXT     The host to serve on. For local development in localhost │
-│                                                   use 127.0.0.1. To enable public access, e.g. in a        │
-│                                                   container, use all the IP addresses available with       │
-│                                                   0.0.0.0.                                                 │
-│                                                   [default: 127.0.0.1]                                     │
-│ --port                                   INTEGER  The port to serve on. [default: 5001]                    │
-│ --reload           --no-reload                    Enable auto-reload of the server when (code) files       │
-│                                                   change. This is resource intensive, use it only during   │
-│                                                   development.                                             │
-│                                                   [default: reload]                                        │
-│ --root-path                              TEXT     The root path is used to tell your app that it is being  │
-│                                                   served to the outside world with some path prefix set up │
-│                                                   in some termination proxy or similar.                    │
-│ --proxy-headers    --no-proxy-headers             Enable/Disable X-Forwarded-Proto, X-Forwarded-For,       │
-│                                                   X-Forwarded-Port to populate remote address info.        │
-│                                                   [default: proxy-headers]                                 │
-│ --artifacts-path                          PATH     If set to a valid directory, the model weights will be  │
-│                                                    loaded from this path.                                  │
-│                                                    [default: None]                                         │
-│ --enable-ui        --no-enable-ui                 Enable the development UI. [default: enable-ui]          │
-│ --help                                            Show this message and exit.                              │
-╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-```
-
-</details>
-
-#### Environment variables
-
-The environment variables controlling the `uvicorn` execution can be specified with the `UVICORN_` prefix:
-
- `UVICORN_WORKERS`: Number of workers to use.
- `UVICORN_RELOAD`: If `True`, this will enable auto-reload when you modify files, useful for development.
-
-The environment variables controlling specifics of the Docling Serve app can be specified with the
-`DOCLING_SERVE_` prefix:
-
- `DOCLING_SERVE_ARTIFACTS_PATH`: if set Docling will use only the local weights of models, for example `/opt/app-root/src/.cache/docling/models`.
- `DOCLING_SERVE_ENABLE_UI`: If `True`, The Gradio UI will be available at `/ui`.
-
-Others:
-
- `TESSDATA_PREFIX`: Tesseract data location, example `/usr/share/tesseract/tessdata/`.
+![Output visualization in the UI](img/ui-output.png)

 ## Get help and support

-Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling/discussions).
+Please feel free to connect with us using the [discussion section](https://github.com/docling-project/docling/discussions).

 ## Contributing

-Please read [Contributing to Docling Serve](https://github.com/DS4SD/docling-serve/blob/main/CONTRIBUTING.md) for details.
+Please read [Contributing to Docling Serve](https://github.com/docling-project/docling-serve/blob/main/CONTRIBUTING.md) for details.

 ## References

@@ -433,14 +86,14 @@ If you use Docling in your projects, please consider citing the following:

 ```bib
@techreport{Docling,
-  author = {Deep Search Team},
-  month = {8},
-  title = {Docling Technical Report},
-  url = {https://arxiv.org/abs/2408.09869},
-  eprint = {2408.09869},
-  doi = {10.48550/arXiv.2408.09869},
-  version = {1.0.0},
-  year = {2024}
+  author = {Docling Contributors},
+  month = {1},
+  title = {Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion},
+  url = {https://arxiv.org/abs/2501.17887},
+  eprint = {2501.17887},
+  doi = {10.48550/arXiv.2501.17887},
+  version = {2.0.0},
+  year = {2025}
 }
 ```

--- a/docling_serve/main.py
+++ b/docling_serve/main.py
@@ -30,6 +30,7 @@ logger = logging.getLogger(__name__)
 def version_callback(value: bool) -> None:
    if value:
        docling_serve_version = importlib.metadata.version("docling_serve")
+        docling_jobkit_version = importlib.metadata.version("docling-jobkit")
        docling_version = importlib.metadata.version("docling")
        docling_core_version = importlib.metadata.version("docling-core")
        docling_ibm_models_version = importlib.metadata.version("docling-ibm-models")
@@ -38,6 +39,7 @@ def version_callback(value: bool) -> None:
        py_impl_version = sys.implementation.cache_tag
        py_lang_version = platform.python_version()
        console.print(f"Docling Serve version: {docling_serve_version}")
+        console.print(f"Docling Jobkit version: {docling_jobkit_version}")
        console.print(f"Docling version: {docling_version}")
        console.print(f"Docling Core version: {docling_core_version}")
        console.print(f"Docling IBM Models version: {docling_ibm_models_version}")
@@ -74,18 +76,52 @@ def callback(
 def _run(
    *,
    command: str,
+    # Docling serve parameters
+    artifacts_path: Path | None,
+    enable_ui: bool,
 ) -> None:
    server_type = "development" if command == "dev" else "production"

    console.print(f"Starting {server_type} server 🚀")

-    url = f"http://{uvicorn_settings.host}:{uvicorn_settings.port}"
+    run_subprocess = (
+        uvicorn_settings.workers is not None and uvicorn_settings.workers > 1
+    ) or uvicorn_settings.reload
+
+    run_ssl = (
+        uvicorn_settings.ssl_certfile is not None
+        and uvicorn_settings.ssl_keyfile is not None
+    )
+
+    if run_subprocess and docling_serve_settings.artifacts_path != artifacts_path:
+        err_console.print(
+            "\n[yellow]:warning: The server will run with reload or multiple workers. \n"
+            "The argument [bold]--artifacts-path[/bold] will be ignored, please set the value \n"
+            "using the environment variable [bold]DOCLING_SERVE_ARTIFACTS_PATH[/bold].[/yellow]"
+        )
+
+    if run_subprocess and docling_serve_settings.enable_ui != enable_ui:
+        err_console.print(
+            "\n[yellow]:warning: The server will run with reload or multiple workers. \n"
+            "The argument [bold]--enable-ui[/bold] will be ignored, please set the value \n"
+            "using the environment variable [bold]DOCLING_SERVE_ENABLE_UI[/bold].[/yellow]"
+        )
+
+    # Propagate the settings to the app settings
+    docling_serve_settings.artifacts_path = artifacts_path
+    docling_serve_settings.enable_ui = enable_ui
+
+    # Print documentation
+    protocol = "https" if run_ssl else "http"
+    url = f"{protocol}://{uvicorn_settings.host}:{uvicorn_settings.port}"
    url_docs = f"{url}/docs"
+    url_scalar = f"{url}/scalar"
    url_ui = f"{url}/ui"

    console.print("")
    console.print(f"Server started at [link={url}]{url}[/]")
    console.print(f"Documentation at [link={url_docs}]{url_docs}[/]")
+    console.print(f"Scalar docs at [link={url_scalar}]{url_scalar}[/]")
    if docling_serve_settings.enable_ui:
        console.print(f"UI at [link={url_ui}]{url_ui}[/]")

@@ -99,6 +135,7 @@ def _run(
    console.print("")
    console.print("Logs:")

+    # Launch the server
    uvicorn.run(
        app="docling_serve.app:create_app",
        factory=True,
@@ -108,6 +145,10 @@ def _run(
        workers=uvicorn_settings.workers,
        root_path=uvicorn_settings.root_path,
        proxy_headers=uvicorn_settings.proxy_headers,
+        timeout_keep_alive=uvicorn_settings.timeout_keep_alive,
+        ssl_certfile=uvicorn_settings.ssl_certfile,
+        ssl_keyfile=uvicorn_settings.ssl_keyfile,
+        ssl_keyfile_password=uvicorn_settings.ssl_keyfile_password,
    )


@@ -159,6 +200,18 @@ def dev(
            )
        ),
    ] = uvicorn_settings.proxy_headers,
+    timeout_keep_alive: Annotated[
+        int, typer.Option(help="Seconds to keep idle keep-alive connections open.")
+    ] = uvicorn_settings.timeout_keep_alive,
+    ssl_certfile: Annotated[
+        Optional[Path], typer.Option(help="SSL certificate file")
+    ] = uvicorn_settings.ssl_certfile,
+    ssl_keyfile: Annotated[
+        Optional[Path], typer.Option(help="SSL key file")
+    ] = uvicorn_settings.ssl_keyfile,
+    ssl_keyfile_password: Annotated[
+        Optional[str], typer.Option(help="SSL keyfile password")
+    ] = uvicorn_settings.ssl_keyfile_password,
    # docling options
    artifacts_path: Annotated[
        Optional[Path],
@@ -186,12 +239,15 @@ def dev(
    uvicorn_settings.reload = reload
    uvicorn_settings.root_path = root_path
    uvicorn_settings.proxy_headers = proxy_headers
-
-    docling_serve_settings.artifacts_path = artifacts_path
-    docling_serve_settings.enable_ui = enable_ui
+    uvicorn_settings.timeout_keep_alive = timeout_keep_alive
+    uvicorn_settings.ssl_certfile = ssl_certfile
+    uvicorn_settings.ssl_keyfile = ssl_keyfile
+    uvicorn_settings.ssl_keyfile_password = ssl_keyfile_password

    _run(
        command="dev",
+        artifacts_path=artifacts_path,
+        enable_ui=enable_ui,
    )


@@ -251,6 +307,18 @@ def run(
            )
        ),
    ] = uvicorn_settings.proxy_headers,
+    timeout_keep_alive: Annotated[
+        int, typer.Option(help="Seconds to keep idle keep-alive connections open.")
+    ] = uvicorn_settings.timeout_keep_alive,
+    ssl_certfile: Annotated[
+        Optional[Path], typer.Option(help="SSL certificate file")
+    ] = uvicorn_settings.ssl_certfile,
+    ssl_keyfile: Annotated[
+        Optional[Path], typer.Option(help="SSL key file")
+    ] = uvicorn_settings.ssl_keyfile,
+    ssl_keyfile_password: Annotated[
+        Optional[str], typer.Option(help="SSL keyfile password")
+    ] = uvicorn_settings.ssl_keyfile_password,
    # docling options
    artifacts_path: Annotated[
        Optional[Path],
@@ -281,12 +349,15 @@ def run(
    uvicorn_settings.workers = workers
    uvicorn_settings.root_path = root_path
    uvicorn_settings.proxy_headers = proxy_headers
-
-    docling_serve_settings.artifacts_path = artifacts_path
-    docling_serve_settings.enable_ui = enable_ui
+    uvicorn_settings.timeout_keep_alive = timeout_keep_alive
+    uvicorn_settings.ssl_certfile = ssl_certfile
+    uvicorn_settings.ssl_keyfile = ssl_keyfile
+    uvicorn_settings.ssl_keyfile_password = ssl_keyfile_password

    _run(
        command="run",
+        artifacts_path=artifacts_path,
+        enable_ui=enable_ui,
    )


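Review note: the `_run` changes above derive two flags from the uvicorn settings: whether the app will be created in child processes (reload or more than one worker), and whether to advertise `https`. The following is a minimal sketch of that decision logic only. `ServerSettings`, `runs_in_subprocess` and `base_url` are hypothetical names used for illustration; they are not part of `docling_serve`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ServerSettings:
    """Hypothetical stand-in for the uvicorn settings object used in the diff."""

    host: str = "127.0.0.1"
    port: int = 5001
    workers: Optional[int] = None
    reload: bool = False
    ssl_certfile: Optional[Path] = None
    ssl_keyfile: Optional[Path] = None


def runs_in_subprocess(s: ServerSettings) -> bool:
    # Reload or multiple workers means the app factory runs in child processes,
    # so values set on in-memory settings by CLI arguments do not reach it.
    return (s.workers is not None and s.workers > 1) or s.reload


def base_url(s: ServerSettings) -> str:
    # HTTPS is advertised only when both the certificate and the key are set.
    use_ssl = s.ssl_certfile is not None and s.ssl_keyfile is not None
    protocol = "https" if use_ssl else "http"
    return f"{protocol}://{s.host}:{s.port}"
```

This is why the diff warns that `--artifacts-path` and `--enable-ui` are ignored in that mode and points to the `DOCLING_SERVE_ARTIFACTS_PATH` and `DOCLING_SERVE_ENABLE_UI` environment variables instead.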
--- a/docling_serve/app.py
+++ b/docling_serve/app.py
@@ -1,16 +1,18 @@
 import asyncio
+import copy
 import importlib.metadata
 import logging
-import tempfile
+import shutil
+import time
 from contextlib import asynccontextmanager
 from io import BytesIO
-from pathlib import Path
-from typing import Annotated, Any, Optional, Union
+from typing import Annotated

 from fastapi import (
    BackgroundTasks,
    Depends,
    FastAPI,
+    Form,
    HTTPException,
    Query,
    UploadFile,
@@ -18,36 +20,57 @@ from fastapi import (
    WebSocketDisconnect,
 )
 from fastapi.middleware.cors import CORSMiddleware
-from fastapi.responses import RedirectResponse
+from fastapi.openapi.docs import (
+    get_redoc_html,
+    get_swagger_ui_html,
+    get_swagger_ui_oauth2_redirect_html,
+)
+from fastapi.responses import JSONResponse, RedirectResponse
+from fastapi.staticfiles import StaticFiles
+from scalar_fastapi import get_scalar_api_reference

-from docling.datamodel.base_models import DocumentStream, InputFormat
-from docling.document_converter import DocumentConverter
+from docling.datamodel.base_models import DocumentStream
+from docling_jobkit.datamodel.callback import (
+    ProgressCallbackRequest,
+    ProgressCallbackResponse,
+)
+from docling_jobkit.datamodel.http_inputs import FileSource, HttpSource
+from docling_jobkit.datamodel.s3_coords import S3Coordinates
+from docling_jobkit.datamodel.task import Task, TaskSource
+from docling_jobkit.datamodel.task_targets import (
+    InBodyTarget,
+    TaskTarget,
+    ZipTarget,
+)
+from docling_jobkit.orchestrators.base_orchestrator import (
+    BaseOrchestrator,
+    ProgressInvalid,
+    TaskNotFoundError,
+)

-from docling_serve.datamodel.convert import ConvertDocumentsOptions
+from docling_serve.datamodel.convert import ConvertDocumentsRequestOptions
 from docling_serve.datamodel.requests import (
-    ConvertDocumentFileSourcesRequest,
    ConvertDocumentsRequest,
+    FileSourceRequest,
+    HttpSourceRequest,
+    S3SourceRequest,
+    TargetName,
 )
 from docling_serve.datamodel.responses import (
+    ClearResponse,
    ConvertDocumentResponse,
    HealthCheckResponse,
    MessageKind,
+    PresignedUrlConvertDocumentResponse,
    TaskStatusResponse,
    WebsocketMessage,
 )
-from docling_serve.docling_conversion import (
-    convert_documents,
-    converters,
-    get_pdf_pipeline_opts,
-)
-from docling_serve.engines import get_orchestrator
-from docling_serve.engines.async_local.orchestrator import (
-    AsyncLocalOrchestrator,
-    TaskNotFoundError,
-)
 from docling_serve.helper_functions import FormDepends
-from docling_serve.response_preparation import process_results
+from docling_serve.orchestrator_factory import get_async_orchestrator
+from docling_serve.response_preparation import prepare_response
 from docling_serve.settings import docling_serve_settings
+from docling_serve.storage import get_scratch
+from docling_serve.websocket_notifier import WebsocketNotifier


 # Set up custom logging as we'll be intermixes with FastAPI/Uvicorn's logging
@@ -85,18 +108,15 @@ _log = logging.getLogger(__name__)
 # Context manager to initialize and clean up the lifespan of the FastAPI app
@asynccontextmanager
 async def lifespan(app: FastAPI):
-    # Converter with default options
-    pdf_format_option, options_hash = get_pdf_pipeline_opts(ConvertDocumentsOptions())
-    converters[options_hash] = DocumentConverter(
-        format_options={
-            InputFormat.PDF: pdf_format_option,
-            InputFormat.IMAGE: pdf_format_option,
-        }
-    )
+    scratch_dir = get_scratch()

-    converters[options_hash].initialize_pipeline(InputFormat.PDF)
+    orchestrator = get_async_orchestrator()
+    notifier = WebsocketNotifier(orchestrator)
+    orchestrator.bind_notifier(notifier)

-    orchestrator = get_orchestrator()
+    # Warm up processing cache
+    if docling_serve_settings.load_models_at_boot:
+        await orchestrator.warm_up_caches()

    # Start the background queue processor
    queue_task = asyncio.create_task(orchestrator.process_queue())
@@ -110,10 +130,9 @@ async def lifespan(app: FastAPI):
    except asyncio.CancelledError:
        _log.info("Queue processor cancelled.")

-    converters.clear()
-
-    # if WITH_UI:
-    #     gradio_ui.close()
+    # Remove scratch directory in case it was a tempfile
+    if docling_serve_settings.scratch_path is None:
+        shutil.rmtree(scratch_dir, ignore_errors=True)


 ##################################
@@ -129,15 +148,25 @@ def create_app():  # noqa: C901

        version = "0.0.0"

+    offline_docs_assets = False
+    if (
+        docling_serve_settings.static_path is not None
+        and (docling_serve_settings.static_path).is_dir()
+    ):
+        offline_docs_assets = True
+        _log.info("Found static assets.")
+
    app = FastAPI(
        title="Docling Serve",
+        docs_url=None if offline_docs_assets else "/swagger",
+        redoc_url=None if offline_docs_assets else "/docs",
        lifespan=lifespan,
        version=version,
    )

-    origins = ["*"]
-    methods = ["*"]
-    headers = ["*"]
+    origins = docling_serve_settings.cors_origins
+    methods = docling_serve_settings.cors_methods
+    headers = docling_serve_settings.cors_headers

    app.add_middleware(
        CORSMiddleware,
@@ -154,7 +183,8 @@ def create_app():  # noqa: C901

            from docling_serve.gradio_ui import ui as gradio_ui

-            tmp_output_dir = Path(tempfile.mkdtemp())
+            tmp_output_dir = get_scratch() / "gradio"
+            tmp_output_dir.mkdir(exist_ok=True, parents=True)
            gradio_ui.gradio_output_dir = tmp_output_dir
            app = gr.mount_gradio_app(
                app,
@@ -170,16 +200,182 @@ def create_app():  # noqa: C901
                "or `pip install gradio`"
            )

+    #############################
+    # Offline assets definition #
+    #############################
+    if offline_docs_assets:
+        app.mount(
+            "/static",
+            StaticFiles(directory=docling_serve_settings.static_path),
+            name="static",
+        )
+
+        @app.get("/swagger", include_in_schema=False)
+        async def custom_swagger_ui_html():
+            return get_swagger_ui_html(
+                openapi_url=app.openapi_url,
+                title=app.title + " - Swagger UI",
+                oauth2_redirect_url=app.swagger_ui_oauth2_redirect_url,
+                swagger_js_url="/static/swagger-ui-bundle.js",
+                swagger_css_url="/static/swagger-ui.css",
+            )
+
+        @app.get(app.swagger_ui_oauth2_redirect_url, include_in_schema=False)
+        async def swagger_ui_redirect():
+            return get_swagger_ui_oauth2_redirect_html()
+
+        @app.get("/docs", include_in_schema=False)
+        async def redoc_html():
+            return get_redoc_html(
+                openapi_url=app.openapi_url,
+                title=app.title + " - ReDoc",
+                redoc_js_url="/static/redoc.standalone.js",
+            )
+
+    @app.get("/scalar", include_in_schema=False)
+    async def scalar_html():
+        return get_scalar_api_reference(
+            openapi_url=app.openapi_url,
+            title=app.title,
+            scalar_favicon_url="https://raw.githubusercontent.com/docling-project/docling/refs/heads/main/docs/assets/logo.svg",
+            # hide_client_button=True,  # not yet released but in main
+        )
+
+    ########################
+    # Async / Sync helpers #
+    ########################
+
+    async def _enque_source(
+        orchestrator: BaseOrchestrator, conversion_request: ConvertDocumentsRequest
+    ) -> Task:
+        sources: list[TaskSource] = []
+        for s in conversion_request.sources:
+            if isinstance(s, FileSourceRequest):
+                sources.append(FileSource.model_validate(s))
+            elif isinstance(s, HttpSourceRequest):
+                sources.append(HttpSource.model_validate(s))
+            elif isinstance(s, S3SourceRequest):
+                sources.append(S3Coordinates.model_validate(s))
+
+        task = await orchestrator.enqueue(
+            sources=sources,
+            options=conversion_request.options,
+            target=conversion_request.target,
+        )
+        return task
+
+    async def _enque_file(
+        orchestrator: BaseOrchestrator,
+        files: list[UploadFile],
+        options: ConvertDocumentsRequestOptions,
+        target: TaskTarget,
+    ) -> Task:
+        _log.info(f"Received {len(files)} files for processing.")
+
+        # Load the uploaded files to Docling DocumentStream
+        file_sources: list[TaskSource] = []
+        for i, file in enumerate(files):
+            buf = BytesIO(file.file.read())
+            suffix = "" if len(files) == 1 else f"_{i}"
+            name = file.filename if file.filename else f"file{suffix}.pdf"
+            file_sources.append(DocumentStream(name=name, stream=buf))
+
+        task = await orchestrator.enqueue(
+            sources=file_sources, options=options, target=target
+        )
+        return task
+
+    async def _wait_task_complete(orchestrator: BaseOrchestrator, task_id: str) -> bool:
+        start_time = time.monotonic()
+        while True:
+            task = await orchestrator.task_status(task_id=task_id)
+            if task.is_completed():
+                return True
+            await asyncio.sleep(5)
+            elapsed_time = time.monotonic() - start_time
+            if elapsed_time > docling_serve_settings.max_sync_wait:
+                return False
+
+    ##########################################
+    # Downgrade openapi 3.1 to 3.0.x helpers #
+    ##########################################
+
+    def ensure_array_items(schema):
+        """Ensure that array items are defined."""
+        if "type" in schema and schema["type"] == "array":
+            if "items" not in schema or schema["items"] is None:
+                schema["items"] = {"type": "string"}
+            elif isinstance(schema["items"], dict):
+                if "type" not in schema["items"]:
+                    schema["items"]["type"] = "string"
+
+    def handle_discriminators(schema):
+        """Ensure that discriminator properties are included in required."""
+        if "discriminator" in schema and "propertyName" in schema["discriminator"]:
+            prop = schema["discriminator"]["propertyName"]
+            if "properties" in schema and prop in schema["properties"]:
+                if "required" not in schema:
+                    schema["required"] = []
+                if prop not in schema["required"]:
+                    schema["required"].append(prop)
+
+    def handle_properties(schema):
+        """Ensure that property 'kind' is included in required."""
+        if "properties" in schema and "kind" in schema["properties"]:
+            if "required" not in schema:
+                schema["required"] = []
+            if "kind" not in schema["required"]:
+                schema["required"].append("kind")
+
+    # Downgrade openapi 3.1 to 3.0.x
+    def downgrade_openapi31_to_30(spec):
+        def strip_unsupported(obj):
+            if isinstance(obj, dict):
+                obj = {
+                    k: strip_unsupported(v)
+                    for k, v in obj.items()
+                    if k not in ("const", "examples", "prefixItems")
+                }
+
+                handle_discriminators(obj)
+                ensure_array_items(obj)
+
+                # Check for oneOf and anyOf to handle nested schemas
+                for key in ["oneOf", "anyOf"]:
+                    if key in obj:
+                        for sub in obj[key]:
+                            handle_discriminators(sub)
+                            ensure_array_items(sub)
+
+                return obj
+            elif isinstance(obj, list):
+                return [strip_unsupported(i) for i in obj]
+            return obj
+
+        if "components" in spec and "schemas" in spec["components"]:
+            for schema_name, schema in spec["components"]["schemas"].items():
+                handle_properties(schema)
+
+        return strip_unsupported(copy.deepcopy(spec))
+
    #############################
    # API Endpoints definitions #
    #############################

+    @app.get("/openapi-3.0.json")
+    def openapi_30():
+        spec = app.openapi()
+        downgraded = downgrade_openapi31_to_30(spec)
+        downgraded["openapi"] = "3.0.3"
+        return JSONResponse(downgraded)
+
    # Favicon
    @app.get("/favicon.ico", include_in_schema=False)
    async def favicon():
-        response = RedirectResponse(
-            url="https://ds4sd.github.io/docling/assets/logo.png"
-        )
+        logo_url = "https://raw.githubusercontent.com/docling-project/docling/refs/heads/main/docs/assets/logo.svg"
+        if offline_docs_assets:
+            logo_url = "/static/logo.svg"
+        response = RedirectResponse(url=logo_url)
        return response

    @app.get("/health")
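Review note: the `/openapi-3.0.json` endpoint in this hunk works by recursively dropping keywords that OpenAPI 3.0.x does not accept and then overwriting the version string. The sketch below isolates just that recursion on a hand-written sample spec. It omits the discriminator, `kind` and array-item fixes from the diff, and the names `UNSUPPORTED_KEYS` and `downgrade` are illustrative assumptions rather than the project's API.

```python
import copy
from typing import Any

# Keywords the diff removes when targeting OpenAPI 3.0.x.
UNSUPPORTED_KEYS = ("const", "examples", "prefixItems")


def strip_unsupported(obj: Any) -> Any:
    """Return a copy of obj with unsupported keys removed at every depth."""
    if isinstance(obj, dict):
        return {
            k: strip_unsupported(v)
            for k, v in obj.items()
            if k not in UNSUPPORTED_KEYS
        }
    if isinstance(obj, list):
        return [strip_unsupported(i) for i in obj]
    return obj


def downgrade(spec: dict) -> dict:
    # Work on a deep copy so the cached 3.1 schema is left untouched.
    out = strip_unsupported(copy.deepcopy(spec))
    out["openapi"] = "3.0.3"
    return out


sample_spec = {
    "openapi": "3.1.0",
    "components": {
        "schemas": {
            "Source": {
                "properties": {"kind": {"type": "string", "const": "http"}},
                "examples": [{"kind": "http"}],
            }
        }
    },
}
downgraded = downgrade(sample_spec)
```

Copying first keeps the original spec intact, which matters if the same object is also served as the 3.1 document.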
@@ -193,7 +389,7 @@ def create_app():  # noqa: C901

    # Convert a document from URL(s)
    @app.post(
-        "/v1alpha/convert/source",
+        "/v1/convert/source",
        response_model=ConvertDocumentResponse,
        responses={
            200: {
@@ -202,37 +398,34 @@ def create_app():  # noqa: C901
            }
        },
    )
-    def process_url(
-        background_tasks: BackgroundTasks, conversion_request: ConvertDocumentsRequest
+    async def process_url(
+        background_tasks: BackgroundTasks,
+        orchestrator: Annotated[BaseOrchestrator, Depends(get_async_orchestrator)],
+        conversion_request: ConvertDocumentsRequest,
    ):
-        sources: list[Union[str, DocumentStream]] = []
-        headers: Optional[dict[str, Any]] = None
-        if isinstance(conversion_request, ConvertDocumentFileSourcesRequest):
-            for file_source in conversion_request.file_sources:
-                sources.append(file_source.to_document_stream())
-        else:
-            for http_source in conversion_request.http_sources:
-                sources.append(http_source.url)
-                if headers is None and http_source.headers:
-                    headers = http_source.headers
-
-        # Note: results are only an iterator->lazy evaluation
-        results = convert_documents(
-            sources=sources, options=conversion_request.options, headers=headers
+        task = await _enque_source(
+            orchestrator=orchestrator, conversion_request=conversion_request
+        )
+        completed = await _wait_task_complete(
+            orchestrator=orchestrator, task_id=task.task_id
        )

-        # The real processing will happen here
-        response = process_results(
-            background_tasks=background_tasks,
-            conversion_options=conversion_request.options,
-            conv_results=results,
-        )
+        if not completed:
+            # TODO: abort task!
+            raise HTTPException(
+                status_code=504,
+                detail=f"Conversion is taking too long. The maximum wait time is configured as DOCLING_SERVE_MAX_SYNC_WAIT={docling_serve_settings.max_sync_wait}.",
+            )

+        task = await orchestrator.get_raw_task(task_id=task.task_id)
+        response = await prepare_response(
+            task=task, orchestrator=orchestrator, background_tasks=background_tasks
+        )
        return response

    # Convert a document from file(s)
    @app.post(
-        "/v1alpha/convert/file",
+        "/v1/convert/file",
        response_model=ConvertDocumentResponse,
        responses={
            200: {
@@ -242,40 +435,46 @@ def create_app():  # noqa: C901
    )
    async def process_file(
        background_tasks: BackgroundTasks,
+        orchestrator: Annotated[BaseOrchestrator, Depends(get_async_orchestrator)],
        files: list[UploadFile],
        options: Annotated[
-            ConvertDocumentsOptions, FormDepends(ConvertDocumentsOptions)
+            ConvertDocumentsRequestOptions, FormDepends(ConvertDocumentsRequestOptions)
        ],
+        target_type: Annotated[TargetName, Form()] = TargetName.INBODY,
    ):
-        _log.info(f"Received {len(files)} files for processing.")
-
-        # Load the uploaded files to Docling DocumentStream
-        file_sources = []
-        for file in files:
-            buf = BytesIO(file.file.read())
-            name = file.filename if file.filename else "file.pdf"
-            file_sources.append(DocumentStream(name=name, stream=buf))
-
-        results = convert_documents(sources=file_sources, options=options)
-
-        response = process_results(
-            background_tasks=background_tasks,
-            conversion_options=options,
-            conv_results=results,
+        target = InBodyTarget() if target_type == TargetName.INBODY else ZipTarget()
+        task = await _enque_file(
+            orchestrator=orchestrator, files=files, options=options, target=target
+        )
+        completed = await _wait_task_complete(
+            orchestrator=orchestrator, task_id=task.task_id
        )

+        if not completed:
+            # TODO: abort task!
+            raise HTTPException(
+                status_code=504,
+                detail=f"Conversion is taking too long. The maximum wait time is configured as DOCLING_SERVE_MAX_SYNC_WAIT={docling_serve_settings.max_sync_wait}.",
+            )
+
+        task = await orchestrator.get_raw_task(task_id=task.task_id)
+        response = await prepare_response(
+            task=task, orchestrator=orchestrator, background_tasks=background_tasks
+        )
        return response

    # Convert a document from URL(s) using the async api
    @app.post(
-        "/v1alpha/convert/source/async",
+        "/v1/convert/source/async",
        response_model=TaskStatusResponse,
    )
    async def process_url_async(
-        orchestrator: Annotated[AsyncLocalOrchestrator, Depends(get_orchestrator)],
+        orchestrator: Annotated[BaseOrchestrator, Depends(get_async_orchestrator)],
        conversion_request: ConvertDocumentsRequest,
    ):
-        task = await orchestrator.enqueue(request=conversion_request)
+        task = await _enque_source(
+            orchestrator=orchestrator, conversion_request=conversion_request
+        )
        task_queue_position = await orchestrator.get_queue_position(
            task_id=task.task_id
        )
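Review note: the synchronous endpoints now enqueue a task and poll its status until it completes or a maximum wait elapses, at which point a 504 is produced. The following is a rough, framework-free sketch of that poll-with-deadline pattern. `wait_until`, `is_done`, `max_wait` and `interval` are hypothetical names, and the real helper in the diff polls the orchestrator every 5 seconds.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(
    is_done: Callable[[], Awaitable[bool]],
    max_wait: float,
    interval: float = 0.01,
) -> bool:
    """Poll is_done until it returns True or max_wait seconds have passed."""
    # A monotonic clock is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + max_wait
    while True:
        if await is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)
```

A caller would translate a `False` result into an error response, which in FastAPI means raising the exception rather than returning it.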
@@ -283,18 +482,48 @@ def create_app():  # noqa: C901
            task_id=task.task_id,
            task_status=task.task_status,
            task_position=task_queue_position,
+            task_meta=task.processing_meta,
+        )
+
+    # Convert a document from file(s) using the async api
+    @app.post(
+        "/v1/convert/file/async",
+        response_model=TaskStatusResponse,
+    )
+    async def process_file_async(
+        orchestrator: Annotated[BaseOrchestrator, Depends(get_async_orchestrator)],
+        background_tasks: BackgroundTasks,
+        files: list[UploadFile],
+        options: Annotated[
+            ConvertDocumentsRequestOptions, FormDepends(ConvertDocumentsRequestOptions)
+        ],
+        target_type: Annotated[TargetName, Form()] = TargetName.INBODY,
+    ):
+        target = InBodyTarget() if target_type == TargetName.INBODY else ZipTarget()
+        task = await _enque_file(
+            orchestrator=orchestrator, files=files, options=options, target=target
+        )
+        task_queue_position = await orchestrator.get_queue_position(
+            task_id=task.task_id
+        )
+        return TaskStatusResponse(
+            task_id=task.task_id,
+            task_status=task.task_status,
+            task_position=task_queue_position,
+            task_meta=task.processing_meta,
        )

    # Task status poll
    @app.get(
-        "/v1alpha/status/poll/{task_id}",
+        "/v1/status/poll/{task_id}",
        response_model=TaskStatusResponse,
    )
    async def task_status_poll(
-        orchestrator: Annotated[AsyncLocalOrchestrator, Depends(get_orchestrator)],
+        orchestrator: Annotated[BaseOrchestrator, Depends(get_async_orchestrator)],
        task_id: str,
        wait: Annotated[
-            float, Query(help="Number of seconds to wait for a completed status.")
+            float,
+            Query(description="Number of seconds to wait for a completed status."),
        ] = 0.0,
    ):
        try:
@@ -306,17 +535,19 @@ def create_app():  # noqa: C901
            task_id=task.task_id,
            task_status=task.task_status,
            task_position=task_queue_position,
+            task_meta=task.processing_meta,
        )

    # Task status websocket
    @app.websocket(
-        "/v1alpha/status/ws/{task_id}",
+        "/v1/status/ws/{task_id}",
    )
    async def task_status_ws(
        websocket: WebSocket,
-        orchestrator: Annotated[AsyncLocalOrchestrator, Depends(get_orchestrator)],
+        orchestrator: Annotated[BaseOrchestrator, Depends(get_async_orchestrator)],
        task_id: str,
    ):
+        assert isinstance(orchestrator.notifier, WebsocketNotifier)
        await websocket.accept()

        if task_id not in orchestrator.tasks:
@@ -331,7 +562,7 @@ def create_app():  # noqa: C901
        task = orchestrator.tasks[task_id]

        # Track active WebSocket connections for this job
-        orchestrator.task_subscribers[task_id].add(websocket)
+        orchestrator.notifier.task_subscribers[task_id].add(websocket)

        try:
            task_queue_position = await orchestrator.get_queue_position(task_id=task_id)
@@ -339,6 +570,7 @@ def create_app():  # noqa: C901
                task_id=task.task_id,
                task_status=task.task_status,
                task_position=task_queue_position,
+                task_meta=task.processing_meta,
            )
            await websocket.send_text(
                WebsocketMessage(
@@ -353,6 +585,7 @@ def create_app():  # noqa: C901
                    task_id=task.task_id,
                    task_status=task.task_status,
                    task_position=task_queue_position,
+                    task_meta=task.processing_meta,
                )
                await websocket.send_text(
                    WebsocketMessage(
@@ -367,12 +600,12 @@ def create_app():  # noqa: C901
            _log.info(f"WebSocket disconnected for job {task_id}")

        finally:
-            orchestrator.task_subscribers[task_id].remove(websocket)
+            orchestrator.notifier.task_subscribers[task_id].remove(websocket)

    # Task result
    @app.get(
-        "/v1alpha/result/{task_id}",
-        response_model=ConvertDocumentResponse,
+        "/v1/result/{task_id}",
+        response_model=ConvertDocumentResponse | PresignedUrlConvertDocumentResponse,
        responses={
            200: {
                "content": {"application/zip": {}},
@@ -380,15 +613,61 @@ def create_app():  # noqa: C901
        },
    )
    async def task_result(
-        orchestrator: Annotated[AsyncLocalOrchestrator, Depends(get_orchestrator)],
+        orchestrator: Annotated[BaseOrchestrator, Depends(get_async_orchestrator)],
+        background_tasks: BackgroundTasks,
        task_id: str,
    ):
-        result = await orchestrator.task_result(task_id=task_id)
-        if result is None:
-            raise HTTPException(
-                status_code=404,
-                detail="Task result not found. Please wait for a completion status.",
+        try:
+            task = await orchestrator.get_raw_task(task_id=task_id)
+            response = await prepare_response(
+                task=task, orchestrator=orchestrator, background_tasks=background_tasks
            )
-        return result
+            return response
+        except TaskNotFoundError:
+            raise HTTPException(status_code=404, detail="Task not found.")
+
+    # Update task progress
+    @app.post(
+        "/v1/callback/task/progress",
+        response_model=ProgressCallbackResponse,
+    )
+    async def callback_task_progress(
+        orchestrator: Annotated[BaseOrchestrator, Depends(get_async_orchestrator)],
+        request: ProgressCallbackRequest,
+    ):
+        try:
+            await orchestrator.receive_task_progress(request=request)
+            return ProgressCallbackResponse(status="ack")
+        except TaskNotFoundError:
+            raise HTTPException(status_code=404, detail="Task not found.")
+        except ProgressInvalid as err:
+            raise HTTPException(
+                status_code=400, detail=f"Invalid progress payload: {err}"
+            )
+
+    #### Clear requests
+
+    # Offload models
+    @app.get(
+        "/v1/clear/converters",
+        response_model=ClearResponse,
+    )
+    async def clear_converters(
+        orchestrator: Annotated[BaseOrchestrator, Depends(get_async_orchestrator)],
+    ):
+        await orchestrator.clear_converters()
+        return ClearResponse()
+
+    # Clean results
+    @app.get(
+        "/v1/clear/results",
+        response_model=ClearResponse,
+    )
+    async def clear_results(
+        orchestrator: Annotated[BaseOrchestrator, Depends(get_async_orchestrator)],
+        older_then: float = 3600,
+    ):
+        await orchestrator.clear_results(older_than=older_then)
+        return ClearResponse()

    return app
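
Review note: the renamed `/v1` async endpoints above imply a three-step client flow: enqueue, long-poll, fetch the result. The following is a minimal sketch of that flow, not code from this change. The base URL, the terminal status strings (`success`, `failure`), and the helper names `http_json` and `convert_url` are assumptions made for illustration. The request body relies on the default `target`.

```python
import json
import urllib.request
from typing import Callable, Optional

# Assumption: address of a locally running docling-serve instance.
BASE_URL = "http://localhost:5001"

# Assumption: terminal task_status values; the enum is defined outside this diff.
TERMINAL_STATUSES = {"success", "failure"}

Fetch = Callable[[str, str, Optional[dict]], dict]


def http_json(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request with urllib and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def convert_url(
    source_url: str,
    fetch: Fetch = http_json,
    wait: float = 5.0,
    max_polls: int = 60,
) -> dict:
    """Hypothetical helper: enqueue a URL, poll until done, return the result."""
    # 1. Enqueue the conversion; `kind` selects the source type.
    task = fetch(
        "POST",
        f"{BASE_URL}/v1/convert/source/async",
        {"sources": [{"kind": "http", "url": source_url}]},
    )
    task_id = task["task_id"]

    # 2. Long-poll; `wait` asks the server to hold each request for up to that many seconds.
    polls = 0
    while task["task_status"] not in TERMINAL_STATUSES:
        if polls >= max_polls:
            raise TimeoutError(f"Task {task_id} did not finish after {max_polls} polls")
        task = fetch("GET", f"{BASE_URL}/v1/status/poll/{task_id}?wait={wait}", None)
        polls += 1

    # 3. Fetch the result; the server answers 404 if the task is unknown.
    return fetch("GET", f"{BASE_URL}/v1/result/{task_id}", None)
```

Passing `fetch` as a parameter keeps the sketch testable without a running server; a real client would more likely use an HTTP library with retries and timeouts.
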
--- a/docling_serve/datamodel/convert.py
+++ b/docling_serve/datamodel/convert.py
@@ -1,174 +1,40 @@
 # Define the input options for the API
-from typing import Annotated, Optional
+from typing import Annotated

-from pydantic import BaseModel, Field
+from pydantic import Field

-from docling.datamodel.base_models import InputFormat, OutputFormat
-from docling.datamodel.pipeline_options import OcrEngine, PdfBackend, TableFormerMode
-from docling_core.types.doc import ImageRefMode
+from docling.datamodel.pipeline_options import (
+    EasyOcrOptions,
+)
+from docling.models.factories import get_ocr_factory
+from docling_jobkit.datamodel.convert import ConvertDocumentsOptions
+
+from docling_serve.settings import docling_serve_settings
+
+ocr_factory = get_ocr_factory(
+    allow_external_plugins=docling_serve_settings.allow_external_plugins
+)
+ocr_engines_enum = ocr_factory.get_enum()


-class ConvertDocumentsOptions(BaseModel):
-    from_formats: Annotated[
-        list[InputFormat],
-        Field(
-            description=(
-                "Input format(s) to convert from. String or list of strings. "
-                f"Allowed values: {', '.join([v.value for v in InputFormat])}. "
-                "Optional, defaults to all formats."
-            ),
-            examples=[[v.value for v in InputFormat]],
-        ),
-    ] = list(InputFormat)
-
-    to_formats: Annotated[
-        list[OutputFormat],
-        Field(
-            description=(
-                "Output format(s) to convert to. String or list of strings. "
-                f"Allowed values: {', '.join([v.value for v in OutputFormat])}. "
-                "Optional, defaults to Markdown."
-            ),
-            examples=[[OutputFormat.MARKDOWN]],
-        ),
-    ] = [OutputFormat.MARKDOWN]
-
-    image_export_mode: Annotated[
-        ImageRefMode,
-        Field(
-            description=(
-                "Image export mode for the document (in case of JSON,"
-                " Markdown or HTML). "
-                f"Allowed values: {', '.join([v.value for v in ImageRefMode])}. "
-                "Optional, defaults to Embedded."
-            ),
-            examples=[ImageRefMode.EMBEDDED.value],
-            # pattern="embedded|placeholder|referenced",
-        ),
-    ] = ImageRefMode.EMBEDDED
-
-    do_ocr: Annotated[
-        bool,
-        Field(
-            description=(
-                "If enabled, the bitmap content will be processed using OCR. "
-                "Boolean. Optional, defaults to true"
-            ),
-            # examples=[True],
-        ),
-    ] = True
-
-    force_ocr: Annotated[
-        bool,
-        Field(
-            description=(
-                "If enabled, replace existing text with OCR-generated "
-                "text over content. Boolean. Optional, defaults to false."
-            ),
-            # examples=[False],
-        ),
-    ] = False
-
-    # TODO: use a restricted list based on what is installed on the system
-    ocr_engine: Annotated[
-        OcrEngine,
+class ConvertDocumentsRequestOptions(ConvertDocumentsOptions):
+    ocr_engine: Annotated[  # type: ignore
+        ocr_engines_enum,
        Field(
            description=(
                "The OCR engine to use. String. "
-                "Allowed values: easyocr, tesseract, rapidocr. "
+                f"Allowed values: {', '.join([v.value for v in ocr_engines_enum])}. "
                "Optional, defaults to easyocr."
            ),
-            examples=[OcrEngine.EASYOCR],
+            examples=[EasyOcrOptions.kind],
        ),
-    ] = OcrEngine.EASYOCR
+    ] = ocr_engines_enum(EasyOcrOptions.kind)  # type: ignore

-    ocr_lang: Annotated[
-        Optional[list[str]],
-        Field(
-            description=(
-                "List of languages used by the OCR engine. "
-                "Note that each OCR engine has "
-                "different values for the language names. String or list of strings. "
-                "Optional, defaults to empty."
-            ),
-            examples=[["fr", "de", "es", "en"]],
-        ),
-    ] = None
-
-    pdf_backend: Annotated[
-        PdfBackend,
-        Field(
-            description=(
-                "The PDF backend to use. String. "
-                f"Allowed values: {', '.join([v.value for v in PdfBackend])}. "
-                f"Optional, defaults to {PdfBackend.DLPARSE_V2.value}."
-            ),
-            examples=[PdfBackend.DLPARSE_V2],
-        ),
-    ] = PdfBackend.DLPARSE_V2
-
-    table_mode: Annotated[
-        TableFormerMode,
-        Field(
-            TableFormerMode.FAST,
-            description=(
-                "Mode to use for table structure, String. "
-                f"Allowed values: {', '.join([v.value for v in TableFormerMode])}. "
-                "Optional, defaults to fast."
-            ),
-            examples=[TableFormerMode.FAST],
-            # pattern="fast|accurate",
-        ),
-    ] = TableFormerMode.FAST
-
-    abort_on_error: Annotated[
-        bool,
-        Field(
-            description=(
-                "Abort on error if enabled. Boolean. Optional, defaults to false."
-            ),
-            # examples=[False],
-        ),
-    ] = False
-
-    return_as_file: Annotated[
-        bool,
-        Field(
-            description=(
-                "Return the output as a zip file "
-                "(will happen anyway if multiple files are generated). "
-                "Boolean. Optional, defaults to false."
-            ),
-            examples=[False],
-        ),
-    ] = False
-
-    do_table_structure: Annotated[
-        bool,
-        Field(
-            description=(
-                "If enabled, the table structure will be extracted. "
-                "Boolean. Optional, defaults to true."
-            ),
-            examples=[True],
-        ),
-    ] = True
-
-    include_images: Annotated[
-        bool,
-        Field(
-            description=(
-                "If enabled, images will be extracted from the document. "
-                "Boolean. Optional, defaults to true."
-            ),
-            examples=[True],
-        ),
-    ] = True
-
-    images_scale: Annotated[
+    document_timeout: Annotated[
        float,
        Field(
-            description="Scale factor for images. Float. Optional, defaults to 2.0.",
-            examples=[2.0],
+            description="The timeout for processing each document, in seconds.",
+            gt=0,
+            le=docling_serve_settings.max_document_timeout,
        ),
-    ] = 2.0
+    ] = docling_serve_settings.max_document_timeout
--- a/docling_serve/datamodel/engines.py
+++ b/docling_serve/datamodel/engines.py
@@ -1,30 +0,0 @@
-import enum
-from typing import Optional
-
-from pydantic import BaseModel
-
-from docling_serve.datamodel.requests import ConvertDocumentsRequest
-from docling_serve.datamodel.responses import ConvertDocumentResponse
-
-
-class TaskStatus(str, enum.Enum):
-    SUCCESS = "success"
-    PENDING = "pending"
-    STARTED = "started"
-    FAILURE = "failure"
-
-
-class AsyncEngine(str, enum.Enum):
-    LOCAL = "local"
-
-
-class Task(BaseModel):
-    task_id: str
-    task_status: TaskStatus = TaskStatus.PENDING
-    request: Optional[ConvertDocumentsRequest]
-    result: Optional[ConvertDocumentResponse] = None
-
-    def is_completed(self) -> bool:
-        if self.task_status in [TaskStatus.SUCCESS, TaskStatus.FAILURE]:
-            return True
-        return False
--- a/docling_serve/datamodel/requests.py
+++ b/docling_serve/datamodel/requests.py
@@ -1,62 +1,72 @@
-import base64
-from io import BytesIO
-from typing import Annotated, Any, Union
+import enum
+from typing import Annotated, Literal

-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, model_validator
+from pydantic_core import PydanticCustomError
+from typing_extensions import Self

-from docling.datamodel.base_models import DocumentStream
+from docling_jobkit.datamodel.http_inputs import FileSource, HttpSource
+from docling_jobkit.datamodel.s3_coords import S3Coordinates
+from docling_jobkit.datamodel.task_targets import (
+    InBodyTarget,
+    S3Target,
+    TaskTarget,
+    ZipTarget,
+)

-from docling_serve.datamodel.convert import ConvertDocumentsOptions
+from docling_serve.datamodel.convert import ConvertDocumentsRequestOptions
+from docling_serve.settings import AsyncEngine, docling_serve_settings
+
+## Sources


-class DocumentsConvertBase(BaseModel):
-    options: ConvertDocumentsOptions = ConvertDocumentsOptions()
+class FileSourceRequest(FileSource):
+    kind: Literal["file"] = "file"


-class HttpSource(BaseModel):
-    url: Annotated[
-        str,
-        Field(
-            description="HTTP url to process",
-            examples=["https://arxiv.org/pdf/2206.01062"],
-        ),
-    ]
-    headers: Annotated[
-        dict[str, Any],
-        Field(
-            description="Additional headers used to fetch the urls, "
-            "e.g. authorization, agent, etc"
-        ),
-    ] = {}
+class HttpSourceRequest(HttpSource):
+    kind: Literal["http"] = "http"


-class FileSource(BaseModel):
-    base64_string: Annotated[
-        str,
-        Field(
-            description="Content of the file serialized in base64. "
-            "For example it can be obtained via "
-            "`base64 -w 0 /path/to/file/pdf-to-convert.pdf`."
-        ),
-    ]
-    filename: Annotated[
-        str,
-        Field(description="Filename of the uploaded document", examples=["file.pdf"]),
-    ]
-
-    def to_document_stream(self) -> DocumentStream:
-        buf = BytesIO(base64.b64decode(self.base64_string))
-        return DocumentStream(stream=buf, name=self.filename)
+class S3SourceRequest(S3Coordinates):
+    kind: Literal["s3"] = "s3"


-class ConvertDocumentHttpSourcesRequest(DocumentsConvertBase):
-    http_sources: list[HttpSource]
+## Multipart targets
+class TargetName(str, enum.Enum):
+    INBODY = InBodyTarget().kind
+    ZIP = ZipTarget().kind


-class ConvertDocumentFileSourcesRequest(DocumentsConvertBase):
-    file_sources: list[FileSource]
-
-
-ConvertDocumentsRequest = Union[
-    ConvertDocumentFileSourcesRequest, ConvertDocumentHttpSourcesRequest
+## Aliases
+SourceRequestItem = Annotated[
+    FileSourceRequest | HttpSourceRequest | S3SourceRequest, Field(discriminator="kind")
 ]
+
+
+## Complete Source request
+class ConvertDocumentsRequest(BaseModel):
+    options: ConvertDocumentsRequestOptions = ConvertDocumentsRequestOptions()
+    sources: list[SourceRequestItem]
+    target: TaskTarget = InBodyTarget()
+
+    @model_validator(mode="after")
+    def validate_s3_source_and_target(self) -> Self:
+        for source in self.sources:
+            if isinstance(source, S3SourceRequest):
+                if docling_serve_settings.eng_kind != AsyncEngine.KFP:
+                    raise PydanticCustomError(
+                        "error source", 'source kind "s3" requires engine kind "KFP"'
+                    )
+                if self.target.kind != "s3":
+                    raise PydanticCustomError(
+                        "error source", 'source kind "s3" requires target kind "s3"'
+                    )
+        if isinstance(self.target, S3Target):
+            for source in self.sources:
+                if isinstance(source, S3SourceRequest):
+                    return self
+            raise PydanticCustomError(
+                "error target", 'target kind "s3" requires source kind "s3"'
+            )
+        return self
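
Review note: the pairing rule enforced by `validate_s3_source_and_target` can be restated in plain Python, which makes its edge cases easier to see. This is a sketch that mirrors the validator's control flow as it appears in the diff; `check_s3_pairing` is a hypothetical name, the engine check is reduced to a boolean, and the non-S3 kind strings (`http`, `inbody`) are used only as sample inputs.

```python
from typing import Optional


def check_s3_pairing(
    source_kinds: list[str], target_kind: str, engine_is_kfp: bool
) -> Optional[str]:
    """Return the validation error message, or None if the request is accepted.

    Hypothetical stand-in for the model validator: it takes the `kind`
    strings instead of the pydantic source and target objects.
    """
    # Every S3 source needs the KFP engine and an S3 target.
    for kind in source_kinds:
        if kind == "s3":
            if not engine_is_kfp:
                return 'source kind "s3" requires engine kind "KFP"'
            if target_kind != "s3":
                return 'source kind "s3" requires target kind "s3"'
    # An S3 target needs at least one S3 source.
    if target_kind == "s3" and "s3" not in source_kinds:
        return 'target kind "s3" requires source kind "s3"'
    return None
```

Written this way, one property stands out: a request that mixes `http` and `s3` sources with an `s3` target passes, because the target check only requires a single S3 source. Whether that is intended may be worth confirming.
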
--- a/docling_serve/datamodel/responses.py
+++ b/docling_serve/datamodel/responses.py
@@ -6,6 +6,7 @@ from pydantic import BaseModel
 from docling.datamodel.document import ConversionStatus, ErrorItem
 from docling.utils.profiling import ProfilingItem
 from docling_core.types.doc import DoclingDocument
+from docling_jobkit.datamodel.task_meta import TaskProcessingMeta


 # Status
@@ -13,6 +14,10 @@ class HealthCheckResponse(BaseModel):
    status: str = "ok"


+class ClearResponse(BaseModel):
+    status: str = "ok"
+
+
 class DocumentResponse(BaseModel):
    filename: str
    md_content: Optional[str] = None
@@ -30,6 +35,11 @@ class ConvertDocumentResponse(BaseModel):
    timings: dict[str, ProfilingItem] = {}


+class PresignedUrlConvertDocumentResponse(BaseModel):
+    status: ConversionStatus
+    processing_time: float
+
+
 class ConvertDocumentErrorResponse(BaseModel):
    status: ConversionStatus

@@ -38,6 +48,7 @@ class TaskStatusResponse(BaseModel):
    task_id: str
    task_status: str
    task_position: Optional[int] = None
+    task_meta: Optional[TaskProcessingMeta] = None


 class MessageKind(str, enum.Enum):
--- a/docling_serve/docling_conversion.py
+++ b/docling_serve/docling_conversion.py
@@ -1,199 +0,0 @@
-import hashlib
-import json
-import logging
-from collections.abc import Iterable, Iterator
-from pathlib import Path
-from typing import Any, Optional, Union
-
-from fastapi import HTTPException
-
-from docling.backend.docling_parse_backend import DoclingParseDocumentBackend
-from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
-from docling.backend.pdf_backend import PdfDocumentBackend
-from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
-from docling.datamodel.base_models import DocumentStream, InputFormat
-from docling.datamodel.document import ConversionResult
-from docling.datamodel.pipeline_options import (
-    EasyOcrOptions,
-    OcrEngine,
-    OcrOptions,
-    PdfBackend,
-    PdfPipelineOptions,
-    RapidOcrOptions,
-    TableFormerMode,
-    TesseractOcrOptions,
-)
-from docling.document_converter import DocumentConverter, FormatOption, PdfFormatOption
-from docling_core.types.doc import ImageRefMode
-
-from docling_serve.datamodel.convert import ConvertDocumentsOptions
-from docling_serve.helper_functions import _to_list_of_strings
-from docling_serve.settings import docling_serve_settings
-
-_log = logging.getLogger(__name__)
-
-
-# Document converters will be preloaded and stored in a dictionary
-converters: dict[bytes, DocumentConverter] = {}
-
-
-# Custom serializer for PdfFormatOption
-# (model_dump_json does not work with some classes)
-def _serialize_pdf_format_option(pdf_format_option: PdfFormatOption) -> str:
-    data = pdf_format_option.model_dump()
-
-    # pipeline_options are not fully serialized by model_dump, dedicated pass
-    if pdf_format_option.pipeline_options:
-        data["pipeline_options"] = pdf_format_option.pipeline_options.model_dump()
-
-        # Replace `artifacts_path` with a string representation
-        data["pipeline_options"]["artifacts_path"] = repr(
-            data["pipeline_options"]["artifacts_path"]
-        )
-
-    # Replace `pipeline_cls` with a string representation
-    data["pipeline_cls"] = repr(data["pipeline_cls"])
-
-    # Replace `backend` with a string representation
-    data["backend"] = repr(data["backend"])
-
-    # Handle `device` in `accelerator_options`
-    if "accelerator_options" in data and "device" in data["accelerator_options"]:
-        data["accelerator_options"]["device"] = repr(
-            data["accelerator_options"]["device"]
-        )
-
-    # Serialize the dictionary to JSON with sorted keys to have consistent hashes
-    return json.dumps(data, sort_keys=True)
-
-
-# Computes the PDF pipeline options and returns the PdfFormatOption and its hash
-def get_pdf_pipeline_opts(  # noqa: C901
-    request: ConvertDocumentsOptions,
-) -> tuple[PdfFormatOption, bytes]:
-    if request.ocr_engine == OcrEngine.EASYOCR:
-        try:
-            import easyocr  # noqa: F401
-        except ImportError:
-            raise HTTPException(
-                status_code=400,
-                detail="The requested OCR engine"
-                f" (ocr_engine={request.ocr_engine.value})"
-                " is not available on this system. Please choose another OCR engine "
-                "or contact your system administrator.",
-            )
-        ocr_options: OcrOptions = EasyOcrOptions(force_full_page_ocr=request.force_ocr)
-    elif request.ocr_engine == OcrEngine.TESSERACT:
-        try:
-            import tesserocr  # noqa: F401
-        except ImportError:
-            raise HTTPException(
-                status_code=400,
-                detail="The requested OCR engine"
-                f" (ocr_engine={request.ocr_engine.value})"
-                " is not available on this system. Please choose another OCR engine "
-                "or contact your system administrator.",
-            )
-        ocr_options = TesseractOcrOptions(force_full_page_ocr=request.force_ocr)
-    elif request.ocr_engine == OcrEngine.RAPIDOCR:
-        try:
-            from rapidocr_onnxruntime import RapidOCR  # noqa: F401
-        except ImportError:
-            raise HTTPException(
-                status_code=400,
-                detail="The requested OCR engine"
-                f" (ocr_engine={request.ocr_engine.value})"
-                " is not available on this system. Please choose another OCR engine "
-                "or contact your system administrator.",
-            )
-        ocr_options = RapidOcrOptions(force_full_page_ocr=request.force_ocr)
-    else:
-        raise RuntimeError(f"Unexpected OCR engine type {request.ocr_engine}")
-
-    if request.ocr_lang is not None:
-        if isinstance(request.ocr_lang, str):
-            ocr_options.lang = _to_list_of_strings(request.ocr_lang)
-        else:
-            ocr_options.lang = request.ocr_lang
-
-    pipeline_options = PdfPipelineOptions(
-        do_ocr=request.do_ocr,
-        ocr_options=ocr_options,
-        do_table_structure=request.do_table_structure,
-    )
-    pipeline_options.table_structure_options.do_cell_matching = True  # do_cell_matching
-    pipeline_options.table_structure_options.mode = TableFormerMode(request.table_mode)
-
-    if request.image_export_mode != ImageRefMode.PLACEHOLDER:
-        pipeline_options.generate_page_images = True
-        if request.images_scale:
-            pipeline_options.images_scale = request.images_scale
-
-    if request.pdf_backend == PdfBackend.DLPARSE_V1:
-        backend: type[PdfDocumentBackend] = DoclingParseDocumentBackend
-    elif request.pdf_backend == PdfBackend.DLPARSE_V2:
-        backend = DoclingParseV2DocumentBackend
-    elif request.pdf_backend == PdfBackend.PYPDFIUM2:
-        backend = PyPdfiumDocumentBackend
-    else:
-        raise RuntimeError(f"Unexpected PDF backend type {request.pdf_backend}")
-
-    if docling_serve_settings.artifacts_path is not None:
-        if str(docling_serve_settings.artifacts_path.absolute()) == "":
-            _log.info(
-                "artifacts_path is an empty path, model weights will be dowloaded "
-                "at runtime."
-            )
-            pipeline_options.artifacts_path = None
-        elif docling_serve_settings.artifacts_path.is_dir():
-            _log.info(
-                "artifacts_path is set to a valid directory. "
-                "No model weights will be downloaded at runtime."
-            )
-            pipeline_options.artifacts_path = docling_serve_settings.artifacts_path
-        else:
-            _log.warning(
-                "artifacts_path is set to an invalid directory. "
-                "The system will download the model weights at runtime."
-            )
-            pipeline_options.artifacts_path = None
-    else:
-        _log.info(
-            "artifacts_path is unset. "
-            "The system will download the model weights at runtime."
-        )
-
-    pdf_format_option = PdfFormatOption(
-        pipeline_options=pipeline_options,
-        backend=backend,
-    )
-
-    serialized_data = _serialize_pdf_format_option(pdf_format_option)
-
-    options_hash = hashlib.sha1(serialized_data.encode()).digest()
-
-    return pdf_format_option, options_hash
-
-
-def convert_documents(
-    sources: Iterable[Union[Path, str, DocumentStream]],
-    options: ConvertDocumentsOptions,
-    headers: Optional[dict[str, Any]] = None,
-):
-    pdf_format_option, options_hash = get_pdf_pipeline_opts(options)
-
-    if options_hash not in converters:
-        format_options: dict[InputFormat, FormatOption] = {
-            InputFormat.PDF: pdf_format_option,
-            InputFormat.IMAGE: pdf_format_option,
-        }
-
-        converters[options_hash] = DocumentConverter(format_options=format_options)
-        _log.info(f"We now have {len(converters)} converters in memory.")
-
-    results: Iterator[ConversionResult] = converters[options_hash].convert_all(
-        sources,
-        headers=headers,
-    )
-
-    return results
--- a/docling_serve/engines/__init__.py
+++ b/docling_serve/engines/__init__.py
@@ -1,8 +0,0 @@
-from functools import lru_cache
-
-from docling_serve.engines.async_local.orchestrator import AsyncLocalOrchestrator
-
-
-@lru_cache
-def get_orchestrator() -> AsyncLocalOrchestrator:
-    return AsyncLocalOrchestrator()
--- a/docling_serve/engines/async_local/__init__.py
+++ b/docling_serve/engines/async_local/__init__.py
--- a/docling_serve/engines/async_local/orchestrator.py
+++ b/docling_serve/engines/async_local/orchestrator.py
@@ -1,101 +0,0 @@
-import asyncio
-import logging
-import uuid
-from typing import Optional
-
-from fastapi import WebSocket
-
-from docling_serve.datamodel.engines import Task, TaskStatus
-from docling_serve.datamodel.requests import ConvertDocumentsRequest
-from docling_serve.datamodel.responses import (
-    MessageKind,
-    TaskStatusResponse,
-    WebsocketMessage,
-)
-from docling_serve.engines.async_local.worker import AsyncLocalWorker
-from docling_serve.engines.base_orchestrator import BaseOrchestrator
-from docling_serve.settings import docling_serve_settings
-
-_log = logging.getLogger(__name__)
-
-
-class OrchestratorError(Exception):
-    pass
-
-
-class TaskNotFoundError(OrchestratorError):
-    pass
-
-
-class AsyncLocalOrchestrator(BaseOrchestrator):
-    def __init__(self):
-        self.task_queue = asyncio.Queue()
-        self.tasks: dict[str, Task] = {}
-        self.queue_list: list[str] = []
-        self.task_subscribers: dict[str, set[WebSocket]] = {}
-
-    async def enqueue(self, request: ConvertDocumentsRequest) -> Task:
-        task_id = str(uuid.uuid4())
-        task = Task(task_id=task_id, request=request)
-        self.tasks[task_id] = task
-        self.queue_list.append(task_id)
-        self.task_subscribers[task_id] = set()
-        await self.task_queue.put(task_id)
-        return task
-
-    async def queue_size(self) -> int:
-        return self.task_queue.qsize()
-
-    async def get_queue_position(self, task_id: str) -> Optional[int]:
-        return (
-            self.queue_list.index(task_id) + 1 if task_id in self.queue_list else None
-        )
-
-    async def task_status(self, task_id: str, wait: float = 0.0) -> Task:
-        if task_id not in self.tasks:
-            raise TaskNotFoundError()
-        return self.tasks[task_id]
-
-    async def task_result(self, task_id: str):
-        if task_id not in self.tasks:
-            raise TaskNotFoundError()
-        return self.tasks[task_id].result
-
-    async def process_queue(self):
-        # Create a pool of workers
-        workers = []
-        for i in range(docling_serve_settings.eng_loc_num_workers):
-            _log.debug(f"Starting worker {i}")
-            w = AsyncLocalWorker(i, self)
-            worker_task = asyncio.create_task(w.loop())
-            workers.append(worker_task)
-
-        # Wait for all workers to complete (they won't, as they run indefinitely)
-        await asyncio.gather(*workers)
-        _log.debug("All workers completed.")
-
-    async def notify_task_subscribers(self, task_id: str):
-        if task_id not in self.task_subscribers:
-            raise RuntimeError(f"Task {task_id} does not have a subscribers list.")
-
-        task = self.tasks[task_id]
-        task_queue_position = await self.get_queue_position(task_id)
-        msg = TaskStatusResponse(
-            task_id=task.task_id,
-            task_status=task.task_status,
-            task_position=task_queue_position,
-        )
-        for websocket in self.task_subscribers[task_id]:
-            await websocket.send_text(
-                WebsocketMessage(message=MessageKind.UPDATE, task=msg).model_dump_json()
-            )
-            if task.is_completed():
-                await websocket.close()
-
-    async def notify_queue_positions(self):
-        for task_id in self.task_subscribers.keys():
-            # notify only pending tasks
-            if self.tasks[task_id].task_status != TaskStatus.PENDING:
-                continue
-
-            await self.notify_task_subscribers(task_id)
--- a/docling_serve/engines/async_local/worker.py
+++ b/docling_serve/engines/async_local/worker.py
@@ -1,116 +0,0 @@
-import asyncio
-import logging
-import time
-from typing import TYPE_CHECKING, Any, Optional, Union
-
-from fastapi import BackgroundTasks
-
-from docling.datamodel.base_models import DocumentStream
-
-from docling_serve.datamodel.engines import TaskStatus
-from docling_serve.datamodel.requests import ConvertDocumentFileSourcesRequest
-from docling_serve.datamodel.responses import ConvertDocumentResponse
-from docling_serve.docling_conversion import convert_documents
-from docling_serve.response_preparation import process_results
-
-if TYPE_CHECKING:
-    from docling_serve.engines.async_local.orchestrator import AsyncLocalOrchestrator
-
-_log = logging.getLogger(__name__)
-
-
-class AsyncLocalWorker:
-    def __init__(self, worker_id: int, orchestrator: "AsyncLocalOrchestrator"):
-        self.worker_id = worker_id
-        self.orchestrator = orchestrator
-
-    async def loop(self):
-        _log.debug(f"Starting loop for worker {self.worker_id}")
-        while True:
-            task_id: str = await self.orchestrator.task_queue.get()
-            self.orchestrator.queue_list.remove(task_id)
-
-            if task_id not in self.orchestrator.tasks:
-                raise RuntimeError(f"Task {task_id} not found.")
-            task = self.orchestrator.tasks[task_id]
-
-            try:
-                task.task_status = TaskStatus.STARTED
-                _log.info(f"Worker {self.worker_id} processing task {task_id}")
-
-                # Notify clients about task updates
-                await self.orchestrator.notify_task_subscribers(task_id)
-
-                # Notify clients about queue updates
-                await self.orchestrator.notify_queue_positions()
-
-                # Get the current event loop
-                asyncio.get_event_loop()
-
-                # Define a callback function to send progress updates to the client.
-                # TODO: send partial updates, e.g. when a document in the batch is done
-                def run_conversion():
-                    sources: list[Union[str, DocumentStream]] = []
-                    headers: Optional[dict[str, Any]] = None
-                    if isinstance(task.request, ConvertDocumentFileSourcesRequest):
-                        for file_source in task.request.file_sources:
-                            sources.append(file_source.to_document_stream())
-                    else:
-                        for http_source in task.request.http_sources:
-                            sources.append(http_source.url)
-                            if headers is None and http_source.headers:
-                                headers = http_source.headers
-
-                    # Note: results are only an iterator->lazy evaluation
-                    results = convert_documents(
-                        sources=sources,
-                        options=task.request.options,
-                        headers=headers,
-                    )
-
-                    # The real processing will happen here
-                    response = process_results(
-                        background_tasks=BackgroundTasks(),
-                        conversion_options=task.request.options,
-                        conv_results=results,
-                    )
-
-                    return response
-
-                # Run the prediction in a thread to avoid blocking the event loop.
-                start_time = time.monotonic()
-                # future = asyncio.run_coroutine_threadsafe(
-                #     run_conversion(),
-                #     loop=loop
-                # )
-                # response = future.result()
-
-                response = await asyncio.to_thread(
-                    run_conversion,
-                )
-                processing_time = time.monotonic() - start_time
-
-                if not isinstance(response, ConvertDocumentResponse):
-                    _log.error(
-                        f"Worker {self.worker_id} got un-processable "
-                        "result for {task_id}: {type(response)}"
-                    )
-                task.result = response
-                task.request = None
-
-                task.task_status = TaskStatus.SUCCESS
-                _log.info(
-                    f"Worker {self.worker_id} completed job {task_id} "
-                    f"in {processing_time:.2f} seconds"
-                )
-
-            except Exception as e:
-                _log.error(
-                    f"Worker {self.worker_id} failed to process job {task_id}: {e}"
-                )
-                task.task_status = TaskStatus.FAILURE
-
-            finally:
-                await self.orchestrator.notify_task_subscribers(task_id)
-                self.orchestrator.task_queue.task_done()
-                _log.debug(f"Worker {self.worker_id} completely done with {task_id}")
--- a/docling_serve/engines/base_orchestrator.py
+++ b/docling_serve/engines/base_orchestrator.py
@@ -1,21 +0,0 @@
-from abc import ABC, abstractmethod
-
-from docling_serve.datamodel.engines import Task
-
-
-class BaseOrchestrator(ABC):
-    @abstractmethod
-    async def enqueue(self, task) -> Task:
-        pass
-
-    @abstractmethod
-    async def queue_size(self) -> int:
-        pass
-
-    @abstractmethod
-    async def task_status(self, task_id: str) -> Task:
-        pass
-
-    @abstractmethod
-    async def task_result(self, task_id: str):
-        pass
--- a/docling_serve/engines/block_local/__init__.py
+++ b/docling_serve/engines/block_local/__init__.py
--- a/docling_serve/gradio_ui.py
+++ b/docling_serve/gradio_ui.py
@@ -1,22 +1,50 @@
+import base64
 import importlib
+import itertools
 import json
 import logging
-import os
+import ssl
 import tempfile
+import time
 from pathlib import Path
+from typing import Optional

+import certifi
 import gradio as gr
-import requests
+import httpx
+
+from docling.datamodel.base_models import FormatToExtensions
+from docling.datamodel.pipeline_options import (
+    PdfBackend,
+    ProcessingPipeline,
+    TableFormerMode,
+    TableStructureOptions,
+)

 from docling_serve.helper_functions import _to_list_of_strings
+from docling_serve.settings import docling_serve_settings, uvicorn_settings

 logger = logging.getLogger(__name__)

+############################
+# Path of static artifacts #
+############################
+
+logo_path = "https://raw.githubusercontent.com/docling-project/docling/refs/heads/main/docs/assets/logo.svg"
+js_components_url = "https://unpkg.com/@docling/docling-components@0.0.7"
+if (
+    docling_serve_settings.static_path is not None
+    and docling_serve_settings.static_path.is_dir()
+):
+    logo_path = str(docling_serve_settings.static_path / "logo.svg")
+    js_components_url = "/static/docling-components.js"
+
+
 ##############################
 # Head JS for web components #
 ##############################
-head = """
-    <script src="https://unpkg.com/@docling/docling-components@0.0.3" type="module"></script>
+head = f"""
+    <script src="{js_components_url}" type="module"></script>
 """

 #################
@@ -57,7 +85,7 @@ css = """
    height: 140px;
 }

-docling-img::part(pages) {
+docling-img {
    gap: 1rem;
 }

@@ -95,8 +123,29 @@ file_output_path = None  # Will be set when a new file is generated
 #############


+def get_api_endpoint() -> str:
+    protocol = "http"
+    if uvicorn_settings.ssl_keyfile is not None:
+        protocol = "https"
+    return f"{protocol}://{docling_serve_settings.api_host}:{uvicorn_settings.port}"
+
+
+def get_ssl_context() -> ssl.SSLContext:
+    ctx = ssl.create_default_context(cafile=certifi.where())
+    kube_sa_ca_cert_path = Path(
+        "/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
+    )
+    if (
+        uvicorn_settings.ssl_keyfile is not None
+        and ".svc." in docling_serve_settings.api_host
+        and kube_sa_ca_cert_path.exists()
+    ):
+        ctx.load_verify_locations(cafile=kube_sa_ca_cert_path)
+    return ctx
+
+
 def health_check():
-    response = requests.get(f"http://localhost:{int(os.getenv('PORT', '5001'))}/health")
+    response = httpx.get(f"{get_api_endpoint()}/health")
    if response.status_code == 200:
        return "Healthy"
    return "Unhealthy"
@@ -112,6 +161,11 @@ def set_outputs_visibility_direct(x, y):
    return content, file


+def set_task_id_visibility(x):
+    task_id_row = gr.Row(visible=x)
+    return task_id_row
+
+
 def set_outputs_visibility_process(x):
    content = gr.Row(visible=not x)
    file = gr.Row(visible=x)
@@ -123,6 +177,7 @@ def set_download_button_label(label_text: gr.State):


 def clear_outputs():
+    task_id_rendered = ""
    markdown_content = ""
    json_content = ""
    json_rendered_content = ""
@@ -131,6 +186,7 @@ def clear_outputs():
    doctags_content = ""

    return (
+        task_id_rendered,
        markdown_content,
        markdown_content,
        json_content,
@@ -150,12 +206,16 @@ def clear_file_input():
    return None


-def auto_set_return_as_file(url_input, file_input, image_export_mode):
+def auto_set_return_as_file(
+    url_input_value: str,
+    file_input_value: Optional[list[str]],
+    image_export_mode_value: str,
+):
    # If more than one input source is provided, return as file
    if (
-        (len(url_input.split(",")) > 1)
-        or (file_input and len(file_input) > 1)
-        or (image_export_mode == "referenced")
+        (len(url_input_value.split(",")) > 1)
+        or (file_input_value and len(file_input_value) > 1)
+        or (image_export_mode_value == "referenced")
    ):
        return True
    else:
@@ -173,10 +233,56 @@ def change_ocr_lang(ocr_engine):
        return "english,chinese"


+def wait_task_finish(task_id: str, return_as_file: bool):
+    conversion_sucess = False
+    task_finished = False
+    task_status = ""
+    ssl_ctx = get_ssl_context()
+    while not task_finished:
+        try:
+            response = httpx.get(
+                f"{get_api_endpoint()}/v1/status/poll/{task_id}?wait=5",
+                verify=ssl_ctx,
+                timeout=15,
+            )
+            task_status = response.json()["task_status"]
+            if task_status == "success":
+                conversion_sucess = True
+                task_finished = True
+
+            if task_status in ("failure", "revoked"):
+                conversion_sucess = False
+                task_finished = True
+                raise RuntimeError(f"Task failed with status {task_status!r}")
+            time.sleep(5)
+        except Exception as e:
+            logger.error(f"Error processing file(s): {e}")
+            conversion_sucess = False
+            task_finished = True
+            raise gr.Error(f"Error processing file(s): {e}", print_exception=False)
+
+    if conversion_sucess:
+        try:
+            response = httpx.get(
+                f"{get_api_endpoint()}/v1/result/{task_id}",
+                timeout=15,
+                verify=ssl_ctx,
+            )
+            output = response_to_output(response, return_as_file)
+            return output
+        except Exception as e:
+            logger.error(f"Error getting task result: {e}")
+
+    raise gr.Error(
+        f"Error getting task result, conversion finished with status: {task_status}"
+    )
+
+
 def process_url(
    input_sources,
    to_formats,
    image_export_mode,
+    pipeline,
    ocr,
    force_ocr,
    ocr_engine,
@@ -185,12 +291,20 @@ def process_url(
    table_mode,
    abort_on_error,
    return_as_file,
+    do_code_enrichment,
+    do_formula_enrichment,
+    do_picture_classification,
+    do_picture_description,
 ):
+    target = {"kind": "zip" if return_as_file else "inbody"}
    parameters = {
-        "http_sources": [{"url": source} for source in input_sources.split(",")],
+        "sources": [
+            {"kind": "http", "url": source} for source in input_sources.split(",")
+        ],
        "options": {
            "to_formats": to_formats,
            "image_export_mode": image_export_mode,
+            "pipeline": pipeline,
            "ocr": ocr,
            "force_ocr": force_ocr,
            "ocr_engine": ocr_engine,
@@ -198,20 +312,27 @@ def process_url(
            "pdf_backend": pdf_backend,
            "table_mode": table_mode,
            "abort_on_error": abort_on_error,
-            "return_as_file": return_as_file,
+            "do_code_enrichment": do_code_enrichment,
+            "do_formula_enrichment": do_formula_enrichment,
+            "do_picture_classification": do_picture_classification,
+            "do_picture_description": do_picture_description,
        },
+        "target": target,
    }
    if (
-        not parameters["http_sources"]
-        or len(parameters["http_sources"]) == 0
-        or parameters["http_sources"][0]["url"] == ""
+        not parameters["sources"]
+        or len(parameters["sources"]) == 0
+        or parameters["sources"][0]["url"] == ""
    ):
        logger.error("No input sources provided.")
        raise gr.Error("No input sources provided.", print_exception=False)
    try:
-        response = requests.post(
-            f"http://localhost:{int(os.getenv('PORT', '5001'))}/v1alpha/convert/source",
+        ssl_ctx = get_ssl_context()
+        response = httpx.post(
+            f"{get_api_endpoint()}/v1/convert/source/async",
            json=parameters,
+            verify=ssl_ctx,
+            timeout=60,
        )
    except Exception as e:
        logger.error(f"Error processing URL: {e}")
@@ -221,14 +342,22 @@ def process_url(
        error_message = data.get("detail", "An unknown error occurred.")
        logger.error(f"Error processing file: {error_message}")
        raise gr.Error(f"Error processing file: {error_message}", print_exception=False)
-    output = response_to_output(response, return_as_file)
-    return output
+
+    task_id_rendered = response.json()["task_id"]
+    return task_id_rendered
+
+
+def file_to_base64(file):
+    with open(file.name, "rb") as f:
+        encoded_string = base64.b64encode(f.read()).decode("utf-8")
+    return encoded_string


 def process_file(
    files,
    to_formats,
    image_export_mode,
+    pipeline,
    ocr,
    force_ocr,
    ocr_engine,
@@ -237,30 +366,49 @@ def process_file(
    table_mode,
    abort_on_error,
    return_as_file,
+    do_code_enrichment,
+    do_formula_enrichment,
+    do_picture_classification,
+    do_picture_description,
 ):
-    if not files or len(files) == 0 or files[0] == "":
+    if not files or len(files) == 0:
        logger.error("No files provided.")
        raise gr.Error("No files provided.", print_exception=False)
-    files_data = [("files", (file.name, open(file.name, "rb"))) for file in files]
+    files_data = [
+        {"kind": "file", "base64_string": file_to_base64(file), "filename": file.name}
+        for file in files
+    ]
+    target = {"kind": "zip" if return_as_file else "inbody"}

    parameters = {
-        "to_formats": to_formats,
-        "image_export_mode": image_export_mode,
-        "ocr": str(ocr).lower(),
-        "force_ocr": str(force_ocr).lower(),
-        "ocr_engine": ocr_engine,
-        "ocr_lang": _to_list_of_strings(ocr_lang),
-        "pdf_backend": pdf_backend,
-        "table_mode": table_mode,
-        "abort_on_error": str(abort_on_error).lower(),
-        "return_as_file": str(return_as_file).lower(),
+        "sources": files_data,
+        "options": {
+            "to_formats": to_formats,
+            "image_export_mode": image_export_mode,
+            "pipeline": pipeline,
+            "ocr": ocr,
+            "force_ocr": force_ocr,
+            "ocr_engine": ocr_engine,
+            "ocr_lang": _to_list_of_strings(ocr_lang),
+            "pdf_backend": pdf_backend,
+            "table_mode": table_mode,
+            "abort_on_error": abort_on_error,
+            "return_as_file": return_as_file,
+            "do_code_enrichment": do_code_enrichment,
+            "do_formula_enrichment": do_formula_enrichment,
+            "do_picture_classification": do_picture_classification,
+            "do_picture_description": do_picture_description,
+        },
+        "target": target,
    }

    try:
-        response = requests.post(
-            f"http://localhost:{int(os.getenv('PORT', '5001'))}/v1alpha/convert/file",
-            files=files_data,
-            data=parameters,
+        ssl_ctx = get_ssl_context()
+        response = httpx.post(
+            f"{get_api_endpoint()}/v1/convert/source/async",
+            json=parameters,
+            verify=ssl_ctx,
+            timeout=60,
        )
    except Exception as e:
        logger.error(f"Error processing file(s): {e}")
@@ -270,8 +418,9 @@ def process_file(
        error_message = data.get("detail", "An unknown error occurred.")
        logger.error(f"Error processing file: {error_message}")
        raise gr.Error(f"Error processing file: {error_message}", print_exception=False)
-    output = response_to_output(response, return_as_file)
-    return output
+
+    task_id_rendered = response.json()["task_id"]
+    return task_id_rendered


 def response_to_output(response, return_as_file):
@@ -302,7 +451,7 @@ def response_to_output(response, return_as_file):
        )
        # Embed document JSON and trigger load at client via an image.
        json_rendered_content = f"""
-            <docling-img id="dclimg" pagenumbers tooltip="parsed"></docling-img>
+            <docling-img id="dclimg" pagenumbers><docling-tooltip></docling-tooltip></docling-img>
            <script id="dcljson" type="application/json" onload="document.getElementById('dclimg').src = JSON.parse(document.getElementById('dcljson').textContent);">{json_content}</script>
            <img src onerror="document.getElementById('dclimg').src = JSON.parse(document.getElementById('dcljson').textContent);" />
            """
@@ -342,17 +491,21 @@ with gr.Blocks(
    with gr.Row(elem_id="check_health"):
        # Logo
        with gr.Column(scale=1, min_width=90):
-            gr.Image(
-                "https://ds4sd.github.io/docling/assets/logo.png",
-                height=80,
-                width=80,
-                show_download_button=False,
-                show_label=False,
-                show_fullscreen_button=False,
-                container=False,
-                elem_id="logo",
-                scale=0,
-            )
+            try:
+                gr.Image(
+                    logo_path,
+                    height=80,
+                    width=80,
+                    show_download_button=False,
+                    show_label=False,
+                    show_fullscreen_button=False,
+                    container=False,
+                    elem_id="logo",
+                    scale=0,
+                )
+            except Exception:
+                logger.warning("Logo not found.")
+
        # Title
        with gr.Column(scale=1, min_width=200):
            gr.Markdown(
@@ -381,43 +534,35 @@ with gr.Blocks(
            )

    # URL Processing Tab
-    with gr.Tab("Convert URL(s)"):
+    with gr.Tab("Convert URL"):
        with gr.Row():
            with gr.Column(scale=4):
                url_input = gr.Textbox(
-                    label="Input Sources (comma-separated URLs)",
-                    placeholder="https://arxiv.org/pdf/2206.01062",
+                    label="URL Input Source",
+                    placeholder="https://arxiv.org/pdf/2501.17887",
                )
            with gr.Column(scale=1):
-                url_process_btn = gr.Button("Process URL(s)", scale=1)
+                url_process_btn = gr.Button("Process URL", scale=1)
                url_reset_btn = gr.Button("Reset", scale=1)

    # File Processing Tab
-    with gr.Tab("Convert File(s)"):
+    with gr.Tab("Convert File"):
        with gr.Row():
            with gr.Column(scale=4):
                file_input = gr.File(
                    elem_id="file_input_zone",
-                    label="Upload Files",
+                    label="Upload File",
                    file_types=[
-                        ".pdf",
-                        ".docx",
-                        ".pptx",
-                        ".html",
-                        ".xlsx",
-                        ".asciidoc",
-                        ".txt",
-                        ".md",
-                        ".jpg",
-                        ".jpeg",
-                        ".png",
-                        ".gif",
+                        f".{v}"
+                        for v in itertools.chain.from_iterable(
+                            FormatToExtensions.values()
+                        )
                    ],
                    file_count="multiple",
                    scale=4,
                )
            with gr.Column(scale=1):
-                file_process_btn = gr.Button("Process File(s)", scale=1)
+                file_process_btn = gr.Button("Process File", scale=1)
                file_reset_btn = gr.Button("Reset", scale=1)

    # Options
@@ -426,14 +571,14 @@ with gr.Blocks(
            with gr.Column(scale=1):
                to_formats = gr.CheckboxGroup(
                    [
-                        ("Markdown", "md"),
                        ("Docling (JSON)", "json"),
+                        ("Markdown", "md"),
                        ("HTML", "html"),
                        ("Plain Text", "text"),
                        ("Doc Tags", "doctags"),
                    ],
                    label="To Formats",
-                    value=["md"],
+                    value=["json", "md"],
                )
            with gr.Column(scale=1):
                image_export_mode = gr.Radio(
@@ -445,6 +590,13 @@ with gr.Blocks(
                    label="Image Export Mode",
                    value="embedded",
                )
+        with gr.Row():
+            with gr.Column(scale=1, min_width=200):
+                pipeline = gr.Radio(
+                    [(v.value.capitalize(), v.value) for v in ProcessingPipeline],
+                    label="Pipeline type",
+                    value=ProcessingPipeline.STANDARD.value,
+                )
        with gr.Row():
            with gr.Column(scale=1, min_width=200):
                ocr = gr.Checkbox(label="Enable OCR", value=True)
@@ -465,32 +617,53 @@ with gr.Blocks(
                )
            ocr_engine.change(change_ocr_lang, inputs=[ocr_engine], outputs=[ocr_lang])
        with gr.Row():
-            with gr.Column(scale=2):
+            with gr.Column(scale=4):
                pdf_backend = gr.Radio(
-                    ["pypdfium2", "dlparse_v1", "dlparse_v2"],
+                    [v.value for v in PdfBackend],
                    label="PDF Backend",
-                    value="dlparse_v2",
+                    value=PdfBackend.DLPARSE_V4.value,
                )
            with gr.Column(scale=2):
                table_mode = gr.Radio(
-                    ["fast", "accurate"], label="Table Mode", value="fast"
+                    [(v.value.capitalize(), v.value) for v in TableFormerMode],
+                    label="Table Mode",
+                    value=TableStructureOptions().mode.value,
                )
            with gr.Column(scale=1):
                abort_on_error = gr.Checkbox(label="Abort on Error", value=False)
                return_as_file = gr.Checkbox(label="Return as File", value=False)
+        with gr.Row():
+            with gr.Column():
+                do_code_enrichment = gr.Checkbox(
+                    label="Enable code enrichment", value=False
+                )
+                do_formula_enrichment = gr.Checkbox(
+                    label="Enable formula enrichment", value=False
+                )
+            with gr.Column():
+                do_picture_classification = gr.Checkbox(
+                    label="Enable picture classification", value=False
+                )
+                do_picture_description = gr.Checkbox(
+                    label="Enable picture description", value=False
+                )
+
+    # Task id output
+    with gr.Row(visible=False) as task_id_output:
+        task_id_rendered = gr.Textbox(label="Task id", interactive=False)

    # Document output
    with gr.Row(visible=False) as content_output:
+        with gr.Tab("Docling (JSON)"):
+            output_json = gr.Code(language="json", wrap_lines=True, show_label=False)
+        with gr.Tab("Docling-Rendered"):
+            output_json_rendered = gr.HTML(label="Response")
        with gr.Tab("Markdown"):
            output_markdown = gr.Code(
                language="markdown", wrap_lines=True, show_label=False
            )
        with gr.Tab("Markdown-Rendered"):
            output_markdown_rendered = gr.Markdown(label="Response")
-        with gr.Tab("Docling (JSON)"):
-            output_json = gr.Code(language="json", wrap_lines=True, show_label=False)
-        with gr.Tab("Docling-Rendered"):
-            output_json_rendered = gr.HTML()
        with gr.Tab("HTML"):
            output_html = gr.Code(language="html", wrap_lines=True, show_label=False)
        with gr.Tab("HTML-Rendered"):
@@ -530,14 +703,11 @@ with gr.Blocks(
        set_options_visibility, inputs=[false_bool], outputs=[options]
    ).then(
        set_download_button_label, inputs=[processing_text], outputs=[download_file_btn]
-    ).then(
-        set_outputs_visibility_process,
-        inputs=[return_as_file],
-        outputs=[content_output, file_output],
    ).then(
        clear_outputs,
        inputs=None,
        outputs=[
+            task_id_rendered,
            output_markdown,
            output_markdown_rendered,
            output_json,
@@ -547,12 +717,17 @@ with gr.Blocks(
            output_text,
            output_doctags,
        ],
+    ).then(
+        set_task_id_visibility,
+        inputs=[true_bool],
+        outputs=[task_id_output],
    ).then(
        process_url,
        inputs=[
            url_input,
            to_formats,
            image_export_mode,
+            pipeline,
            ocr,
            force_ocr,
            ocr_engine,
@@ -561,7 +736,21 @@ with gr.Blocks(
            table_mode,
            abort_on_error,
            return_as_file,
+            do_code_enrichment,
+            do_formula_enrichment,
+            do_picture_classification,
+            do_picture_description,
        ],
+        outputs=[
+            task_id_rendered,
+        ],
+    ).then(
+        set_outputs_visibility_process,
+        inputs=[return_as_file],
+        outputs=[content_output, file_output],
+    ).then(
+        wait_task_finish,
+        inputs=[task_id_rendered, return_as_file],
        outputs=[
            output_markdown,
            output_markdown_rendered,
@@ -592,21 +781,20 @@ with gr.Blocks(
        set_outputs_visibility_direct,
        inputs=[false_bool, false_bool],
        outputs=[content_output, file_output],
-    ).then(clear_url_input, inputs=None, outputs=[url_input])
+    ).then(set_task_id_visibility, inputs=[false_bool], outputs=[task_id_output]).then(
+        clear_url_input, inputs=None, outputs=[url_input]
+    )

    # File processing
    file_process_btn.click(
        set_options_visibility, inputs=[false_bool], outputs=[options]
    ).then(
        set_download_button_label, inputs=[processing_text], outputs=[download_file_btn]
-    ).then(
-        set_outputs_visibility_process,
-        inputs=[return_as_file],
-        outputs=[content_output, file_output],
    ).then(
        clear_outputs,
        inputs=None,
        outputs=[
+            task_id_rendered,
            output_markdown,
            output_markdown_rendered,
            output_json,
@@ -616,12 +804,17 @@ with gr.Blocks(
            output_text,
            output_doctags,
        ],
+    ).then(
+        set_task_id_visibility,
+        inputs=[true_bool],
+        outputs=[task_id_output],
    ).then(
        process_file,
        inputs=[
            file_input,
            to_formats,
            image_export_mode,
+            pipeline,
            ocr,
            force_ocr,
            ocr_engine,
@@ -630,7 +823,21 @@ with gr.Blocks(
            table_mode,
            abort_on_error,
            return_as_file,
+            do_code_enrichment,
+            do_formula_enrichment,
+            do_picture_classification,
+            do_picture_description,
        ],
+        outputs=[
+            task_id_rendered,
+        ],
+    ).then(
+        set_outputs_visibility_process,
+        inputs=[return_as_file],
+        outputs=[content_output, file_output],
+    ).then(
+        wait_task_finish,
+        inputs=[task_id_rendered, return_as_file],
        outputs=[
            output_markdown,
            output_markdown_rendered,
@@ -661,4 +868,6 @@ with gr.Blocks(
        set_outputs_visibility_direct,
        inputs=[false_bool, false_bool],
        outputs=[content_output, file_output],
-    ).then(clear_file_input, inputs=None, outputs=[file_input])
+    ).then(set_task_id_visibility, inputs=[false_bool], outputs=[task_id_output]).then(
+        clear_file_input, inputs=None, outputs=[file_input]
+    )
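Note on the gradio_ui.py change above: the UI now submits a conversion to the async endpoint, receives a task id, and polls until the task reaches a terminal status. The following is a minimal, standard-library-only sketch of that poll-until-terminal pattern, not the code from this commit. The names `wait_for_task`, `fetch_status`, `max_polls` and `pause` are hypothetical; the terminal status strings "success", "failure" and "revoked" are taken from `wait_task_finish`, while "pending" and "started" in the usage example are assumed spellings of the non-terminal states.

```python
import time
from typing import Callable

# Terminal statuses as checked in wait_task_finish above.
TERMINAL_OK = {"success"}
TERMINAL_FAILED = {"failure", "revoked"}


def wait_for_task(
    fetch_status: Callable[[], str],  # hypothetical: returns the current task status
    max_polls: int = 100,             # hypothetical upper bound on polling attempts
    pause: float = 0.0,               # seconds to sleep between polls
) -> str:
    """Call fetch_status repeatedly until a terminal status is returned."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_OK:
            return status
        if status in TERMINAL_FAILED:
            raise RuntimeError(f"Task failed with status {status!r}")
        time.sleep(pause)
    raise TimeoutError(f"Task not finished after {max_polls} polls")


if __name__ == "__main__":
    # Simulated status sequence; a real caller would issue an HTTP request here.
    statuses = iter(["pending", "started", "success"])
    print(wait_for_task(lambda: next(statuses)))
```

Unlike the loop in the diff, this sketch bounds the number of polls, so a task that never leaves a non-terminal state ends in a `TimeoutError` rather than polling indefinitely. Whether such a bound is wanted in the UI is a design choice the commit does not address.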
--- a/docling_serve/helper_functions.py
+++ b/docling_serve/helper_functions.py
@@ -1,9 +1,30 @@
 import inspect
+import json
 import re
-from typing import Union
+from typing import Union, get_args, get_origin

 from fastapi import Depends, Form
-from pydantic import BaseModel
+from pydantic import BaseModel, TypeAdapter
+
+
+def is_pydantic_model(type_):
+    try:
+        if inspect.isclass(type_) and issubclass(type_, BaseModel):
+            return True
+
+        origin = get_origin(type_)
+        if origin is Union:
+            args = get_args(type_)
+            return any(
+                inspect.isclass(arg) and issubclass(arg, BaseModel)
+                for arg in args
+                if arg is not type(None)
+            )
+
+    except Exception:
+        pass
+
+    return False


 # Adapted from
@@ -12,25 +33,62 @@ def FormDepends(cls: type[BaseModel]):
    new_parameters = []

    for field_name, model_field in cls.model_fields.items():
+        annotation = model_field.annotation
+        description = model_field.description
+        default = (
+            Form(..., description=description, examples=model_field.examples)
+            if model_field.is_required()
+            else Form(
+                model_field.default,
+                examples=model_field.examples,
+                description=description,
+            )
+        )
+
+        # Flatten nested Pydantic models by accepting them as JSON strings
+        if is_pydantic_model(annotation):
+            annotation = str
+            default = Form(
+                None
+                if model_field.default is None
+                else json.dumps(model_field.default.model_dump(mode="json")),
+                description=description,
+                examples=None
+                if not model_field.examples
+                else [
+                    json.dumps(ex.model_dump(mode="json"))
+                    for ex in model_field.examples
+                ],
+            )
+
        new_parameters.append(
            inspect.Parameter(
                name=field_name,
                kind=inspect.Parameter.POSITIONAL_ONLY,
-                default=(
-                    Form(...)
-                    if model_field.is_required()
-                    else Form(model_field.default)
-                ),
-                annotation=model_field.annotation,
+                default=default,
+                annotation=annotation,
            )
        )

    async def as_form_func(**data):
+        for field_name, model_field in cls.model_fields.items():
+            value = data.get(field_name)
+            annotation = model_field.annotation
+
+            # Parse nested models from JSON string
+            if value is not None and is_pydantic_model(annotation):
+                try:
+                    validator = TypeAdapter(annotation)
+                    data[field_name] = validator.validate_json(value)
+                except Exception as e:
+                    raise ValueError(f"Invalid JSON for field '{field_name}': {e}")
+
        return cls(**data)

    sig = inspect.signature(as_form_func)
    sig = sig.replace(parameters=new_parameters)
    as_form_func.__signature__ = sig  # type: ignore
+
    return Depends(as_form_func)


--- a/docling_serve/orchestrator_factory.py
+++ b/docling_serve/orchestrator_factory.py
@@ -0,0 +1,52 @@
+from functools import lru_cache
+
+from docling_jobkit.orchestrators.base_orchestrator import BaseOrchestrator
+
+from docling_serve.settings import AsyncEngine, docling_serve_settings
+
+
+@lru_cache
+def get_async_orchestrator() -> BaseOrchestrator:
+    if docling_serve_settings.eng_kind == AsyncEngine.LOCAL:
+        from docling_jobkit.convert.manager import (
+            DoclingConverterManager,
+            DoclingConverterManagerConfig,
+        )
+        from docling_jobkit.orchestrators.local.orchestrator import (
+            LocalOrchestrator,
+            LocalOrchestratorConfig,
+        )
+
+        local_config = LocalOrchestratorConfig(
+            num_workers=docling_serve_settings.eng_loc_num_workers,
+        )
+
+        cm_config = DoclingConverterManagerConfig(
+            artifacts_path=docling_serve_settings.artifacts_path,
+            options_cache_size=docling_serve_settings.options_cache_size,
+            enable_remote_services=docling_serve_settings.enable_remote_services,
+            allow_external_plugins=docling_serve_settings.allow_external_plugins,
+            max_num_pages=docling_serve_settings.max_num_pages,
+            max_file_size=docling_serve_settings.max_file_size,
+        )
+        cm = DoclingConverterManager(config=cm_config)
+
+        return LocalOrchestrator(config=local_config, converter_manager=cm)
+    elif docling_serve_settings.eng_kind == AsyncEngine.KFP:
+        from docling_jobkit.orchestrators.kfp.orchestrator import (
+            KfpOrchestrator,
+            KfpOrchestratorConfig,
+        )
+
+        kfp_config = KfpOrchestratorConfig(
+            endpoint=docling_serve_settings.eng_kfp_endpoint,
+            token=docling_serve_settings.eng_kfp_token,
+            ca_cert_path=docling_serve_settings.eng_kfp_ca_cert_path,
+            self_callback_endpoint=docling_serve_settings.eng_kfp_self_callback_endpoint,
+            self_callback_token_path=docling_serve_settings.eng_kfp_self_callback_token_path,
+            self_callback_ca_cert_path=docling_serve_settings.eng_kfp_self_callback_ca_cert_path,
+        )
+
+        return KfpOrchestrator(config=kfp_config)
+
+    raise RuntimeError(f"Engine {docling_serve_settings.eng_kind} not recognized.")
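
Reviewer note: `get_async_orchestrator` relies on `functools.lru_cache` to behave as a lazily created singleton, so every caller shares one orchestrator. A minimal standalone sketch of that pattern follows; `EngineKind`, `LocalEngine`, and the `SETTINGS` dictionary are hypothetical stand-ins, not names from this repository.

```python
import enum
from functools import lru_cache


class EngineKind(str, enum.Enum):
    LOCAL = "local"
    KFP = "kfp"


class LocalEngine:
    def __init__(self, num_workers: int):
        self.num_workers = num_workers


# Hypothetical settings object, standing in for docling_serve_settings.
SETTINGS = {"eng_kind": EngineKind.LOCAL, "num_workers": 2}


@lru_cache
def get_engine() -> LocalEngine:
    # The body runs once; later calls return the cached instance.
    if SETTINGS["eng_kind"] == EngineKind.LOCAL:
        return LocalEngine(num_workers=SETTINGS["num_workers"])
    raise RuntimeError(f"Engine {SETTINGS['eng_kind']} not recognized.")
```

One consequence of this design is that settings changed after the first call are ignored until `get_engine.cache_clear()` is invoked.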
--- a/docling_serve/response_preparation.py
+++ b/docling_serve/response_preparation.py
@@ -1,21 +1,33 @@
+import asyncio
 import logging
 import os
 import shutil
-import tempfile
 import time
 from collections.abc import Iterable
 from pathlib import Path
 from typing import Union

+import httpx
 from fastapi import BackgroundTasks, HTTPException
 from fastapi.responses import FileResponse

 from docling.datamodel.base_models import OutputFormat
 from docling.datamodel.document import ConversionResult, ConversionStatus
 from docling_core.types.doc import ImageRefMode
+from docling_jobkit.datamodel.convert import ConvertDocumentsOptions
+from docling_jobkit.datamodel.task import Task
+from docling_jobkit.datamodel.task_targets import InBodyTarget, PutTarget, TaskTarget
+from docling_jobkit.orchestrators.base_orchestrator import (
+    BaseOrchestrator,
+)

-from docling_serve.datamodel.convert import ConvertDocumentsOptions
-from docling_serve.datamodel.responses import ConvertDocumentResponse, DocumentResponse
+from docling_serve.datamodel.responses import (
+    ConvertDocumentResponse,
+    DocumentResponse,
+    PresignedUrlConvertDocumentResponse,
+)
+from docling_serve.settings import docling_serve_settings
+from docling_serve.storage import get_scratch

 _log = logging.getLogger(__name__)

@@ -28,11 +40,14 @@ def _export_document_as_content(
    export_txt: bool,
    export_doctags: bool,
    image_mode: ImageRefMode,
+    md_page_break_placeholder: str,
 ):
    document = DocumentResponse(filename=conv_res.input.file.name)

    if conv_res.status == ConversionStatus.SUCCESS:
-        new_doc = conv_res.document._make_copy_with_refmode(Path(), image_mode)
+        new_doc = conv_res.document._make_copy_with_refmode(
+            Path(), image_mode, page_no=None
+        )

        # Create the different formats
        if export_json:
@@ -41,12 +56,16 @@ def _export_document_as_content(
            document.html_content = new_doc.export_to_html(image_mode=image_mode)
        if export_txt:
            document.text_content = new_doc.export_to_markdown(
-                strict_text=True, image_mode=image_mode
+                strict_text=True,
+                image_mode=image_mode,
            )
        if export_md:
-            document.md_content = new_doc.export_to_markdown(image_mode=image_mode)
+            document.md_content = new_doc.export_to_markdown(
+                image_mode=image_mode,
+                page_break_placeholder=md_page_break_placeholder or None,
+            )
        if export_doctags:
-            document.doctags_content = new_doc.export_to_document_tokens()
+            document.doctags_content = new_doc.export_to_doctags()
    elif conv_res.status == ConversionStatus.SKIPPED:
        raise HTTPException(status_code=400, detail=conv_res.errors)
    else:
@@ -64,11 +83,18 @@ def _export_documents_as_files(
    export_txt: bool,
    export_doctags: bool,
    image_export_mode: ImageRefMode,
-):
+    md_page_break_placeholder: str,
+) -> ConversionStatus:
    success_count = 0
    failure_count = 0

+    # Default failure in case results is empty
+    conv_result = ConversionStatus.FAILURE
+
+    artifacts_dir = Path("artifacts/")  # will be relative to the fname
+
    for conv_res in conv_results:
+        conv_result = conv_res.status
        if conv_res.status == ConversionStatus.SUCCESS:
            success_count += 1
            doc_filename = conv_res.input.file.stem
@@ -78,7 +104,9 @@ def _export_documents_as_files(
                fname = output_dir / f"{doc_filename}.json"
                _log.info(f"writing JSON output to {fname}")
                conv_res.document.save_as_json(
-                    filename=fname, image_mode=image_export_mode
+                    filename=fname,
+                    image_mode=image_export_mode,
+                    artifacts_dir=artifacts_dir,
                )

            # Export HTML format:
@@ -86,7 +114,9 @@ def _export_documents_as_files(
                fname = output_dir / f"{doc_filename}.html"
                _log.info(f"writing HTML output to {fname}")
                conv_res.document.save_as_html(
-                    filename=fname, image_mode=image_export_mode
+                    filename=fname,
+                    image_mode=image_export_mode,
+                    artifacts_dir=artifacts_dir,
                )

            # Export Text format:
@@ -104,14 +134,17 @@ def _export_documents_as_files(
                fname = output_dir / f"{doc_filename}.md"
                _log.info(f"writing Markdown output to {fname}")
                conv_res.document.save_as_markdown(
-                    filename=fname, image_mode=image_export_mode
+                    filename=fname,
+                    artifacts_dir=artifacts_dir,
+                    image_mode=image_export_mode,
+                    page_break_placeholder=md_page_break_placeholder or None,
                )

            # Export Document Tags format:
            if export_doctags:
                fname = output_dir / f"{doc_filename}.doctags"
                _log.info(f"writing Doc Tags output to {fname}")
-                conv_res.document.save_as_document_tokens(filename=fname)
+                conv_res.document.save_as_doctags(filename=fname)

        else:
            _log.warning(f"Document {conv_res.input.file} failed to convert.")
@@ -121,13 +154,15 @@ def _export_documents_as_files(
        f"Processed {success_count + failure_count} docs, "
        f"of which {failure_count} failed"
    )
+    return conv_result


 def process_results(
-    background_tasks: BackgroundTasks,
    conversion_options: ConvertDocumentsOptions,
+    target: TaskTarget,
    conv_results: Iterable[ConversionResult],
-) -> Union[ConvertDocumentResponse, FileResponse]:
+    work_dir: Path,
+) -> Union[ConvertDocumentResponse, FileResponse, PresignedUrlConvertDocumentResponse]:
    # Let's start by processing the documents
    try:
        start_time = time.monotonic()
@@ -151,7 +186,9 @@ def process_results(
        )

    # We have some results, let's prepare the response
-    response: Union[FileResponse, ConvertDocumentResponse]
+    response: Union[
+        FileResponse, ConvertDocumentResponse, PresignedUrlConvertDocumentResponse
+    ]

    # Booleans to know what to export
    export_json = OutputFormat.JSON in conversion_options.to_formats
@@ -161,7 +198,7 @@ def process_results(
    export_doctags = OutputFormat.DOCTAGS in conversion_options.to_formats

    # Only 1 document was processed, and we are not returning it as a file
-    if len(conv_results) == 1 and not conversion_options.return_as_file:
+    if len(conv_results) == 1 and isinstance(target, InBodyTarget):
        conv_res = conv_results[0]
        document = _export_document_as_content(
            conv_res,
@@ -171,6 +208,7 @@ def process_results(
            export_txt=export_txt,
            export_doctags=export_doctags,
            image_mode=conversion_options.image_export_mode,
+            md_page_break_placeholder=conversion_options.md_page_break_placeholder,
        )

        response = ConvertDocumentResponse(
@@ -183,7 +221,6 @@ def process_results(
    # Multiple documents were processed, or we are forced returning as a file
    else:
        # Temporary directory to store the outputs
-        work_dir = Path(tempfile.mkdtemp(prefix="docling_"))
        output_dir = work_dir / "output"
        output_dir.mkdir(parents=True, exist_ok=True)

@@ -191,7 +228,7 @@ def process_results(
        os.getpid()

        # Export the documents
-        _export_documents_as_files(
+        conv_result = _export_documents_as_files(
            conv_results=conv_results,
            output_dir=output_dir,
            export_json=export_json,
@@ -200,10 +237,10 @@ def process_results(
            export_txt=export_txt,
            export_doctags=export_doctags,
            image_export_mode=conversion_options.image_export_mode,
+            md_page_break_placeholder=conversion_options.md_page_break_placeholder,
        )

        files = os.listdir(output_dir)
-
        if len(files) == 0:
            raise HTTPException(status_code=500, detail="No documents were exported.")

@@ -216,10 +253,69 @@ def process_results(

        # Other cleanups after the response is sent
        # Output directory
-        background_tasks.add_task(shutil.rmtree, work_dir, ignore_errors=True)
+        # background_tasks.add_task(shutil.rmtree, work_dir, ignore_errors=True)

-        response = FileResponse(
-            file_path, filename=file_path.name, media_type="application/zip"
-        )
+        if isinstance(target, PutTarget):
+            try:
+                with open(file_path, "rb") as file_data:
+                    r = httpx.put(str(target.url), files={"file": file_data})
+                    r.raise_for_status()
+                response = PresignedUrlConvertDocumentResponse(
+                    status=conv_result,
+                    processing_time=processing_time,
+                )
+            except Exception as exc:
+                _log.error("An error occurred while uploading zip to s3", exc_info=exc)
+                raise HTTPException(
+                    status_code=500, detail="An error occurred while uploading zip to s3."
+                )
+        else:
+            response = FileResponse(
+                file_path, filename=file_path.name, media_type="application/zip"
+            )
+
+    return response
+
+
+async def prepare_response(
+    task: Task, orchestrator: BaseOrchestrator, background_tasks: BackgroundTasks
+):
+    if task.results is None:
+        raise HTTPException(
+            status_code=404,
+            detail="Task result not found. Please wait for a completion status.",
+        )
+    assert task.options is not None
+
+    work_dir = get_scratch() / task.task_id
+    response = process_results(
+        conversion_options=task.options,
+        target=task.target,
+        conv_results=task.results,
+        work_dir=work_dir,
+    )
+
+    if work_dir.exists():
+        task.scratch_dir = work_dir
+        if not isinstance(response, FileResponse):
+            _log.warning(
+                f"Task {task.task_id=} produced content in {work_dir=} but the response is not a file."
+            )
+            shutil.rmtree(work_dir, ignore_errors=True)
+
+    if docling_serve_settings.single_use_results:
+        if task.scratch_dir is not None:
+            background_tasks.add_task(
+                shutil.rmtree, task.scratch_dir, ignore_errors=True
+            )
+
+        async def _remove_task_impl():
+            await asyncio.sleep(docling_serve_settings.result_removal_delay)
+            await orchestrator.delete_task(task_id=task.task_id)
+
+        async def _remove_task():
+            asyncio.create_task(_remove_task_impl())  # noqa: RUF006
+
+        background_tasks.add_task(_remove_task)

    return response
--- a/docling_serve/settings.py
+++ b/docling_serve/settings.py
@@ -1,9 +1,11 @@
+import enum
+import sys
 from pathlib import Path
 from typing import Optional, Union

+from pydantic import AnyUrl, model_validator
 from pydantic_settings import BaseSettings, SettingsConfigDict
-
-from docling_serve.datamodel.engines import AsyncEngine
+from typing_extensions import Self


 class UvicornSettings(BaseSettings):
@@ -16,9 +18,18 @@ class UvicornSettings(BaseSettings):
    reload: bool = False
    root_path: str = ""
    proxy_headers: bool = True
+    timeout_keep_alive: int = 60
+    ssl_certfile: Optional[Path] = None
+    ssl_keyfile: Optional[Path] = None
+    ssl_keyfile_password: Optional[str] = None
    workers: Union[int, None] = None


+class AsyncEngine(str, enum.Enum):
+    LOCAL = "local"
+    KFP = "kfp"
+
+
 class DoclingServeSettings(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="DOCLING_SERVE_",
@@ -28,10 +39,54 @@ class DoclingServeSettings(BaseSettings):
    )

    enable_ui: bool = False
+    api_host: str = "localhost"
    artifacts_path: Optional[Path] = None
+    static_path: Optional[Path] = None
+    scratch_path: Optional[Path] = None
+    single_use_results: bool = True
+    result_removal_delay: float = 300  # 5 minutes
+    load_models_at_boot: bool = True
+    options_cache_size: int = 2
+    enable_remote_services: bool = False
+    allow_external_plugins: bool = False
+
+    max_document_timeout: float = 3_600 * 24 * 7  # 7 days
+    max_num_pages: int = sys.maxsize
+    max_file_size: int = sys.maxsize
+
+    max_sync_wait: int = 120  # 2 minutes
+
+    cors_origins: list[str] = ["*"]
+    cors_methods: list[str] = ["*"]
+    cors_headers: list[str] = ["*"]

    eng_kind: AsyncEngine = AsyncEngine.LOCAL
+    # Local engine
    eng_loc_num_workers: int = 2
+    # KFP engine
+    eng_kfp_endpoint: Optional[AnyUrl] = None
+    eng_kfp_token: Optional[str] = None
+    eng_kfp_ca_cert_path: Optional[str] = None
+    eng_kfp_self_callback_endpoint: Optional[str] = None
+    eng_kfp_self_callback_token_path: Optional[Path] = None
+    eng_kfp_self_callback_ca_cert_path: Optional[Path] = None
+
+    eng_kfp_experimental: bool = False
+
+    @model_validator(mode="after")
+    def engine_settings(self) -> Self:
+        # Validate KFP engine settings
+        if self.eng_kind == AsyncEngine.KFP:
+            if self.eng_kfp_endpoint is None:
+                raise ValueError("KFP endpoint is required when using the KFP engine.")
+
+        if self.eng_kind == AsyncEngine.KFP:
+            if not self.eng_kfp_experimental:
+                raise ValueError(
+                    "KFP is not yet working. To enable the development version, you must set DOCLING_SERVE_ENG_KFP_EXPERIMENTAL=true."
+                )
+
+        return self


 uvicorn_settings = UvicornSettings()
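
Reviewer note: the `engine_settings` validator performs cross-field checks after all fields are populated. The same rule can be sketched without pydantic using a dataclass `__post_init__`, which plays the role of `model_validator(mode="after")` here. `EngineSettings` is a hypothetical name, and the endpoint is typed as a plain string rather than `AnyUrl`.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class AsyncEngine(str, enum.Enum):
    LOCAL = "local"
    KFP = "kfp"


@dataclass
class EngineSettings:
    eng_kind: AsyncEngine = AsyncEngine.LOCAL
    eng_kfp_endpoint: Optional[str] = None
    eng_kfp_experimental: bool = False

    def __post_init__(self) -> None:
        # Both checks apply only when the KFP engine is selected.
        if self.eng_kind == AsyncEngine.KFP:
            if self.eng_kfp_endpoint is None:
                raise ValueError("KFP endpoint is required when using the KFP engine.")
            if not self.eng_kfp_experimental:
                raise ValueError("KFP requires the experimental flag to be enabled.")
```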
--- a/docling_serve/storage.py
+++ b/docling_serve/storage.py
@@ -0,0 +1,16 @@
+import tempfile
+from functools import lru_cache
+from pathlib import Path
+
+from docling_serve.settings import docling_serve_settings
+
+
+@lru_cache
+def get_scratch() -> Path:
+    scratch_dir = (
+        docling_serve_settings.scratch_path
+        if docling_serve_settings.scratch_path is not None
+        else Path(tempfile.mkdtemp(prefix="docling_"))
+    )
+    scratch_dir.mkdir(exist_ok=True, parents=True)
+    return scratch_dir
--- a/docling_serve/websocket_notifier.py
+++ b/docling_serve/websocket_notifier.py
@@ -0,0 +1,54 @@
+from fastapi import WebSocket
+
+from docling_jobkit.datamodel.task_meta import TaskStatus
+from docling_jobkit.orchestrators.base_notifier import BaseNotifier
+from docling_jobkit.orchestrators.base_orchestrator import BaseOrchestrator
+
+from docling_serve.datamodel.responses import (
+    MessageKind,
+    TaskStatusResponse,
+    WebsocketMessage,
+)
+
+
+class WebsocketNotifier(BaseNotifier):
+    def __init__(self, orchestrator: BaseOrchestrator):
+        super().__init__(orchestrator)
+        self.task_subscribers: dict[str, set[WebSocket]] = {}
+
+    async def add_task(self, task_id: str):
+        self.task_subscribers[task_id] = set()
+
+    async def remove_task(self, task_id: str):
+        if task_id in self.task_subscribers:
+            for websocket in self.task_subscribers[task_id]:
+                await websocket.close()
+
+            del self.task_subscribers[task_id]
+
+    async def notify_task_subscribers(self, task_id: str):
+        if task_id not in self.task_subscribers:
+            raise RuntimeError(f"Task {task_id} does not have a subscribers list.")
+
+        task = await self.orchestrator.get_raw_task(task_id=task_id)
+        task_queue_position = await self.orchestrator.get_queue_position(task_id)
+        msg = TaskStatusResponse(
+            task_id=task.task_id,
+            task_status=task.task_status,
+            task_position=task_queue_position,
+            task_meta=task.processing_meta,
+        )
+        for websocket in self.task_subscribers[task_id]:
+            await websocket.send_text(
+                WebsocketMessage(message=MessageKind.UPDATE, task=msg).model_dump_json()
+            )
+            if task.is_completed():
+                await websocket.close()
+
+    async def notify_queue_positions(self):
+        for task_id in self.task_subscribers.keys():
+            # notify only pending tasks
+            if self.orchestrator.tasks[task_id].task_status != TaskStatus.PENDING:
+                continue
+
+            await self.notify_task_subscribers(task_id)
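
Reviewer note: the notifier keeps a per-task set of sockets, sends every subscriber the latest status, and closes the sockets once the task completes. The sketch below exercises that flow with an in-memory fake. `FakeSocket` and `Notifier` are hypothetical, and the status is passed in directly instead of being read from an orchestrator.

```python
import asyncio


class FakeSocket:
    """Stand-in for fastapi.WebSocket that records what it receives."""

    def __init__(self) -> None:
        self.sent: list = []
        self.closed = False

    async def send_text(self, text: str) -> None:
        self.sent.append(text)

    async def close(self) -> None:
        self.closed = True


class Notifier:
    def __init__(self) -> None:
        self.task_subscribers: dict = {}

    async def add_task(self, task_id: str) -> None:
        self.task_subscribers[task_id] = set()

    async def notify(self, task_id: str, status: str, completed: bool) -> None:
        if task_id not in self.task_subscribers:
            raise RuntimeError(f"Task {task_id} does not have a subscribers list.")
        for websocket in self.task_subscribers[task_id]:
            await websocket.send_text(status)
            if completed:
                # Nothing more will be sent for a finished task.
                await websocket.close()
```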
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,9 @@
+# Docling Serve documentation
+
+These documentation pages cover the webserver configuration, runtime options, deployment examples, and development best practices.
+
+- [Configuration](./configuration.md)
+- [Advanced usage](./usage.md)
+- [Deployment](./deployment.md)
+- [Development](./development.md)
+- [`v1` migration](./v1_migration.md)
--- a/docs/assets/docling-serve-pic.png
+++ b/docs/assets/docling-serve-pic.png
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -0,0 +1,81 @@
+# Configuration
+
+The `docling-serve` executable can be configured via command-line
+options as well as environment variables.
+Configurations are divided between the settings used for the `uvicorn` asgi
+server and the actual app-specific configurations.
+
+> [!WARNING]
+> When the server is running with `reload` or with multiple `workers`, uvicorn
+> will spawn multiple subprocesses. This invalidates all the values configured
+> via the CLI options. Please use environment variables in this
+> type of deployment.
+
+## Webserver configuration
+
+The following table shows the options which are propagated directly to the
+`uvicorn` webserver runtime.
+
+| CLI option | ENV | Default | Description |
+| -----------|-----|---------|-------------|
+| `--host` | `UVICORN_HOST` | `0.0.0.0` for `run`, `localhost` for `dev` | The host to serve on. |
+| `--port` | `UVICORN_PORT` | `5001` | The port to serve on. |
+| `--reload` | `UVICORN_RELOAD` | `false` for `run`, `true` for `dev` | Enable auto-reload of the server when (code) files change. |
+| `--workers` | `UVICORN_WORKERS` | `1` | Use multiple worker processes. |
+| `--root-path` | `UVICORN_ROOT_PATH` | `""` | The root path tells the app that it is being served to the outside world under some path prefix, e.g. one set up in a termination proxy. |
+| `--proxy-headers` | `UVICORN_PROXY_HEADERS` | `true` | Enable/Disable X-Forwarded-Proto, X-Forwarded-For, X-Forwarded-Port to populate remote address info. |
+| `--timeout-keep-alive` | `UVICORN_TIMEOUT_KEEP_ALIVE` | `60` | Seconds to keep an idle Keep-Alive connection open before closing it. |
+| `--ssl-certfile` | `UVICORN_SSL_CERTFILE` |  | SSL certificate file. |
+| `--ssl-keyfile` | `UVICORN_SSL_KEYFILE` |  | SSL key file. |
+| `--ssl-keyfile-password` | `UVICORN_SSL_KEYFILE_PASSWORD` |  | SSL keyfile password. |
+
+## Docling Serve configuration
+
+The following table describes the options to configure the Docling Serve app.
+
+| CLI option | ENV | Default | Description |
+| -----------|-----|---------|-------------|
+| `--artifacts-path` | `DOCLING_SERVE_ARTIFACTS_PATH` | unset | If set to a valid directory, the model weights will be loaded from this path |
+|  | `DOCLING_SERVE_STATIC_PATH` | unset | If set to a valid directory, the static assets for the docs and UI will be loaded from this path |
+|  | `DOCLING_SERVE_SCRATCH_PATH` | unset | If set, this directory will be used as scratch workspace, e.g. storing the results before they get requested. If unset, a temporary directory is created for this purpose. |
+| `--enable-ui` | `DOCLING_SERVE_ENABLE_UI` | `false` | Enable the demonstrator UI. |
+|  | `DOCLING_SERVE_ENABLE_REMOTE_SERVICES` | `false` | Allow pipeline components to make remote connections. For example, this is needed when using a vision-language model via APIs. |
+|  | `DOCLING_SERVE_ALLOW_EXTERNAL_PLUGINS` | `false` | Allow the selection of third-party plugins. |
+|  | `DOCLING_SERVE_SINGLE_USE_RESULTS` | `true` | If true, results can be accessed only once. If false, the results accumulate in the scratch directory. |
+|  | `DOCLING_SERVE_RESULT_REMOVAL_DELAY` | `300` | When `DOCLING_SERVE_SINGLE_USE_RESULTS` is active, this is the delay in seconds before results are removed from the task registry. |
+|  | `DOCLING_SERVE_MAX_DOCUMENT_TIMEOUT` | `604800` (7 days) | The maximum time for processing a document. |
+|  | `DOCLING_SERVE_MAX_NUM_PAGES` | `sys.maxsize` | The maximum number of pages for a document to be processed. |
+|  | `DOCLING_SERVE_MAX_FILE_SIZE` | `sys.maxsize` | The maximum file size for a document to be processed. |
+|  | `DOCLING_SERVE_MAX_SYNC_WAIT` | `120` | Maximum number of seconds a synchronous endpoint waits for the task to complete. |
+|  | `DOCLING_SERVE_LOAD_MODELS_AT_BOOT` | `true` | If enabled, the models for the default options will be loaded at boot. |
+|  | `DOCLING_SERVE_OPTIONS_CACHE_SIZE` | `2` | How many DocumentConverter objects (including their loaded models) to keep in the cache. |
+|  | `DOCLING_SERVE_CORS_ORIGINS` | `["*"]` | A list of origins that should be permitted to make cross-origin requests. |
+|  | `DOCLING_SERVE_CORS_METHODS` | `["*"]` | A list of HTTP methods that should be allowed for cross-origin requests. |
+|  | `DOCLING_SERVE_CORS_HEADERS` | `["*"]` | A list of HTTP request headers that should be supported for cross-origin requests. |
+|  | `DOCLING_SERVE_ENG_KIND` | `local` | The compute engine to use for the async tasks. Possible values are `local` and `kfp`. See below for more configurations of the engines. |
+
+### Compute engine
+
+Docling Serve can be deployed with one of several compute engines.
+The selected compute engine runs all the async jobs.
+
+#### Local engine
+
+The following table describes the options to configure the Docling Serve local engine.
+
+| ENV | Default | Description |
+|-----|---------|-------------|
+| `DOCLING_SERVE_ENG_LOC_NUM_WORKERS` | 2 | Number of workers/threads processing the incoming tasks. |
+
+#### KFP engine
+
+The following table describes the options to configure the Docling Serve KFP engine.
+
+| ENV | Default | Description |
+|-----|---------|-------------|
+| `DOCLING_SERVE_ENG_KFP_ENDPOINT` |  | Must be set to the Kubeflow Pipeline endpoint. When using the in-cluster deployment, make sure to use the cluster endpoint, e.g. `https://NAME.NAMESPACE.svc.cluster.local:8888`  |
+| `DOCLING_SERVE_ENG_KFP_TOKEN` |  | The authentication token for KFP. For in-cluster deployment, the app will automatically load the token of the ServiceAccount. |
+| `DOCLING_SERVE_ENG_KFP_CA_CERT_PATH` |  | Path to the CA certificates for the KFP endpoint. For in-cluster deployment, the app will automatically load the internal CA. |
+| `DOCLING_SERVE_ENG_KFP_SELF_CALLBACK_ENDPOINT` |  | If set, it enables internal callbacks providing status update of the KFP job. Usually something like `https://NAME.NAMESPACE.svc.cluster.local:5001/v1/callback/task/progress`. |
+| `DOCLING_SERVE_ENG_KFP_SELF_CALLBACK_TOKEN_PATH` |  | The token used for authenticating the progress callback. For cluster-internal workloads, use `/run/secrets/kubernetes.io/serviceaccount/token`. |
+| `DOCLING_SERVE_ENG_KFP_SELF_CALLBACK_CA_CERT_PATH` |  | The CA certificate for the progress callback. For cluster-internal workloads, use `/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt`. |
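
Reviewer note: every app-level option in the tables above maps to an environment variable named `DOCLING_SERVE_` plus the upper-cased setting name. The helper below sketches that mapping for launching the server from a script. `to_env` is a hypothetical helper, and encoding list values as JSON is an assumption about how the settings loader parses complex types, so it should be verified against the deployed version.

```python
import json

PREFIX = "DOCLING_SERVE_"


def to_env(**options) -> dict:
    """Translate setting names and values into environment variables."""
    env = {}
    for name, value in options.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        elif isinstance(value, (list, dict)):
            # Assumption: complex values are provided as JSON strings.
            text = json.dumps(value)
        else:
            text = str(value)
        env[PREFIX + name.upper()] = text
    return env
```

The resulting dictionary can be merged into `os.environ` and passed as `env=` to `subprocess.run`, which avoids the CLI-option caveat described in the warning for multi-worker deployments.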
--- a/docs/deploy-examples/compose-gpu.yaml
+++ b/docs/deploy-examples/compose-gpu.yaml
@@ -0,0 +1,15 @@
+services:
+  docling:
+    image: ghcr.io/docling-project/docling-serve-cu124
+    container_name: docling-serve
+    ports:
+      - 5001:5001
+    environment:
+      - DOCLING_SERVE_ENABLE_UI=true
+    deploy:
+      resources:
+        reservations:
+          devices:
+          - driver: nvidia
+            count: all # nvidia-smi 
+            capabilities: [gpu]
--- a/docs/deploy-examples/docling-model-cache-deployment.yaml
+++ b/docs/deploy-examples/docling-model-cache-deployment.yaml
@@ -0,0 +1,47 @@
+kind: Deployment
+apiVersion: apps/v1
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: docling-serve
+      component: docling-serve-api
+  template:
+    metadata:
+      labels:
+        app: docling-serve
+        component: docling-serve-api
+    spec:
+      restartPolicy: Always
+      containers:
+        - name: api
+          resources:
+            limits:
+              cpu: 2
+              memory: 4Gi
+            requests:
+              cpu: 250m
+              memory: 1Gi
+          env:
+            - name: DOCLING_SERVE_ENABLE_UI
+              value: 'true'
+            - name: DOCLING_SERVE_ARTIFACTS_PATH
+              value: '/modelcache'
+          ports:
+            - name: http
+              containerPort: 5001
+              protocol: TCP
+          imagePullPolicy: Always
+          image: 'ghcr.io/docling-project/docling-serve-cpu'
+          volumeMounts:
+            - name: docling-model-cache
+              mountPath: /modelcache
+      volumes:
+        - name: docling-model-cache
+          persistentVolumeClaim:
+            claimName: docling-model-cache-pvc
--- a/docs/deploy-examples/docling-model-cache-job.yaml
+++ b/docs/deploy-examples/docling-model-cache-job.yaml
@@ -0,0 +1,33 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: docling-model-cache-load
+spec:
+  selector: {}
+  template:
+    metadata:
+      name: docling-model-load
+    spec:
+      containers:
+        - name: loader
+          image: ghcr.io/docling-project/docling-serve-cpu:main
+          command:
+            - docling-tools
+            - models
+            - download
+            - '--output-dir=/modelcache'
+            - 'layout'
+            - 'tableformer'
+            - 'code_formula'
+            - 'picture_classifier'
+            - 'smolvlm'
+            - 'granite_vision'
+            - 'easyocr'
+          volumeMounts:
+            - name: docling-model-cache
+              mountPath: /modelcache
+      volumes:
+        - name: docling-model-cache
+          persistentVolumeClaim:
+            claimName: docling-model-cache-pvc
+      restartPolicy: Never
--- a/docs/deploy-examples/docling-model-cache-pvc.yaml
+++ b/docs/deploy-examples/docling-model-cache-pvc.yaml
@@ -0,0 +1,11 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: docling-model-cache-pvc
+spec:
+  accessModes:
+    - ReadWriteOnce
+  volumeMode: Filesystem
+  resources:
+    requests:
+      storage: 10Gi
--- a/docs/deploy-examples/docling-serve-oauth.yaml
+++ b/docs/deploy-examples/docling-serve-oauth.yaml
@@ -0,0 +1,192 @@
+# This example deployment configures Docling Serve with an OAuth-Proxy sidecar and TLS termination
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+  annotations:
+    serviceaccounts.openshift.io/oauth-redirectreference.primary: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"docling-serve"}}'
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: docling-serve-oauth
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: system:auth-delegator
+subjects:
+- kind: ServiceAccount
+  name: docling-serve
+  namespace: docling
+---
+apiVersion: route.openshift.io/v1
+kind: Route
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  to:
+    kind: Service
+    name: docling-serve
+  port:
+    targetPort: oauth
+  tls:
+    termination: Reencrypt
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+  annotations:
+    service.alpha.openshift.io/serving-cert-secret-name: docling-serve-tls
+spec:
+  ports:
+  - name: oauth
+    port: 8443
+    targetPort: oauth
+  - name: http
+    port: 5001
+    targetPort: http
+  selector:
+    app: docling-serve
+    component: docling-serve-api
+---
+kind: Deployment
+apiVersion: apps/v1
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: docling-serve
+      component: docling-serve-api
+  template:
+    metadata:
+      labels:
+        app: docling-serve
+        component: docling-serve-api
+    spec:
+      restartPolicy: Always
+      serviceAccountName: docling-serve
+      containers:
+        - name: api
+          resources:
+            limits:
+              cpu: 2000m
+              memory: 4Gi
+            requests:
+              cpu: 800m
+              memory: 1Gi
+          readinessProbe:
+            httpGet:
+              path: /health
+              port: http
+              scheme: HTTPS
+            initialDelaySeconds: 10
+            timeoutSeconds: 2
+            periodSeconds: 5
+            successThreshold: 1
+            failureThreshold: 3
+          livenessProbe:
+            httpGet:
+              path: /health
+              port: http
+              scheme: HTTPS
+            initialDelaySeconds: 3
+            timeoutSeconds: 4
+            periodSeconds: 10
+            successThreshold: 1
+            failureThreshold: 5
+          env:
+            - name: NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+            - name: DOCLING_SERVE_ENABLE_UI
+              value: 'true'
+            - name: DOCLING_SERVE_API_HOST
+              value: 'docling-serve.$(NAMESPACE).svc.cluster.local'
+            - name: UVICORN_SSL_CERTFILE
+              value: '/etc/tls/private/tls.crt'
+            - name: UVICORN_SSL_KEYFILE
+              value: '/etc/tls/private/tls.key'
+          ports:
+            - name: http
+              containerPort: 5001
+              protocol: TCP
+          volumeMounts:
+            - name: proxy-tls
+              mountPath: /etc/tls/private
+          imagePullPolicy: Always
+          image: 'ghcr.io/docling-project/docling-serve-cpu:fix-ui-with-https'
+        - name: oauth-proxy
+          resources:
+            limits:
+              cpu: 100m
+              memory: 256Mi
+            requests:
+              cpu: 100m
+              memory: 256Mi
+          readinessProbe:
+            httpGet:
+              path: /oauth/healthz
+              port: oauth
+              scheme: HTTPS
+            initialDelaySeconds: 5
+            timeoutSeconds: 1
+            periodSeconds: 5
+            successThreshold: 1
+            failureThreshold: 3
+          livenessProbe:
+            httpGet:
+              path: /oauth/healthz
+              port: oauth
+              scheme: HTTPS
+            initialDelaySeconds: 30
+            timeoutSeconds: 1
+            periodSeconds: 5
+            successThreshold: 1
+            failureThreshold: 3
+          ports:
+            - name: oauth
+              containerPort: 8443
+              protocol: TCP
+          imagePullPolicy: IfNotPresent
+          volumeMounts:
+            - name: proxy-tls
+              mountPath: /etc/tls/private
+          env:
+            - name: NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+          image: 'registry.redhat.io/openshift4/ose-oauth-proxy:v4.13'
+          args:
+            - '--https-address=:8443'
+            - '--provider=openshift'
+            - '--openshift-service-account=docling-serve'
+            - '--upstream=https://docling-serve.$(NAMESPACE).svc.cluster.local:5001'
+            - '--upstream-ca=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt'
+            - '--tls-cert=/etc/tls/private/tls.crt'
+            - '--tls-key=/etc/tls/private/tls.key'
+            - '--cookie-secret=SECRET'
+            - '--openshift-delegate-urls={"/": {"group":"route.openshift.io","resource":"routes","verb":"get","name":"docling-serve","namespace":"$(NAMESPACE)"}}'
+            - '--openshift-sar={"namespace":"$(NAMESPACE)","resource":"routes","resourceName":"docling-serve","verb":"get","resourceAPIGroup":"route.openshift.io"}'
+            - '--skip-auth-regex=''(^/health|^/docs)'''
+      volumes:
+        - name: proxy-tls
+          secret:
+            secretName: docling-serve-tls
+            defaultMode: 420
--- a/docs/deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml
+++ b/docs/deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml
@@ -0,0 +1,76 @@
+# This example deployment configures Docling Serve with a Route + Sticky sessions, a Service and cpu image
+---
+kind: Route
+apiVersion: route.openshift.io/v1
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+  annotations:
+    haproxy.router.openshift.io/disable_cookies: "false" # this annotation enables the sticky sessions
+spec:
+  path: /
+  to:
+    kind: Service
+    name: docling-serve
+  port:
+    targetPort: http
+  tls:
+    termination: edge
+    insecureEdgeTerminationPolicy: Redirect
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  ports:
+  - name: http
+    port: 5001
+    targetPort: http
+  selector:
+    app: docling-serve
+    component: docling-serve-api
+---
+kind: Deployment
+apiVersion: apps/v1
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: docling-serve
+      component: docling-serve-api
+  template:
+    metadata:
+      labels:
+        app: docling-serve
+        component: docling-serve-api
+    spec:
+      restartPolicy: Always
+      containers:
+        - name: api
+          resources:
+            limits:
+              cpu: 1
+              memory: 4Gi
+            requests:
+              cpu: 250m
+              memory: 1Gi
+          env:
+            - name: DOCLING_SERVE_ENABLE_UI
+              value: 'true'
+          ports:
+            - name: http
+              containerPort: 5001
+              protocol: TCP
+          imagePullPolicy: Always
+          image: 'ghcr.io/docling-project/docling-serve'
--- a/docs/deploy-examples/docling-serve-simple.yaml
+++ b/docs/deploy-examples/docling-serve-simple.yaml
@@ -0,0 +1,58 @@
+# This example deployment configures Docling Serve with a Service and cuda image
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  ports:
+  - name: http
+    port: 5001
+    targetPort: http
+  selector:
+    app: docling-serve
+    component: docling-serve-api
+---
+kind: Deployment
+apiVersion: apps/v1
+metadata:
+  name: docling-serve
+  labels:
+    app: docling-serve
+    component: docling-serve-api
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: docling-serve
+      component: docling-serve-api
+  template:
+    metadata:
+      labels:
+        app: docling-serve
+        component: docling-serve-api
+    spec:
+      restartPolicy: Always
+      containers:
+        - name: api
+          resources:
+            limits:
+              cpu: 1
+              memory: 4Gi
+              nvidia.com/gpu: 1  # Limit to one GPU
+            requests:
+              cpu: 250m
+              memory: 1Gi
+              nvidia.com/gpu: 1  # Limit to one GPU
+          env:
+            - name: DOCLING_SERVE_ENABLE_UI
+              value: 'true'
+          ports:
+            - name: http
+              containerPort: 5001
+              protocol: TCP
+          imagePullPolicy: Always
+          image: 'ghcr.io/docling-project/docling-serve-cu124'
--- a/docs/deployment.md
+++ b/docs/deployment.md
@@ -0,0 +1,236 @@
+# Deployment Examples
+
+This document provides deployment examples for running the application in different environments.
+
+Choose the deployment option that best fits your setup.
+
+- **[Local GPU](#local-gpu)**: For deploying the application locally on a machine with an NVIDIA GPU (using Docker Compose).
+- **[OpenShift](#openshift)**: For deploying the application on an OpenShift cluster, designed for cloud-native environments.
+
+---
+
+## Local GPU
+
+### Docker Compose
+
+Manifest example: [compose-gpu.yaml](./deploy-examples/compose-gpu.yaml)
+
+This deployment has the following features:
+
+- NVIDIA CUDA enabled
+
+Install the app with:
+
+```sh
+docker compose -f docs/deploy-examples/compose-gpu.yaml up -d
+```
+
+For using the API:
+
+```sh
+# Make a test query
+curl -X 'POST' \
+  "localhost:5001/v1/convert/source/async" \
+  -H "accept: application/json" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}]
+  }'
+```
+
+<details>
+<summary><b>Requirements</b></summary>
+
+- debian/ubuntu/rhel/fedora/opensuse
+- docker
+- nvidia drivers >=550.54.14
+- nvidia-container-toolkit
+
+Docs:
+
+- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/supported-platforms.html)
+- [CUDA Toolkit Release Notes](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id6)
+
+</details>
+
+<details>
+<summary><b>Steps</b></summary>
+
+1. Check the driver version and decide which GPU you want to use (0/1/2/3...), then update the [compose-gpu.yaml](./deploy-examples/compose-gpu.yaml) file accordingly or use `count: all`.
+
+    ```sh
+    nvidia-smi
+    ```
+
+2. Check if the NVIDIA Container Toolkit is installed/updated
+
+    ```sh
+    # debian
+    dpkg -l | grep nvidia-container-toolkit
+    ```
+
+    ```sh
+    # rhel
+    rpm -q nvidia-container-toolkit
+    ```
+
+    NVIDIA Container Toolkit install steps can be found here:
+
+    <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html>
+
+3. Check which runtime is being used by Docker
+
+    ```sh
+    # docker
+    docker info | grep -i runtime
+    ```
+
+4. If the default Docker runtime changes back from 'nvidia' to 'default' after restarting the Docker service (optional):
+
+    Back up the daemon.json file:
+
+    ```sh
+    sudo cp /etc/docker/daemon.json /etc/docker/daemon.json.bak
+    ```
+
+    Update the daemon.json file:
+
+    ```sh
+    echo '{
+      "runtimes": {
+        "nvidia": {
+          "path": "nvidia-container-runtime"
+        }
+      },
+      "default-runtime": "nvidia"
+    }' | sudo tee /etc/docker/daemon.json > /dev/null
+    ```
+
+    Restart the Docker service:
+
+    ```sh
+    sudo systemctl restart docker
+    ```
+
+    Confirm 'nvidia' is the default runtime used by Docker by repeating step 3.
+
+5. Run the container:
+
+    ```sh
+    docker compose -f docs/deploy-examples/compose-gpu.yaml up -d
+    ```
+
+</details>
+
+## OpenShift
+
+### Simple deployment
+
+Manifest example: [docling-serve-simple.yaml](./deploy-examples/docling-serve-simple.yaml)
+
+This deployment example has the following features:
+
+- Deployment configuration
+- Service configuration
+- NVIDIA CUDA enabled
+
+Install the app with:
+
+```sh
+oc apply -f docs/deploy-examples/docling-serve-simple.yaml
+```
+
+For using the API:
+
+```sh
+# Port-forward the service
+oc port-forward svc/docling-serve 5001:5001
+
+# Make a test query
+curl -X 'POST' \
+  "localhost:5001/v1/convert/source/async" \
+  -H "accept: application/json" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}]
+  }'
+```
+
+### Secure deployment with `oauth-proxy`
+
+Manifest example: [docling-serve-oauth.yaml](./deploy-examples/docling-serve-oauth.yaml)
+
+This deployment has the following features:
+
+- TLS encryption between all components (using the cluster-internal CA).
+- Authentication via a secure `oauth-proxy` sidecar.
+- Service exposed through a secure OpenShift `Route`.
+
+Install the app with:
+
+```sh
+oc apply -f docs/deploy-examples/docling-serve-oauth.yaml
+```
+
+For using the API:
+
+```sh
+# Retrieve the endpoint
+DOCLING_NAME=docling-serve
+DOCLING_ROUTE="https://$(oc get routes ${DOCLING_NAME} --template={{.spec.host}})"
+
+# Retrieve the authentication token
+OCP_AUTH_TOKEN=$(oc whoami --show-token)
+
+# Make a test query
+curl -X 'POST' \
+  "${DOCLING_ROUTE}/v1/convert/source/async" \
+  -H "Authorization: Bearer ${OCP_AUTH_TOKEN}" \
+  -H "accept: application/json" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}]
+  }'
+```
+
+### ReplicaSets with `sticky sessions`
+
+Manifest example: [docling-serve-replicas-w-sticky-sessions.yaml](./deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml)
+
+This deployment has the following features:
+
+- Deployment configuration with 3 replicas
+- Service configuration
+- Service exposed through an OpenShift `Route` with sticky sessions enabled
+
+Install the app with:
+
+```sh
+oc apply -f docs/deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml
+```
+
+For using the API:
+
+```sh
+# Retrieve the endpoint
+DOCLING_NAME=docling-serve
+DOCLING_ROUTE="https://$(oc get routes $DOCLING_NAME --template={{.spec.host}})"
+
+# Make a test query, store the cookie and taskid
+task_id=$(curl -s -X 'POST' \
+    "${DOCLING_ROUTE}/v1/convert/source/async" \
+    -H "accept: application/json" \
+    -H "Content-Type: application/json" \
+    -d '{
+    "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}]
+    }' \
+    -c cookies.txt | grep -oP '"task_id":"\K[^"]+')
+```
+
+```sh
+# Grab the taskid and cookie to check the task status
+curl -v -X 'GET' \
+  "${DOCLING_ROUTE}/v1/status/poll/$task_id?wait=0" \
+  -H "accept: application/json" \
+  -b "cookies.txt"
+```
--- a/docs/development.md
+++ b/docs/development.md
@@ -0,0 +1,57 @@
+# Development
+
+## Install dependencies
+
+### CPU only
+
+```sh
+# Install uv if not already available
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Install dependencies
+uv sync --extra cpu
+```
+
+### CUDA GPU
+
+For GPU support use the following command:
+
+```sh
+# Install dependencies
+uv sync
+```
+
+### Gradio UI and different OCR backends
+
+The `/ui` endpoint (built with `gradio`) and additional OCR backends can be enabled via package extras:
+
+```sh
+# Enable ui and rapidocr
+uv sync --extra ui --extra rapidocr
+```
+
+```sh
+# Enable tesserocr
+uv sync --extra tesserocr
+```
+
+See the `[project.optional-dependencies]` section in `pyproject.toml` for the full list of extras, and run `uv run docling-serve --help` for the runtime options.
+
+### Run the server
+
+The `docling-serve` executable is a convenient script for launching the webserver both in
+development and production mode.
+
+```sh
+# Run the server in development mode
+# - reload is enabled by default
+# - listening on the 127.0.0.1 address
+# - ui is enabled by default
+docling-serve dev
+
+# Run the server in production mode
+# - reload is disabled by default
+# - listening on the 0.0.0.0 address
+# - ui is disabled by default
+docling-serve run
+```
--- a/docs/pre-loading-models.md
+++ b/docs/pre-loading-models.md
@@ -0,0 +1,103 @@
+# Pre-loading models for docling
+
+This document provides examples for pre-loading docling models to a persistent volume and re-using it for docling-serve deployments.
+
+1. We need to create a persistent volume that will store the model weights:
+
+    ```yaml
+    apiVersion: v1
+    kind: PersistentVolumeClaim
+    metadata:
+      name: docling-model-cache-pvc
+    spec:
+      accessModes:
+        - ReadWriteOnce
+      volumeMode: Filesystem
+      resources:
+        requests:
+          storage: 10Gi
+    ```
+
+    If you don't want to use the default storage class, set a custom storage class as follows:
+
+    ```yaml
+    spec:
+      ...
+      storageClassName: <Storage Class Name>
+    ```
+
+    Manifest example: [docling-model-cache-pvc.yaml](./deploy-examples/docling-model-cache-pvc.yaml)
+
+2. To load the model weights, we can use `docling-tools` to download them. As this is a one-time operation, we can run it as a Kubernetes Job:
+
+    ```yaml
+    apiVersion: batch/v1
+    kind: Job
+    metadata:
+      name: docling-model-cache-load
+    spec:
+      selector: {}
+      template:
+        metadata:
+          name: docling-model-load
+        spec:
+          containers:
+            - name: loader
+              image: ghcr.io/docling-project/docling-serve-cpu:main
+              command:
+                - docling-tools
+                - models
+                - download
+                - '--output-dir=/modelcache'
+                - 'layout'
+                - 'tableformer'
+                - 'code_formula'
+                - 'picture_classifier'
+                - 'smolvlm'
+                - 'granite_vision'
+                - 'easyocr'
+              volumeMounts:
+                - name: docling-model-cache
+                  mountPath: /modelcache
+          volumes:
+            - name: docling-model-cache
+              persistentVolumeClaim:
+                claimName: docling-model-cache-pvc
+          restartPolicy: Never
+    ```
+
+    The job will mount the previously created persistent volume and execute a command similar to how we would load models locally:
+    `docling-tools models download --output-dir <MOUNT-PATH> [LIST_OF_MODELS]`
+
+    In the manifest, we specify the desired models individually; alternatively, we can use the `--all` parameter to download all models.
+
+    Manifest example: [docling-model-cache-job.yaml](./deploy-examples/docling-model-cache-job.yaml)
+
+3. Now we can mount the volume in the docling-serve deployment and set the env variable `DOCLING_SERVE_ARTIFACTS_PATH` to point to it.
+    The following additions to the deployment should be made:
+
+    ```yaml
+    spec:
+      template:
+        spec:
+          containers:
+            - name: api
+              env:
+              ...
+                - name: DOCLING_SERVE_ARTIFACTS_PATH
+                  value: '/modelcache'
+              volumeMounts:
+                - name: docling-model-cache
+                  mountPath: /modelcache
+          ...
+          volumes:
+            - name: docling-model-cache
+              persistentVolumeClaim:
+                claimName: docling-model-cache-pvc
+    ```
+
+    Make sure that the value of `DOCLING_SERVE_ARTIFACTS_PATH` matches both the directory where the models were downloaded and the path where the volume is mounted.
+
+    Now, when docling-serve executes tasks, the underlying docling installation will load the model weights from the mounted volume.
+
+    Manifest example: [docling-model-cache-deployment.yaml](./deploy-examples/docling-model-cache-deployment.yaml)
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -0,0 +1,444 @@
+# Usage
+
+The API provides two endpoints: one for URLs, one for files. This is necessary to send files directly in binary format instead of base64-encoded strings.
+
+## Common parameters
+
+In addition to the source of the file(s) (see below), both endpoints support the same parameters, which closely mirror those of the Docling CLI.
+
+- `from_formats` (List[str]): Input format(s) to convert from. Allowed values: `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md`, `xlsx`. Defaults to all formats.
+- `to_formats` (List[str]): Output format(s) to convert to. Allowed values: `md`, `json`, `html`, `text`, `doctags`. Defaults to `md`.
+- `pipeline` (str): The pipeline to use. Allowed values: `standard`, `vlm`. Defaults to `standard`.
+- `page_range` (tuple). If specified, only convert a range of pages. The page number starts at 1.
+- `do_ocr` (bool): If enabled, the bitmap content will be processed using OCR. Defaults to `True`.
+- `image_export_mode` (str): Image export mode for the document (only applies to JSON, Markdown or HTML output). Allowed values: `embedded`, `placeholder`, `referenced`. Defaults to `embedded`.
+- `force_ocr` (bool): If enabled, replace any existing text with OCR-generated text over the full content. Defaults to `False`.
+- `ocr_engine` (str): OCR engine to use. Allowed values: `easyocr`, `tesserocr`, `tesseract`, `rapidocr`, `ocrmac`. Defaults to `easyocr`. To use the `tesserocr` engine, `tesserocr` must be installed where docling-serve is running: `pip install tesserocr`
+- `ocr_lang` (List[str]): List of languages used by the OCR engine. Note that each OCR engine has different values for the language names. Defaults to empty.
+- `pdf_backend` (str): PDF backend to use. Allowed values: `pypdfium2`, `dlparse_v1`, `dlparse_v2`, `dlparse_v4`. Defaults to `dlparse_v4`.
+- `table_mode` (str): Table mode to use. Allowed values: `fast`, `accurate`. Defaults to `fast`.
+- `abort_on_error` (bool): If enabled, abort on error. Defaults to false.
+- `md_page_break_placeholder` (str): Add this placeholder between pages in the markdown output.
+- `do_table_structure` (bool): If enabled, the table structure will be extracted. Defaults to true.
+- `do_code_enrichment` (bool): If enabled, perform OCR code enrichment. Defaults to false.
+- `do_formula_enrichment` (bool): If enabled, perform formula OCR, return LaTeX code. Defaults to false.
+- `do_picture_classification` (bool): If enabled, classify pictures in documents. Defaults to false.
+- `do_picture_description` (bool): If enabled, describe pictures in documents. Defaults to false.
+- `picture_description_area_threshold` (float): Minimum percentage of the area for a picture to be processed with the models. Defaults to 0.05.
+- `picture_description_local` (dict): Options for running a local vision-language model in the picture description. The parameters refer to a model hosted on Hugging Face. This parameter is mutually exclusive with `picture_description_api`.
+- `picture_description_api` (dict): API details for using a vision-language model in the picture description. This parameter is mutually exclusive with `picture_description_local`.
+- `include_images` (bool): If enabled, images will be extracted from the document. Defaults to false.
+- `images_scale` (float): Scale factor for images. Defaults to 2.0.
+
+## Convert endpoints
+
+### Source endpoint
+
+The endpoint is `/v1/convert/source`, listening for POST requests of JSON payloads.
+
+On top of the above parameters, you must send the document(s) you want to process with either the `http_sources` or `file_sources` fields.
+The first fetches the document(s) from URL(s) (optionally with extra headers), the second allows providing documents as base64-encoded strings.
+The `options` field is not required; the options can be partially or completely omitted.
+
+Simple payload example:
+
+```json
+{
+  "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}]
+}
+```
+
+<details>
+
+<summary>Complete payload example:</summary>
+
+```json
+{
+  "options": {
+    "from_formats": ["docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx"],
+    "to_formats": ["md", "json", "html", "text", "doctags"],
+    "image_export_mode": "placeholder",
+    "do_ocr": true,
+    "force_ocr": false,
+    "ocr_engine": "easyocr",
+    "ocr_lang": ["en"],
+    "pdf_backend": "dlparse_v2",
+    "table_mode": "fast",
+    "abort_on_error": false,
+  },
+  "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}]
+}
+```
+
+</details>
+
+<details>
+
+<summary>CURL example:</summary>
+
+```sh
+curl -X 'POST' \
+  'http://localhost:5001/v1/convert/source' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+  "options": {
+    "from_formats": [
+      "docx",
+      "pptx",
+      "html",
+      "image",
+      "pdf",
+      "asciidoc",
+      "md",
+      "xlsx"
+    ],
+    "to_formats": ["md", "json", "html", "text", "doctags"],
+    "image_export_mode": "placeholder",
+    "do_ocr": true,
+    "force_ocr": false,
+    "ocr_engine": "easyocr",
+    "ocr_lang": [
+      "fr",
+      "de",
+      "es",
+      "en"
+    ],
+    "pdf_backend": "dlparse_v2",
+    "table_mode": "fast",
+    "abort_on_error": false,
+    "do_table_structure": true,
+    "include_images": true,
+    "images_scale": 2
+  },
+  "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}]
+}'
+```
+
+</details>
+
+<details>
+<summary>Python example:</summary>
+
+```python
+import httpx
+
+async_client = httpx.AsyncClient(timeout=60.0)
+url = "http://localhost:5001/v1/convert/source"
+payload = {
+  "options": {
+    "from_formats": ["docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx"],
+    "to_formats": ["md", "json", "html", "text", "doctags"],
+    "image_export_mode": "placeholder",
+    "do_ocr": True,
+    "force_ocr": False,
+    "ocr_engine": "easyocr",
+    "ocr_lang": ["en"],
+    "pdf_backend": "dlparse_v2",
+    "table_mode": "fast",
+    "abort_on_error": False,
+  },
+  "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}]
+}
+
+response = await async_client.post(url, json=payload)
+
+data = response.json()
+```
+
+</details>
+
+#### File as base64
+
+The `file_sources` argument of the endpoint allows sending files as base64-encoded strings.
+When your PDF or other file type is too large, encoding it and passing it inline to curl
+can lead to an “Argument list too long” error on some systems. To avoid this, we write
+the JSON request body to a file and have curl read from that file.
+
+<details>
+<summary>CURL steps:</summary>
+
+```sh
+# 1. Base64-encode the file
+B64_DATA=$(base64 -w 0 /path/to/file/pdf-to-convert.pdf)
+
+# 2. Build the JSON with your options
+cat <<EOF > /tmp/request_body.json
+{
+  "options": {
+  },
+  "file_sources": [{
+    "base64_string": "${B64_DATA}",
+    "filename": "pdf-to-convert.pdf"
+  }]
+}
+EOF
+
+# 3. POST the request to the docling service
+curl -X POST "localhost:5001/v1/convert/source" \
+     -H "Content-Type: application/json" \
+     -d @/tmp/request_body.json
+```
+
+</details>
+
+### File endpoint
+
+The endpoint is `/v1/convert/file`, listening for POST requests of Form payloads (necessary as the files are sent as `multipart/form-data`). You can send one or multiple files.
+
+<details>
+<summary>CURL example:</summary>
+
+```sh
+curl -X 'POST' \
+  'http://127.0.0.1:5001/v1/convert/file' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: multipart/form-data' \
+  -F 'ocr_engine=easyocr' \
+  -F 'pdf_backend=dlparse_v2' \
+  -F 'from_formats=pdf' \
+  -F 'from_formats=docx' \
+  -F 'force_ocr=false' \
+  -F 'image_export_mode=embedded' \
+  -F 'ocr_lang=en' \
+  -F 'ocr_lang=pl' \
+  -F 'table_mode=fast' \
+  -F 'files=@2206.01062v1.pdf;type=application/pdf' \
+  -F 'abort_on_error=false' \
+  -F 'to_formats=md' \
+  -F 'to_formats=text' \
+  -F 'do_ocr=true'
+```
+
+</details>
+
+<details>
+<summary>Python example:</summary>
+
+```python
+import os
+
+import httpx
+
+async_client = httpx.AsyncClient(timeout=60.0)
+url = "http://localhost:5001/v1/convert/file"
+parameters = {
+    "from_formats": ["docx", "pptx", "html", "image", "pdf", "asciidoc", "md", "xlsx"],
+    "to_formats": ["md", "json", "html", "text", "doctags"],
+    "image_export_mode": "placeholder",
+    "do_ocr": True,
+    "force_ocr": False,
+    "ocr_engine": "easyocr",
+    "ocr_lang": ["en"],
+    "pdf_backend": "dlparse_v2",
+    "table_mode": "fast",
+    "abort_on_error": False,
+}
+
+current_dir = os.path.dirname(__file__)
+file_path = os.path.join(current_dir, '2206.01062v1.pdf')
+
+files = {'files': ('2206.01062v1.pdf', open(file_path, 'rb'), 'application/pdf')}
+
+response = await async_client.post(url, files=files, data=parameters)
+assert response.status_code == 200, "Response should be 200 OK"
+
+data = response.json()
+```
+
+</details>
+
+### Picture description options
+
+When the picture description enrichment is activated, users may specify which model and which execution mode to use for this task. There are two choices for the execution mode: _local_ will run the vision-language model directly, _api_ will invoke an external API endpoint.
+
+The local option is specified with:
+
+```jsonc
+{
+  "picture_description_local": {
+    "repo_id": "",  // Repository id from the Hugging Face Hub.
+    "generation_config": {"max_new_tokens": 200, "do_sample": false},  // HF generation config.
+    "prompt": "Describe this image in a few sentences. ",  // Prompt used when calling the vision-language model.
+  }
+}
+```
+
+The possible values for `generation_config` are documented in the [Hugging Face text generation docs](https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationConfig).
+
+The api option is specified with:
+
+```jsonc
+{
+  "picture_description_api": {
+    "url": "",  // Endpoint which accepts openai-api compatible requests.
+    "headers": {},  // Headers used for calling the API endpoint. For example, it could include authentication headers.
+    "params": {},  // Model parameters.
+    "timeout": 20,  // Timeout for the API request.
+    "prompt": "Describe this image in a few sentences. ",  // Prompt used when calling the vision-language model.
+  }
+}
+```
+
+Example URLs are:
+
+- `http://localhost:8000/v1/chat/completions` for the local vLLM API, with example `picture_description_api`:
+  - the `HuggingFaceTB/SmolVLM-256M-Instruct` model
+
+    ```json
+    {
+      "url": "http://localhost:8000/v1/chat/completions",
+      "params": {
+        "model": "HuggingFaceTB/SmolVLM-256M-Instruct",
+        "max_completion_tokens": 200
+      }
+    }
+    ```
+
+  - the `ibm-granite/granite-vision-3.2-2b` model
+
+    ```json
+    {
+      "url": "http://localhost:8000/v1/chat/completions",
+      "params": {
+        "model": "ibm-granite/granite-vision-3.2-2b",
+        "max_completion_tokens": 200
+      }
+    }
+    ```
+
+- `http://localhost:11434/v1/chat/completions` for the local Ollama API, with example `picture_description_api`:
+  - the `granite3.2-vision:2b` model
+
+    ```json
+    {
+      "url": "http://localhost:11434/v1/chat/completions",
+      "params": {
+        "model": "granite3.2-vision:2b"
+      }
+    }
+    ```
+
+Note that when using `picture_description_api`, the server must be launched with `DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true`.
+
+## Response format
+
+The response can be a JSON Document or a File.
+
+- If you process only one file, the response will be a JSON document with the following format:
+
+  ```jsonc
+  {
+    "document": {
+      "md_content": "",
+      "json_content": {},
+      "html_content": "",
+      "text_content": "",
+      "doctags_content": ""
+    },
+    "status": "<success|partial_success|skipped|failure>",
+    "processing_time": 0.0,
+    "timings": {},
+    "errors": []
+  }
+  ```
+
+  Depending on the value you set in `to_formats`, the different items will be populated with their respective results or left empty.
+
+  `processing_time` is the Docling processing time in seconds, and `timings` (when enabled in the backend) provides the detailed
+  timing of all the internal Docling components.
+
+- If you set the parameter `target` to the zip mode, the response will be a zip file.
+- If multiple files are generated (multiple inputs, or one input but multiple outputs with the zip target mode), the response will be a zip file.
+
+## Asynchronous API
+
+Both `/v1/convert/source` and `/v1/convert/file` endpoints are available as asynchronous variants.
+The advantage of the asynchronous endpoints is the possibility to interrupt the connection, check for progress updates and fetch the result later.
+This approach is more resilient against network instabilities and allows the client application logic to easily interleave conversion with other tasks.
+
+Launch an asynchronous conversion with:
+
+- `POST /v1/convert/source/async` when providing the input as sources.
+- `POST /v1/convert/file/async` when providing the input as multipart-form files.
+
+The response format is a task detail:
+
+```jsonc
+{
+  "task_id": "<task_id>",  // the task_id which can be used for the next operations
+  "task_status": "pending|started|success|failure",  // the task status
+  "task_position": 1,  // the position in the queue
+  "task_meta": null,  // metadata e.g. how many documents are in the total job and how many have been converted
+}
+```
+
+### Polling status
+
+To check the progress of the conversion task and wait for its completion, use the endpoint:
+
+- `GET /v1/status/poll/{task_id}`
+
+<details>
+<summary>Example waiting loop:</summary>
+
+```python
+import time
+import httpx
+
+# ...
+# response from the async task submission
+task = response.json()
+
+while task["task_status"] not in ("success", "failure"):
+    response = httpx.get(f"{base_url}/status/poll/{task['task_id']}")
+    task = response.json()
+
+    time.sleep(5)
+```
+
+</details>
+
+### Subscribe with websockets
+
+Using websockets, the client application can be notified about updates of the conversion task.
+To start the websocket connection, use the endpoint:
+
+- `/v1/status/ws/{task_id}`
+
+Websocket messages are JSON objects with the following structure:
+
+```jsonc
+{
+  "message": "connection|update|error",  // type of message being sent
+  "task": {},  // the same content of the task description
+  "error": "",  // description of the error
+}
+```
+
+<details>
+<summary>Example websocket usage:</summary>
+
+```python
+import json
+from websockets.sync.client import connect
+
+uri = f"ws://{base_url}/v1/status/ws/{task['task_id']}"
+with connect(uri) as websocket:
+    for message in websocket:
+        try:
+            payload = json.loads(message)
+            # Stop on an error message, or on an update reporting a finished task
+            is_done = payload["message"] == "update" and payload["task"]["task_status"] in ("success", "failure")
+            if payload["message"] == "error" or is_done:
+                break
+        except Exception:
+            break
+```
+
+</details>
+
+### Fetch results
+
+When the task is completed, the result can be fetched with the endpoint:
+
+- `GET /v1/result/{task_id}`
--- a/docs/v1_migration.md
+++ b/docs/v1_migration.md
@@ -0,0 +1,80 @@
+# Migration to the `v1` API
+
+Docling Serve has moved from the initial prototype `v1alpha` API to the stable `v1` API.
+This page provides simple instructions to upgrade your application to the new API.
+
+## API changes
+
+The breaking changes introduced in the `v1` release of Docling Serve are designed to provide a stable schema that
+allows the project to add new capabilities, such as new types of input sources and targets, as well as the definition of callbacks for event-driven applications.
+
+### Endpoint names
+
+All endpoints are renamed from `/v1alpha/` to `/v1/`.
+
+### Sources
+
+When using the `/v1/convert/source` endpoint, input documents must be specified with the `sources: []` argument, which replaces `file_sources` and `http_sources`.
+
+Old version:
+
+```jsonc
+{
+    "options": {},  // conversion options
+    "file_sources": [  // input documents provided as base64-encoded strings
+        {"base64_string": "abc123...", "filename": "file.pdf"}
+    ],
+    "http_sources": [  // input documents provided as http urls
+        {"url": "https://..."}
+    ]
+}
+```
+
+New version:
+
+```jsonc
+{
+    "options": {},  // conversion options
+    "sources": [
+        // input document provided as base64-encoded string
+        {"kind": "file", "base64_string": "abc123...", "filename": "file.pdf"},
+        // input document provided as http urls
+        {"kind": "http", "url": "https://..."},
+    ]
+}
+```
+
+### Targets
+
+To switch between output formats, e.g. from the JSON inbody response to the zip archive response, specify the `target` argument, which replaces `options.return_as_file`.
+
+Old version:
+
+```jsonc
+{
+    "options": {
+        "return_as_file": true  // <-- to be removed
+    },
+    // ...
+}
+```
+
+New version:
+
+```jsonc
+{
+    "options": {},
+    "target": {"kind": "zip"},  // <-- add this
+    // ...
+}
+```
+
+## Continue with the old API
+
+If you are not able to apply the changes above to your application, consider pinning the previous `v0.x` container images, e.g.
+
+```sh
+podman run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=1 quay.io/docling-project/docling-serve:v0.16.1
+```
+
+_Note that the old prototype API will not be supported in new `v1.x` versions._
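
The request-body changes described above (sources and targets) are mechanical, so existing client code can translate old payloads on the fly. A minimal sketch follows. The function `migrate_payload` is hypothetical and not shipped with Docling Serve, and it assumes that leaving out `target` keeps the default in-body JSON response.

```python
def migrate_payload(old: dict) -> dict:
    """Convert a v1alpha /convert/source request body to the v1 layout."""
    # Copy the options so the caller's dictionary is left untouched.
    options = dict(old.get("options", {}))
    return_as_file = options.pop("return_as_file", False)

    # file_sources and http_sources are merged into one tagged list.
    sources = [{"kind": "file", **src} for src in old.get("file_sources", [])]
    sources += [{"kind": "http", **src} for src in old.get("http_sources", [])]

    new = {"options": options, "sources": sources}
    if return_as_file:
        # options.return_as_file is replaced by the zip target.
        new["target"] = {"kind": "zip"}
    return new
```

The result can be posted to `/v1/convert/source` as the JSON body in place of the old payload.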
--- a/os-packages.txt
+++ b/os-packages.txt
@@ -1,6 +1,7 @@
 tesseract
 tesseract-devel
 tesseract-langpack-eng
+tesseract-osd
 leptonica-devel
 libglvnd-glx
 glib2
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "docling-serve"
-version = "0.5.1"  # DO NOT EDIT, updated automatically
+version = "1.1.0"  # DO NOT EDIT, updated automatically
 description = "Running Docling as a service"
 license = {text = "MIT"}
 authors = [
@@ -23,14 +23,20 @@ readme = "README.md"
 classifiers = [
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
-    # "Development Status :: 5 - Production/Stable",
+    "Development Status :: 5 - Production/Stable",
    "Intended Audience :: Developers",
    "Typing :: Typed",
-    "Programming Language :: Python :: 3"
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
 ]
 requires-python = ">=3.10"
 dependencies = [
-    "docling~=2.25.1",
+    "docling~=2.38",
+    "docling-core>=2.44.1",
+    "docling-jobkit[kfp,vlm]~=1.2",
    "fastapi[standard]~=0.115",
    "httpx~=0.28",
    "pydantic~=2.10",
@@ -39,11 +45,14 @@ dependencies = [
    "typer~=0.12",
    "uvicorn[standard]>=0.29.0,<1.0.0",
    "websockets~=14.0",
+    "scalar-fastapi>=1.0.3",
+    "docling-mcp>=1.0.0",
 ]

 [project.optional-dependencies]
 ui = [
-    "gradio~=5.9"
+    "gradio~=5.9",
+    "pydantic<2.11.0",  # fix compatibility between gradio and new pydantic 2.11
 ]
 tesserocr = [
    "tesserocr~=2.7"
@@ -52,6 +61,25 @@ rapidocr = [
    "rapidocr-onnxruntime~=1.4; python_version<'3.13'",
    "onnxruntime~=1.7",
 ]
+flash-attn = [
+  "flash-attn~=2.7.0; sys_platform == 'linux' and platform_machine == 'x86_64'"
+]
+
+[dependency-groups]
+dev = [
+    "asgi-lifespan~=2.0",
+    "mypy~=1.11",
+    "pre-commit-uv~=4.1",
+    "pytest~=8.3",
+    "pytest-asyncio~=0.24",
+    "pytest-check~=2.4",
+    "python-semantic-release~=7.32",
+    "ruff>=0.9.6",
+]
+pypi = [
+  "torch>=2.6.0",
+  "torchvision>=0.21.0",
+]
 cpu = [
  "torch>=2.6.0",
  "torchvision>=0.21.0",
@@ -60,36 +88,54 @@ cu124 = [
  "torch>=2.6.0",
  "torchvision>=0.21.0",
 ]
-
-[dependency-groups]
-dev = [
-    "mypy~=1.11",
-    "pre-commit-uv~=4.1",
-    "pytest~=8.3",
-    "pytest-asyncio~=0.24",
-    "pytest-check~=2.4",
-    "python-semantic-release~=7.32",
-    "ruff>=0.9.6",
+cu126 = [
+  "torch>=2.6.0",
+  "torchvision>=0.21.0",
+]
+cu128 = [
+  "torch>=2.7.0",
+  "torchvision>=0.22.0",
 ]

 [tool.uv]
 package = true
+default-groups = ["dev", "pypi"]
 conflicts = [
  [
-    { extra = "cpu" },
-    { extra = "cu124" },
+    { group = "pypi" },
+    { group = "cpu" },
+    { group = "cu124" },
+    { group = "cu126" },
+    { group = "cu128" },
  ],
 ]
+environments = ["sys_platform != 'darwin' or platform_machine != 'x86_64'"]
+override-dependencies = [
+  "urllib3~=2.0"
+]

 [tool.uv.sources]
 torch = [
-  { index = "pytorch-cpu", extra = "cpu" },
-  { index = "pytorch-cu124", extra = "cu124" },
+  { index = "pytorch-pypi", group = "pypi" },
+  { index = "pytorch-cpu", group = "cpu" },
+  { index = "pytorch-cu124", group = "cu124" },
+  { index = "pytorch-cu126", group = "cu126" },
+  { index = "pytorch-cu128", group = "cu128" },
 ]
 torchvision = [
-  { index = "pytorch-cpu", extra = "cpu" },
-  { index = "pytorch-cu124", extra = "cu124" },
+  { index = "pytorch-pypi", group = "pypi" },
+  { index = "pytorch-cpu", group = "cpu" },
+  { index = "pytorch-cu124", group = "cu124" },
+  { index = "pytorch-cu126", group = "cu126" },
+  { index = "pytorch-cu128", group = "cu128" },
 ]
+# docling-jobkit = { git = "https://github.com/docling-project/docling-jobkit/", rev = "main" }
+# docling-jobkit = { path = "../docling-jobkit", editable = true }
+
+[[tool.uv.index]]
+name = "pytorch-pypi"
+url = "https://pypi.org/simple"
+explicit = true

 [[tool.uv.index]]
 name = "pytorch-cpu"
@@ -101,6 +147,16 @@ name = "pytorch-cu124"
 url = "https://download.pytorch.org/whl/cu124"
 explicit = true

+[[tool.uv.index]]
+name = "pytorch-cu126"
+url = "https://download.pytorch.org/whl/cu126"
+explicit = true
+
+[[tool.uv.index]]
+name = "pytorch-cu128"
+url = "https://download.pytorch.org/whl/cu128"
+explicit = true
+
 [tool.setuptools.packages.find]
 include = ["docling_serve*"]
 namespaces = true
@@ -109,11 +165,11 @@ namespaces = true
 docling-serve = "docling_serve.__main__:main"

 [project.urls]
-Homepage = "https://github.com/DS4SD/docling-serve"
+Homepage = "https://github.com/docling-project/docling-serve"
 # Documentation = "https://ds4sd.github.io/docling"
-Repository = "https://github.com/DS4SD/docling-serve"
-Issues = "https://github.com/DS4SD/docling-serve/issues"
-Changelog = "https://github.com/DS4SD/docling-serve/blob/main/CHANGELOG.md"
+Repository = "https://github.com/docling-project/docling-serve"
+Issues = "https://github.com/docling-project/docling-serve/issues"
+Changelog = "https://github.com/docling-project/docling-serve/blob/main/CHANGELOG.md"

 [tool.ruff]
 target-version = "py310"
@@ -169,7 +225,7 @@ ignore = [
 max-complexity = 15

 [tool.ruff.lint.isort.sections]
-"docling" = ["docling", "docling_core"]
+"docling" = ["docling", "docling_core", "docling_jobkit"]

 [tool.ruff.lint.isort]
 combine-as-imports = true
@@ -195,6 +251,10 @@ module = [
    "tesserocr.*",
    "rapidocr_onnxruntime.*",
    "requests.*",
+    "kfp.*",
+    "kfp_server_api.*",
+    "mlx_vlm.*",
+    "scalar_fastapi.*",
 ]
 ignore_missing_imports = true

--- a/tests/test_1-file-all-outputs.py
+++ b/tests/test_1-file-all-outputs.py
@@ -16,7 +16,7 @@ async def async_client():
@pytest.mark.asyncio
 async def test_convert_file(async_client):
    """Test convert single file to all outputs"""
-    url = "http://localhost:5001/v1alpha/convert/file"
+    url = "http://localhost:5001/v1/convert/file"
    options = {
        "from_formats": [
            "docx",
@@ -37,7 +37,6 @@ async def test_convert_file(async_client):
        "pdf_backend": "dlparse_v2",
        "table_mode": "fast",
        "abort_on_error": False,
-        "return_as_file": False,
    }

    current_dir = os.path.dirname(__file__)
@@ -47,9 +46,7 @@ async def test_convert_file(async_client):
        "files": ("2206.01062v1.pdf", open(file_path, "rb"), "application/pdf"),
    }

-    response = await async_client.post(
-        url, files=files, data={"options": json.dumps(options)}
-    )
+    response = await async_client.post(url, files=files, data=options)
    assert response.status_code == 200, "Response should be 200 OK"

    data = response.json()
@@ -92,16 +89,11 @@ async def test_convert_file(async_client):
            msg=f'JSON document should contain \'{{\\n  "schema_name": "DoclingDocument\'". Received: {safe_slice(data["document"]["json_content"])}',
        )
    # HTML check
-    check.is_in(
-        "html_content",
-        data.get("document", {}),
-        msg=f"Response should contain 'html_content' key. Received keys: {list(data.get('document', {}).keys())}",
-    )
    if data.get("document", {}).get("html_content") is not None:
        check.is_in(
-            '<!DOCTYPE html>\n<html lang="en">\n<head>',
+            "<!DOCTYPE html>\n<html>\n<head>",
            data["document"]["html_content"],
-            msg=f"HTML document should contain '<!DOCTYPE html>\\n<html lang=\"en'>. Received: {safe_slice(data['document']['html_content'])}",
+            msg=f"HTML document should contain '<!DOCTYPE html>\\n<html>'. Received: {safe_slice(data['document']['html_content'])}",
        )
    # Text check
    check.is_in(
@@ -123,7 +115,7 @@ async def test_convert_file(async_client):
    )
    if data.get("document", {}).get("doctags_content") is not None:
        check.is_in(
-            "<document>\n<section_header_level_1><location>",
+            "<doctag><page_header><loc",
            data["document"]["doctags_content"],
-            msg=f"DocTags document should contain '<document>\\n<section_header_level_1><location>'. Received: {safe_slice(data['document']['doctags_content'])}",
+            msg=f"DocTags document should contain '<doctag><page_header><loc'. Received: {safe_slice(data['document']['doctags_content'])}",
        )
--- a/tests/test_1-file-async.py
+++ b/tests/test_1-file-async.py
@@ -0,0 +1,70 @@
+import json
+import time
+from pathlib import Path
+
+import httpx
+import pytest
+import pytest_asyncio
+
+
+@pytest_asyncio.fixture
+async def async_client():
+    async with httpx.AsyncClient(timeout=60.0) as client:
+        yield client
+
+
+@pytest.mark.asyncio
+async def test_convert_url(async_client):
+    """Test convert URL to all outputs"""
+
+    base_url = "http://localhost:5001/v1"
+    payload = {
+        "to_formats": ["md", "json", "html"],
+        "image_export_mode": "placeholder",
+        "ocr": False,
+        "abort_on_error": False,
+    }
+
+    file_path = Path(__file__).parent / "2206.01062v1.pdf"
+    files = {
+        "files": (file_path.name, file_path.open("rb"), "application/pdf"),
+    }
+
+    for n in range(1):
+        response = await async_client.post(
+            f"{base_url}/convert/file/async", files=files, data=payload
+        )
+        assert response.status_code == 200, "Response should be 200 OK"
+
+    task = response.json()
+
+    print(json.dumps(task, indent=2))
+
+    while task["task_status"] not in ("success", "failure"):
+        response = await async_client.get(f"{base_url}/status/poll/{task['task_id']}")
+        assert response.status_code == 200, "Response should be 200 OK"
+        task = response.json()
+        print(f"{task['task_status']=}")
+        print(f"{task['task_position']=}")
+
+        time.sleep(2)
+
+    assert task["task_status"] == "success"
+    print(f"Task completed with status {task['task_status']=}")
+
+    result_resp = await async_client.get(f"{base_url}/result/{task['task_id']}")
+    assert result_resp.status_code == 200, "Response should be 200 OK"
+    result = result_resp.json()
+    print("Got result.")
+
+    assert "md_content" in result["document"]
+    assert result["document"]["md_content"] is not None
+    assert len(result["document"]["md_content"]) > 10
+
+    assert "html_content" in result["document"]
+    assert result["document"]["html_content"] is not None
+    assert len(result["document"]["html_content"]) > 10
+
+    assert "json_content" in result["document"]
+    assert result["document"]["json_content"] is not None
+    assert result["document"]["json_content"]["schema_name"] == "DoclingDocument"
--- a/tests/test_1-url-all-outputs.py
+++ b/tests/test_1-url-all-outputs.py
@@ -15,7 +15,7 @@ async def async_client():
@pytest.mark.asyncio
 async def test_convert_url(async_client):
    """Test convert URL to all outputs"""
-    url = "http://localhost:5001/v1alpha/convert/source"
+    url = "http://localhost:5001/v1/convert/source"
    payload = {
        "options": {
            "from_formats": [
@@ -37,9 +37,8 @@ async def test_convert_url(async_client):
            "pdf_backend": "dlparse_v2",
            "table_mode": "fast",
            "abort_on_error": False,
-            "return_as_file": False,
        },
-        "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}],
+        "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2206.01062"}],
    }
    print(json.dumps(payload, indent=2))

@@ -93,9 +92,9 @@ async def test_convert_url(async_client):
    )
    if data.get("document", {}).get("html_content") is not None:
        check.is_in(
-            '<!DOCTYPE html>\n<html lang="en">\n<head>',
+            "<!DOCTYPE html>\n<html>\n<head>",
            data["document"]["html_content"],
-            msg=f"HTML document should contain '<!DOCTYPE html>\\n<html lang=\"en'>. Received: {safe_slice(data['document']['html_content'])}",
+            msg=f"HTML document should contain '<!DOCTYPE html>\\n<html>'. Received: {safe_slice(data['document']['html_content'])}",
        )
    # Text check
    check.is_in(
@@ -117,7 +116,7 @@ async def test_convert_url(async_client):
    )
    if data.get("document", {}).get("doctags_content") is not None:
        check.is_in(
-            "<document>\n<section_header_level_1><location>",
+            "<doctag><page_header><loc",
            data["document"]["doctags_content"],
-            msg=f"DocTags document should contain '<document>\\n<section_header_level_1><location>'. Received: {safe_slice(data['document']['doctags_content'])}",
+            msg=f"DocTags document should contain '<doctag><page_header><loc'. Received: {safe_slice(data['document']['doctags_content'])}",
        )
--- a/tests/test_1-url-async-ws.py
+++ b/tests/test_1-url-async-ws.py
@@ -20,17 +20,32 @@ async def test_convert_url(async_client: httpx.AsyncClient):
    doc_filename = Path("tests/2408.09869v5.pdf")
    encoded_doc = base64.b64encode(doc_filename.read_bytes()).decode()

-    base_url = "http://localhost:5001/v1alpha"
+    base_url = "http://localhost:5001/v1"
    payload = {
        "options": {
            "to_formats": ["md", "json"],
            "image_export_mode": "placeholder",
            "ocr": True,
            "abort_on_error": False,
-            "return_as_file": False,
+            # "do_picture_description": True,
+            # "picture_description_api": {
+            #     "url": "http://localhost:11434/v1/chat/completions",
+            #     "params": {
+            #         "model": "granite3.2-vision:2b",
+            #     }
+            # },
+            # "picture_description_local": {
+            #     "repo_id": "HuggingFaceTB/SmolVLM-256M-Instruct",
+            # },
        },
-        # "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}],
-        "file_sources": [{"base64_string": encoded_doc, "filename": doc_filename.name}],
+        # "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}],
+        "sources": [
+            {
+                "kind": "file",
+                "base64_string": encoded_doc,
+                "filename": doc_filename.name,
+            }
+        ],
    }
    # print(json.dumps(payload, indent=2))

@@ -42,7 +57,7 @@ async def test_convert_url(async_client: httpx.AsyncClient):

    task = response.json()

-    uri = f"ws://localhost:5001/v1alpha/status/ws/{task['task_id']}"
+    uri = f"ws://localhost:5001/v1/status/ws/{task['task_id']}"
    with connect(uri) as websocket:
        for message in websocket:
            print(message)
--- a/tests/test_1-url-async.py
+++ b/tests/test_1-url-async.py
@@ -25,20 +25,19 @@ async def test_convert_url(async_client):
        "https://arxiv.org/pdf/2311.18481",
    ]

-    base_url = "http://localhost:5001/v1alpha"
+    base_url = "http://localhost:5001/v1"
    payload = {
        "options": {
            "to_formats": ["md", "json"],
            "image_export_mode": "placeholder",
            "ocr": True,
            "abort_on_error": False,
-            "return_as_file": False,
        },
-        "http_sources": [{"url": random.choice(example_docs)}],
+        "sources": [{"kind": "http", "url": random.choice(example_docs)}],
    }
    print(json.dumps(payload, indent=2))

-    for n in range(5):
+    for n in range(3):
        response = await async_client.post(
            f"{base_url}/convert/source/async", json=payload
        )
--- a/tests/test_2-files-all-outputs.py
+++ b/tests/test_2-files-all-outputs.py
@@ -1,4 +1,3 @@
-import json
 import os

 import httpx
@@ -16,7 +15,7 @@ async def async_client():
@pytest.mark.asyncio
 async def test_convert_file(async_client):
    """Test convert single file to all outputs"""
-    url = "http://localhost:5001/v1alpha/convert/file"
+    url = "http://localhost:5001/v1/convert/file"
    options = {
        "from_formats": [
            "docx",
@@ -37,7 +36,6 @@ async def test_convert_file(async_client):
        "pdf_backend": "dlparse_v2",
        "table_mode": "fast",
        "abort_on_error": False,
-        "return_as_file": False,
    }

    current_dir = os.path.dirname(__file__)
@@ -48,9 +46,7 @@ async def test_convert_file(async_client):
        ("files", ("2408.09869v5.pdf", open(file_path, "rb"), "application/pdf")),
    ]

-    response = await async_client.post(
-        url, files=files, data={"options": json.dumps(options)}
-    )
+    response = await async_client.post(url, files=files, data=options)
    assert response.status_code == 200, "Response should be 200 OK"

    # Check for zip file attachment
--- a/tests/test_2-urls-all-outputs.py
+++ b/tests/test_2-urls-all-outputs.py
@@ -13,7 +13,7 @@ async def async_client():
@pytest.mark.asyncio
 async def test_convert_url(async_client):
    """Test convert URL to all outputs"""
-    url = "http://localhost:5001/v1alpha/convert/source"
+    url = "http://localhost:5001/v1/convert/source"
    payload = {
        "options": {
            "from_formats": [
@@ -35,12 +35,12 @@ async def test_convert_url(async_client):
            "pdf_backend": "dlparse_v2",
            "table_mode": "fast",
            "abort_on_error": False,
-            "return_as_file": False,
        },
-        "http_sources": [
-            {"url": "https://arxiv.org/pdf/2206.01062"},
-            {"url": "https://arxiv.org/pdf/2408.09869"},
+        "sources": [
+            {"kind": "http", "url": "https://arxiv.org/pdf/2206.01062"},
+            {"kind": "http", "url": "https://arxiv.org/pdf/2408.09869"},
        ],
+        "target": {"kind": "zip"},
    }

    response = await async_client.post(url, json=payload)
--- a/tests/test_2-urls-async-all-outputs.py
+++ b/tests/test_2-urls-async-all-outputs.py
@@ -0,0 +1,88 @@
+import json
+import time
+
+import httpx
+import pytest
+import pytest_asyncio
+from pytest_check import check
+
+
+@pytest_asyncio.fixture
+async def async_client():
+    async with httpx.AsyncClient(timeout=60.0) as client:
+        yield client
+
+
+@pytest.mark.asyncio
+async def test_convert_url(async_client):
+    """Test convert URL to all outputs"""
+    base_url = "http://localhost:5001/v1"
+    payload = {
+        "options": {
+            "from_formats": [
+                "docx",
+                "pptx",
+                "html",
+                "image",
+                "pdf",
+                "asciidoc",
+                "md",
+                "xlsx",
+            ],
+            "to_formats": ["md", "json", "html", "text", "doctags"],
+            "image_export_mode": "placeholder",
+            "ocr": True,
+            "force_ocr": False,
+            "ocr_engine": "easyocr",
+            "ocr_lang": ["en"],
+            "pdf_backend": "dlparse_v2",
+            "table_mode": "fast",
+            "abort_on_error": False,
+        },
+        "sources": [
+            {"kind": "http", "url": "https://arxiv.org/pdf/2206.01062"},
+            {"kind": "http", "url": "https://arxiv.org/pdf/2408.09869"},
+        ],
+        "target": {"kind": "zip"},
+    }
+
+    response = await async_client.post(f"{base_url}/convert/source/async", json=payload)
+    assert response.status_code == 200, "Response should be 200 OK"
+
+    task = response.json()
+
+    print(json.dumps(task, indent=2))
+
+    while task["task_status"] not in ("success", "failure"):
+        response = await async_client.get(f"{base_url}/status/poll/{task['task_id']}")
+        assert response.status_code == 200, "Response should be 200 OK"
+        task = response.json()
+        print(f"{task['task_status']=}")
+        print(f"{task['task_position']=}")
+
+        time.sleep(2)
+
+    assert task["task_status"] == "success"
+
+    result_resp = await async_client.get(f"{base_url}/result/{task['task_id']}")
+    assert result_resp.status_code == 200, "Response should be 200 OK"
+
+    # Check for zip file attachment
+    content_disposition = result_resp.headers.get("content-disposition")
+
+    with check:
+        assert content_disposition is not None, (
+            "Content-Disposition header should be present"
+        )
+    with check:
+        assert "attachment" in content_disposition, "Response should be an attachment"
+    with check:
+        assert 'filename="converted_docs.zip"' in content_disposition, (
+            "Attachment filename should be 'converted_docs.zip'"
+        )
+
+    content_type = result_resp.headers.get("content-type")
+    with check:
+        assert content_type == "application/zip", (
+            "Content-Type should be 'application/zip'"
+        )
--- a/tests/test_fastapi_endpoints.py
+++ b/tests/test_fastapi_endpoints.py
@@ -0,0 +1,193 @@
+import asyncio
+import io
+import json
+import os
+import zipfile
+
+import pytest
+import pytest_asyncio
+from asgi_lifespan import LifespanManager
+from httpx import ASGITransport, AsyncClient
+from pytest_check import check
+
+from docling_core.types.doc import DoclingDocument, PictureItem
+
+from docling_serve.app import create_app
+
+
+@pytest.fixture(scope="session")
+def event_loop():
+    return asyncio.get_event_loop()
+
+
+@pytest_asyncio.fixture(scope="session")
+async def app():
+    app = create_app()
+
+    async with LifespanManager(app) as manager:
+        print("Launching lifespan of app.")
+        yield manager.app
+
+
+@pytest_asyncio.fixture(scope="session")
+async def client(app):
+    async with AsyncClient(
+        transport=ASGITransport(app=app), base_url="http://app.io"
+    ) as client:
+        print("Client is ready")
+        yield client
+
+
+@pytest.mark.asyncio
+async def test_health(client: AsyncClient):
+    response = await client.get("/health")
+    assert response.status_code == 200
+    assert response.json() == {"status": "ok"}
+
+
+@pytest.mark.asyncio
+async def test_convert_file(client: AsyncClient):
+    """Test convert single file to all outputs"""
+
+    endpoint = "/v1/convert/file"
+    options = {
+        "from_formats": [
+            "docx",
+            "pptx",
+            "html",
+            "image",
+            "pdf",
+            "asciidoc",
+            "md",
+            "xlsx",
+        ],
+        "to_formats": ["md", "json", "html", "text", "doctags"],
+        "image_export_mode": "placeholder",
+        "ocr": True,
+        "force_ocr": False,
+        "ocr_engine": "easyocr",
+        "ocr_lang": ["en"],
+        "pdf_backend": "dlparse_v2",
+        "table_mode": "fast",
+        "abort_on_error": False,
+    }
+
+    current_dir = os.path.dirname(__file__)
+    file_path = os.path.join(current_dir, "2206.01062v1.pdf")
+
+    files = {
+        "files": ("2206.01062v1.pdf", open(file_path, "rb"), "application/pdf"),
+    }
+
+    response = await client.post(endpoint, files=files, data=options)
+    assert response.status_code == 200, "Response should be 200 OK"
+
+    data = response.json()
+
+    # Response content checks
+    # Helper function to safely slice strings
+    def safe_slice(value, length=100):
+        if isinstance(value, str):
+            return value[:length]
+        return str(value)  # Convert non-string values to string for debug purposes
+
+    # Document check
+    check.is_in(
+        "document",
+        data,
+        msg=f"Response should contain 'document' key. Received keys: {list(data.keys())}",
+    )
+    # MD check
+    check.is_in(
+        "md_content",
+        data.get("document", {}),
+        msg=f"Response should contain 'md_content' key. Received keys: {list(data.get('document', {}).keys())}",
+    )
+    if data.get("document", {}).get("md_content") is not None:
+        check.is_in(
+            "## DocLayNet: ",
+            data["document"]["md_content"],
+            msg=f"Markdown document should contain 'DocLayNet: '. Received: {safe_slice(data['document']['md_content'])}",
+        )
+    # JSON check
+    check.is_in(
+        "json_content",
+        data.get("document", {}),
+        msg=f"Response should contain 'json_content' key. Received keys: {list(data.get('document', {}).keys())}",
+    )
+    if data.get("document", {}).get("json_content") is not None:
+        check.is_in(
+            '{"schema_name": "DoclingDocument"',
+            json.dumps(data["document"]["json_content"]),
+            msg=f'JSON document should contain \'{{\\n  "schema_name": "DoclingDocument\'". Received: {safe_slice(data["document"]["json_content"])}',
+        )
+    # HTML check
+    check.is_in(
+        "html_content",
+        data.get("document", {}),
+        msg=f"Response should contain 'html_content' key. Received keys: {list(data.get('document', {}).keys())}",
+    )
+    if data.get("document", {}).get("html_content") is not None:
+        check.is_in(
+            "<!DOCTYPE html>\n<html>\n<head>",
+            data["document"]["html_content"],
+            msg=f"HTML document should contain '<!DOCTYPE html>\n<html>\n<head>'. Received: {safe_slice(data['document']['html_content'])}",
+        )
+    # Text check
+    check.is_in(
+        "text_content",
+        data.get("document", {}),
+        msg=f"Response should contain 'text_content' key. Received keys: {list(data.get('document', {}).keys())}",
+    )
+    if data.get("document", {}).get("text_content") is not None:
+        check.is_in(
+            "DocLayNet: A Large Human-Annotated Dataset",
+            data["document"]["text_content"],
+            msg=f"Text document should contain 'DocLayNet: A Large Human-Annotated Dataset'. Received: {safe_slice(data['document']['text_content'])}",
+        )
+    # DocTags check
+    check.is_in(
+        "doctags_content",
+        data.get("document", {}),
+        msg=f"Response should contain 'doctags_content' key. Received keys: {list(data.get('document', {}).keys())}",
+    )
+    if data.get("document", {}).get("doctags_content") is not None:
+        check.is_in(
+            "<doctag><page_header>",
+            data["document"]["doctags_content"],
+            msg=f"DocTags document should contain '<doctag><page_header>'. Received: {safe_slice(data['document']['doctags_content'])}",
+        )
+
+
+@pytest.mark.asyncio
+async def test_referenced_artifacts(client: AsyncClient):
+    """Test that paths in the zip file are relative to the zip file root."""
+
+    endpoint = "/v1/convert/file"
+    options = {
+        "to_formats": ["json"],
+        "image_export_mode": "referenced",
+        "target_type": "zip",
+        "ocr": False,
+    }
+
+    current_dir = os.path.dirname(__file__)
+    file_path = os.path.join(current_dir, "2206.01062v1.pdf")
+
+    files = {
+        "files": ("2206.01062v1.pdf", open(file_path, "rb"), "application/pdf"),
+    }
+
+    response = await client.post(endpoint, files=files, data=options)
+    assert response.status_code == 200, "Response should be 200 OK"
+
+    with zipfile.ZipFile(io.BytesIO(response.content)) as zip_file:
+        namelist = zip_file.namelist()
+        for file in namelist:
+            if file.endswith(".json"):
+                doc = DoclingDocument.model_validate(json.loads(zip_file.read(file)))
+                for item, _level in doc.iterate_items():
+                    if isinstance(item, PictureItem):
+                        assert item.image is not None
+                        print(f"{item.image.uri=}")
+                        assert str(item.image.uri) in namelist
--- a/tests/test_file_opts.py
+++ b/tests/test_file_opts.py
@@ -0,0 +1,77 @@
+import asyncio
+import json
+import os
+
+import pytest
+import pytest_asyncio
+from asgi_lifespan import LifespanManager
+from httpx import ASGITransport, AsyncClient
+
+from docling_core.types import DoclingDocument
+from docling_core.types.doc.document import PictureDescriptionData
+
+from docling_serve.app import create_app
+
+
+@pytest.fixture(scope="session")
+def event_loop():
+    return asyncio.get_event_loop()
+
+
+@pytest_asyncio.fixture(scope="session")
+async def app():
+    app = create_app()
+
+    async with LifespanManager(app) as manager:
+        print("Launching lifespan of app.")
+        yield manager.app
+
+
+@pytest_asyncio.fixture(scope="session")
+async def client(app):
+    async with AsyncClient(
+        transport=ASGITransport(app=app), base_url="http://app.io"
+    ) as client:
+        print("Client is ready")
+        yield client
+
+
+@pytest.mark.asyncio
+async def test_convert_file(client: AsyncClient):
+    """Test convert single file to all outputs"""
+
+    endpoint = "/v1/convert/file"
+    options = {
+        "to_formats": ["md", "json"],
+        "image_export_mode": "placeholder",
+        "ocr": False,
+        "do_picture_description": True,
+        "picture_description_api": json.dumps(
+            {
+                "url": "http://localhost:11434/v1/chat/completions",  # ollama
+                "params": {"model": "granite3.2-vision:2b"},
+                "timeout": 60,
+                "prompt": "Describe this image in a few sentences. ",
+            }
+        ),
+    }
+
+    current_dir = os.path.dirname(__file__)
+    file_path = os.path.join(current_dir, "2206.01062v1.pdf")
+
+    files = {
+        "files": ("2206.01062v1.pdf", open(file_path, "rb"), "application/pdf"),
+    }
+
+    response = await client.post(endpoint, files=files, data=options)
+    assert response.status_code == 200, "Response should be 200 OK"
+
+    data = response.json()
+
+    doc = DoclingDocument.model_validate(data["document"]["json_content"])
+
+    for pic in doc.pictures:
+        for ann in pic.annotations:
+            if isinstance(ann, PictureDescriptionData):
+                print(f"{pic.self_ref}")
+                print(ann.text)
--- a/tests/test_results_clear.py
+++ b/tests/test_results_clear.py
@@ -0,0 +1,133 @@
+import asyncio
+import base64
+import json
+from pathlib import Path
+
+import pytest
+import pytest_asyncio
+from asgi_lifespan import LifespanManager
+from httpx import ASGITransport, AsyncClient
+
+from docling_serve.app import create_app
+from docling_serve.settings import docling_serve_settings
+
+
+@pytest.fixture(scope="session")
+def event_loop():
+    return asyncio.get_event_loop()
+
+
+@pytest_asyncio.fixture(scope="session")
+async def app():
+    app = create_app()
+
+    async with LifespanManager(app) as manager:
+        print("Launching lifespan of app.")
+        yield manager.app
+
+
+@pytest_asyncio.fixture(scope="session")
+async def client(app):
+    async with AsyncClient(
+        transport=ASGITransport(app=app), base_url="http://app.io"
+    ) as client:
+        print("Client is ready")
+        yield client
+
+
+async def convert_file(client: AsyncClient):
+    doc_filename = Path("tests/2408.09869v5.pdf")
+    encoded_doc = base64.b64encode(doc_filename.read_bytes()).decode()
+
+    payload = {
+        "options": {
+            "to_formats": ["json"],
+        },
+        "sources": [
+            {
+                "kind": "file",
+                "base64_string": encoded_doc,
+                "filename": doc_filename.name,
+            }
+        ],
+    }
+
+    response = await client.post("/v1/convert/source/async", json=payload)
+    assert response.status_code == 200, "Response should be 200 OK"
+
+    task = response.json()
+
+    print(json.dumps(task, indent=2))
+
+    while task["task_status"] not in ("success", "failure"):
+        response = await client.get(f"/v1/status/poll/{task['task_id']}")
+        assert response.status_code == 200, "Response should be 200 OK"
+        task = response.json()
+        print(f"{task['task_status']=}")
+        print(f"{task['task_position']=}")
+
+        await asyncio.sleep(2)
+
+    assert task["task_status"] == "success"
+
+    return task
+
+
+@pytest.mark.asyncio
+async def test_clear_results(client: AsyncClient):
+    """Test removal of task."""
+
+    # Set a long deletion delay so the result is not removed automatically
+    docling_serve_settings.result_removal_delay = 100
+
+    # Convert and wait for completion
+    task = await convert_file(client)
+
+    # Get result once
+    result_response = await client.get(f"/v1/result/{task['task_id']}")
+    assert result_response.status_code == 200, "Response should be 200 OK"
+    print("Result 1 ok.")
+    result = result_response.json()
+    assert result["document"]["json_content"]["schema_name"] == "DoclingDocument"
+
+    # Get result a second time
+    result_response = await client.get(f"/v1/result/{task['task_id']}")
+    assert result_response.status_code == 200, "Response should be 200 OK"
+    print("Result 2 ok.")
+    result = result_response.json()
+    assert result["document"]["json_content"]["schema_name"] == "DoclingDocument"
+
+    # Clear
+    clear_response = await client.get("/v1/clear/results?older_then=0")
+    assert clear_response.status_code == 200, "Response should be 200 OK"
+    print("Clear ok.")
+
+    # Get deleted result
+    result_response = await client.get(f"/v1/result/{task['task_id']}")
+    assert result_response.status_code == 404, "Result should have been removed"
+    print("Result was no longer found.")
+
+
+@pytest.mark.asyncio
+async def test_delay_remove(client: AsyncClient):
+    """Test automatic removal of task with delay."""
+
+    # Set a short deletion delay
+    docling_serve_settings.result_removal_delay = 5
+
+    # Convert and wait for completion
+    task = await convert_file(client)
+
+    # Get result once
+    result_response = await client.get(f"/v1/result/{task['task_id']}")
+    assert result_response.status_code == 200, "Response should be 200 OK"
+    print("Result ok.")
+    result = result_response.json()
+    assert result["document"]["json_content"]["schema_name"] == "DoclingDocument"
+
+    print("Sleeping to wait for the automatic task deletion.")
+    await asyncio.sleep(10)
+
+    # Get deleted result
+    result_response = await client.get(f"/v1/result/{task['task_id']}")
+    assert result_response.status_code == 404, "Result should have been removed"
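The polling loop in `convert_file` above runs until the task reports `success` or `failure`, with no upper bound on how long it waits. A minimal sketch of a bounded variant follows. The helper name `poll_until_done`, its `fetch_status` callable, and the default `interval` and `timeout` values are assumptions for illustration and are not part of docling-serve; only the `task_status` field and its two terminal values are taken from the test above.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Tuple

# Terminal states as used by the polling loop in the test above.
TERMINAL_STATES: Tuple[str, ...] = ("success", "failure")


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[Dict[str, Any]]],
    *,
    interval: float = 2.0,
    timeout: float = 120.0,
) -> Dict[str, Any]:
    """Call fetch_status() until it reports a terminal state or the deadline passes.

    fetch_status is a hypothetical coroutine function returning the task
    payload as a dict with a "task_status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = await fetch_status()
        if task["task_status"] in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task still {task['task_status']!r} after {timeout}s"
            )
        await asyncio.sleep(interval)
```

In a test like the one above, `fetch_status` could be a small coroutine that calls `client.get(f"/v1/status/poll/{task['task_id']}")` and returns `response.json()`, so a stuck task fails the test with a `TimeoutError` instead of hanging the session.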
--- a/uv.lock
+++ b/uv.lock
Author	SHA1	Message	Date
github-actions[bot]	ce15e0302b	chore: bump version to 1.1.0 [skip ci]	2025-07-30 15:53:01 +00:00
Michele Dolfi	ecb1874a50	feat: Add docling-mcp in the distribution (#290 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-30 15:39:11 +02:00
Michele Dolfi	1333f71c9c	fix: referenced paths relative to zip root (#289 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-30 14:49:26 +02:00
Tiago Santana	ec594d84fe	feat: add 3.0 openapi endpoint (#287 ) Signed-off-by: Tiago Santana <54704492+SantanaTiago@users.noreply.github.com>	2025-07-30 14:08:59 +02:00
Tiago Santana	3771c1b554	feat: add new source and target (#270 ) Signed-off-by: Tiago Santana <54704492+SantanaTiago@users.noreply.github.com>	2025-07-29 14:44:49 +02:00
github-actions[bot]	24db461b14	chore: bump version to 1.0.1 [skip ci]	2025-07-21 07:34:14 +00:00
Michele Dolfi	8706706e87	fix: docling update v2.42.0 (#277 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-21 08:47:40 +02:00
Michele Dolfi	766adb2481	docs: typo in README (#276 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-18 14:37:54 +02:00
Michele Dolfi	8222cf8955	ci: add spellchecker with custom vocabulary and fix typos (#268 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-15 14:17:35 +02:00
github-actions[bot]	b922824e5b	chore: bump version to 1.0.0 [skip ci]	2025-07-14 11:25:06 +00:00
Michele Dolfi	56e328baf7	feat!: v1 api with list of sources and target (#249 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-14 13:19:49 +02:00
Michele Dolfi	daa924a77e	feat!: use orchestrators from jobkit (#248 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-10 15:47:22 +02:00
Eugene	e63197e89e	chore: bump uv to 0.7.19 in container (#266 ) Signed-off-by: Eugene <fogaprod@gmail.com>	2025-07-10 15:10:21 +02:00
github-actions[bot]	767ce0982b	chore: bump version to 0.16.1 [skip ci]	2025-07-07 16:17:50 +00:00
Michele Dolfi	bfde1a0991	fix: upgrade deps including, docling v2.40.0 with locks in models init (#264 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-07-07 17:13:45 +02:00
VIktor Kuropiantnyk	eb3892ee14	fix: missing tesseract osd (#263 ) Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>	2025-07-07 16:36:43 +02:00
tassadarliu	93b84712b2	docs: fix typo (#259 ) Signed-off-by: tassadarliu <rhapsodyn@gmail.com>	2025-07-07 08:47:34 +02:00
Yishen Miao	c45b937064	docs: change the doc example (#258 ) Signed-off-by: Yishen Miao <mys721tx@gmail.com>	2025-07-07 08:47:21 +02:00
Francisco Arceo	50e431f30f	docs: Update typo (#247 ) Signed-off-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-06-27 16:58:37 +02:00
Michele Dolfi	149a8cb1c0	fix: properly load models at boot (#244 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-27 12:20:38 +02:00
github-actions[bot]	5f9c20a985	chore: bump version to 0.16.0 [skip ci]	2025-06-25 09:52:08 +00:00
Michele Dolfi	80755a7d59	docs: Update example resources and improve README (#231 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-25 07:56:14 +02:00
Michele Dolfi	30aca92298	feat: package updates and more cuda images (#229 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-24 16:59:05 +02:00
github-actions[bot]	717fb3a8d8	chore: bump version to 0.15.0 [skip ci]	2025-06-17 15:00:38 +00:00
Michele Dolfi	873d05aefe	feat: use redocs and scalar as api docs (#228 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-17 16:54:00 +02:00
Ryan Fernandes	196c5ce42a	fix: "tesserocr" instead of "tesseract_cli" in usage docs (#223 ) Signed-off-by: Ryan Fernandes <ryan@fernandes.us>	2025-06-17 16:53:51 +02:00
github-actions[bot]	b5c5f47892	chore: bump version to 0.14.0 [skip ci]	2025-06-17 13:10:27 +00:00
23Ro	d5455b7f66	fix: Typo in Headline (#220 ) Signed-off-by: 23Ro <m.n@23ro.de>	2025-06-17 14:55:27 +02:00
Michele Dolfi	7a682494d6	chore: dco advisor (#224 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-17 09:38:56 +02:00
Eugene	524f6a8997	feat: Read supported file extensions from docling (#214 ) Signed-off-by: Eugene <fogaprod@gmail.com>	2025-06-05 09:38:28 +02:00
github-actions[bot]	9ccf8e3b5e	chore: bump version to 0.13.0 [skip ci]	2025-06-04 12:24:40 +00:00
Michele Dolfi	ffea34732b	feat: upgrade docling to 2.36 (#212 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-04 14:20:34 +02:00
github-actions[bot]	b299af002b	chore: bump version to 0.12.0 [skip ci]	2025-06-03 16:30:28 +00:00
Michele Dolfi	c4c41f16df	feat: Export annotations in markdown and html (Docling upgrade) (#202 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-03 18:24:27 +02:00
Michele Dolfi	7066f3520a	fix: processing complex params in multipart-form (#210 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-06-03 18:24:05 +02:00
Rui Dias Gomes	6a8190c315	docs: add openshift replicasets examples (#209 ) Signed-off-by: Rui-Dias-Gomes <rui.dias.gomes@ibm.com> Co-authored-by: Rui-Dias-Gomes <rui.dias.gomes@ibm.com>	2025-06-03 17:43:41 +02:00
github-actions[bot]	060ecd8b0e	chore: bump version to 0.11.0 [skip ci]	2025-05-23 13:45:54 +00:00
Michele Dolfi	32b8a809f3	feat: page break placeholder in markdown exports options (#194 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-23 15:26:27 +02:00
Michele Dolfi	de002dfcdc	feat: clear results registry (#192 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-23 14:30:57 +02:00
Michele Dolfi	abe5aa03f5	feat: Upgrade to Docling 2.33.0 (#198 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-22 17:00:29 +02:00
VIktor Kuropiantnyk	3f090b7d15	docs: Example and instructions on how to load model weights to persistent volume (#197 ) Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>	2025-05-21 13:04:46 +02:00
Michele Dolfi	21c1791e42	docs: async api usage and fixes (#195 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-19 13:57:35 +02:00
Michele Dolfi	00be428490	feat: api to trigger offloading the models (#188 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-14 15:02:18 +02:00
Kasper Dinkla	3ff1b2f983	feat: Figure annotations @ docling components 0.0.7 (#181 ) Signed-off-by: DKL <dkl@zurich.ibm.com>	2025-05-08 16:31:10 +02:00
Michele Dolfi	8406fb9b59	fix: usage of hashlib for FIPS (#171 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-05-02 15:00:10 +02:00
github-actions[bot]	a2dcb0a20f	chore: bump version to 0.10.1 [skip ci]	2025-04-30 16:04:30 +00:00
Michele Dolfi	36787bc061	fix: avoid missing specialized keys in the options hash (#166 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-30 13:14:34 +02:00
Michele Dolfi	509f4889f8	fix: allow users to set the area threshold for picture descriptions (#165 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-04-30 12:37:24 +02:00
Michele Dolfi	919cf5c041	fix: expose max wait time in sync endpoints (#164 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-30 12:30:11 +02:00
Michele Dolfi	35c2630c61	fix: add flash-attn for cuda images (#161 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-29 16:58:33 +02:00
github-actions[bot]	382d675631	chore: bump version to 0.10.0 [skip ci]	2025-04-28 10:06:42 +00:00
Michele Dolfi	c65f3c654c	feat: add support for file upload and return as file in async endpoints (#152 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-28 11:18:19 +02:00
nkh0472	829effec1a	docs: fix new default pdf_backend (#158 ) Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>	2025-04-28 09:46:13 +02:00
nkh0472	494d66f992	chore: typo fix (#156 ) Signed-off-by: nkh0472 <67589323+nkh0472@users.noreply.github.com>	2025-04-28 08:41:26 +02:00
Quang Nam Ta	14bafb2628	docs: fixing small typo in docs (#155 ) Signed-off-by: Quang Nam Ta <work.quangnamta@gmail.com>	2025-04-28 08:35:40 +02:00
github-actions[bot]	37e2e1ad09	chore: bump version to 0.9.0 [skip ci]	2025-04-25 07:56:40 +00:00
Michele Dolfi	71c5fae505	fix: produce image artifacts in referenced mode (#151 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-24 17:33:36 +02:00
Michele Dolfi	91956cbf4e	docs: vlm and picture description options (#149 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-24 14:42:06 +02:00
Michele Dolfi	4c9571a052	feat: expose picture description options (#148 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com> Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>	2025-04-24 13:49:44 +02:00
Tiago Santana	41624af09f	test: add tests with fastapi client (#147 ) Signed-off-by: Tiago Santana <54704492+SantanaTiago@users.noreply.github.com>	2025-04-24 10:25:29 +02:00
Michele Dolfi	26bef5bec0	feat: Add parameters for Kubeflow pipeline engine (WIP) (#107 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-23 14:59:53 +02:00
github-actions[bot]	40bb21d347	chore: bump version to 0.8.0 [skip ci]	2025-04-22 13:04:33 +00:00
Michele Dolfi	ee89ee4dae	feat: Add option for vlm pipeline (#143 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-22 14:46:33 +02:00
Michele Dolfi	6b3d281f02	feat: Expose more conversion options (#142 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-22 10:41:47 +02:00
Tiago Santana	b598872e5c	feat(UI): change UI to use async endpoints (#131 ) Signed-off-by: Tiago Santana <54704492+SantanaTiago@users.noreply.github.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-19 19:59:07 +02:00
Michele Dolfi	087417e5c2	docs: fix required permissions for oauth2-proxy requests (#141 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-19 18:46:28 +02:00
Michele Dolfi	57f9073bc0	fix(UI): use https when calling the api (#139 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-19 17:35:54 +02:00
Rui Dias Gomes	525a43ff6f	docs: update deployment examples (#135 ) Signed-off-by: rmdg88 <rmdg88@gmail.com> Signed-off-by: Rui Dias Gomes <66125272+rmdg88@users.noreply.github.com>	2025-04-17 14:29:34 +02:00
Michele Dolfi	c1ce4719c9	fix: fix permissions in docker image (#136 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-04-17 14:27:43 +02:00
Kasper Dinkla	5dfb75d3b9	fix: picture caption visuals (#129 ) Signed-off-by: DKL <dkl@zurich.ibm.com>	2025-04-15 13:17:00 +02:00
Michele Dolfi	420162e674	docs: fix image tag (#124 ) Signed-off-by: Michele Dolfi <97102151+dolfim-ibm@users.noreply.github.com>	2025-04-11 16:19:39 +02:00
github-actions[bot]	ff75bab21b	chore: bump version to 0.7.0 [skip ci]	2025-03-31 13:44:01 +00:00
Michele Dolfi	7a0fabae07	feat: Expose TLS settings and example deploy with oauth-proxy (#112 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-31 14:51:30 +02:00
Maxim Lysak	9ffe49a359	chore: Readme picture (#108 ) Signed-off-by: Maksym Lysak <mly@zurich.ibm.com> Co-authored-by: Maksym Lysak <mly@zurich.ibm.com>	2025-03-31 08:29:09 -04:00
Michele Dolfi	68772bb6f0	feat: Offline static files (#109 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-26 18:54:54 -04:00
Michele Dolfi	20ec87a63a	feat: Update to Docling 2.28 (#106 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-24 20:00:25 -04:00
Eugene	e30f458923	fix: Move ARGs to prevent cache invalidation (#104 ) Signed-off-by: Eugene <fogaprod@gmail.com>	2025-03-22 12:31:42 +01:00
github-actions[bot]	03e405638f	chore: bump version to 0.6.0 [skip ci]	2025-03-17 12:43:23 +00:00
Michele Dolfi	fd8e40a008	docs: simplify README and move details to docs (#102 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-17 13:40:12 +01:00
Michele Dolfi	422c402bab	fix: allow changes in CORS settings (#100 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-17 09:49:17 +01:00
Michele Dolfi	ea090288d3	fix: avoid exploding options cache using lru and expose size parameter (#101 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-17 08:52:29 +01:00
Michele Dolfi	07c48edd5d	fix: increase timeout_keep_alive and allow parameter changes (#98 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-16 09:03:40 +01:00
Michele Dolfi	a212547d28	fix: add warning when using incompatible parameters (#99 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-16 09:03:22 +01:00
Michele Dolfi	c76daac70c	fix(ui): use --port parameter and avoid failing when image is not found (#97 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-16 09:02:53 +01:00
Michele Dolfi	7994b19b9f	chore: move to docling-project gh org (#95 ) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-14 14:04:31 +01:00
Tiago Santana	ec57b528ed	feat: expose options for new features (#92 ) Signed-off-by: Tiago Santana <54704492+SantanaTiago@users.noreply.github.com>	2025-03-13 17:09:59 +01:00