chore: bump version to 0.5.1 [skip ci]

ci: Speed up python linting (#64)
Signed-off-by: Eugene <fogaprod@gmail.com>
2025-11-29 08:33:50 +00:00 · 2025-03-10 17:31:51 +00:00 · 2025-03-10 18:05:33 +01:00 · 2025-03-10 16:58:10 +01:00 · 2025-03-10 16:57:48 +01:00 · 2025-03-10 16:19:34 +01:00
56 changed files with 6204 additions and 5762 deletions
--- a/docling_serve/.env.example
+++ b/docling_serve/.env.example
@@ -1,3 +1,3 @@
 TESSDATA_PREFIX=/usr/share/tesseract/tessdata/
 UVICORN_WORKERS=2
-RELOAD=True
+UVICORN_RELOAD=True
--- a/.flake8
+++ b/.flake8
@@ -1,7 +0,0 @@
-[flake8]
-max-line-length = 88
-exclude = test/*
-max-complexity = 18
-docstring-convention = google
-ignore = W503,E203
-classmethod-decorators = classmethod,validator
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,12 @@
+<!-- Thank you for contributing to Docling! -->
+
+<!-- STEPS TO FOLLOW:
+  1. Add a description of the changes (frequently the same as the commit description)
+  2. Enter the issue number next to "Resolves #" below (if there is no tracking issue resolved, **remove that section**)
+  3. Make sure the PR title follows the **Commit Message Formatting**: https://www.conventionalcommits.org/en/v1.0.0/#summary.
+-->
+
+<!-- Uncomment this section with the issue number if an issue is being resolved
+**Issue resolved by this Pull Request:**
+Resolves #
+--->
--- a/.github/SECURITY.md
+++ b/.github/SECURITY.md
@@ -0,0 +1,23 @@
+# Security and Disclosure Information Policy for the Docling Project
+
+The Docling team and community take security bugs seriously. We appreciate your efforts to responsibly disclose your findings, and will make every effort to acknowledge your contributions.
+
+## Reporting a Vulnerability
+
+If you think you've identified a security issue in a Docling project repository, please DO NOT report the issue publicly via the GitHub issue tracker, etc.
+
+Instead, send an email with as many details as possible to [deepsearch-core@zurich.ibm.com](mailto:deepsearch-core@zurich.ibm.com). This is a private mailing list for the maintainers team.
+
+Please do not create a public issue.
+
+## Security Vulnerability Response
+
+Each report is acknowledged and analyzed by the core maintainers within 3 working days.
+
+Any vulnerability information shared with core maintainers stays within the Docling project and will not be disseminated to other projects unless it is necessary to get the issue fixed.
+
+After the initial reply to your report, the security team will keep you informed of the progress towards a fix and full announcement, and may ask for additional information or guidance.
+
+## Security Alerts
+
+We will send announcements of security vulnerabilities and steps to remediate on the [Docling announcements](https://github.com/DS4SD/docling/discussions/categories/announcements).
--- a/.github/actions/setup-poetry/action.yml
+++ b/.github/actions/setup-poetry/action.yml
@@ -1,19 +0,0 @@
-name: 'Set up Poetry and install'
-description: 'Set up a specific version of Poetry and install dependencies using caching.'
-inputs:
-  python-version:
-    description: "Version range or exact version of Python or PyPy to use, using SemVer's version range syntax."
-    default: '3.11'
-runs:
-  using: 'composite'
-  steps:
-    - name: Install poetry
-      run: pipx install poetry==1.8.3
-      shell: bash
-    - uses: actions/setup-python@v4
-      with:
-        python-version: ${{ inputs.python-version }}
-        cache: 'poetry'
-    - name: Install dependencies
-      run: poetry install --all-extras
-      shell: bash
--- a/.github/mergify.yml
+++ b/.github/mergify.yml
@@ -0,0 +1,9 @@
+merge_protections:
+  - name: Enforce conventional commit
+    description: Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
+    if:
+      - base = main
+    success_conditions:
+      - "title ~=
+        ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\\(.+\
+        \\))?(!)?:"
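The title rule in `mergify.yml` above can be tried locally with a minimal sketch like the following. The pattern is reconstructed from the YAML double-quoted string (`\\(` unescapes to `\(`, and the trailing backslash joins the two lines). The function name `is_conventional` is hypothetical, and the sketch assumes the condition is evaluated as a regular-expression search in the style of Python's `re` module.

```python
import re

# Pattern reconstructed from the folded YAML string in mergify.yml.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True when the title starts with a conventional-commit prefix.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    return CONVENTIONAL_TITLE.search(title) is not None


if __name__ == "__main__":
    for title in (
        "chore: bump version to 0.5.1 [skip ci]",
        "fix(docs): Remove comma in convert/source curl example",
        "Update README",
    ):
        print(f"{title!r}: {is_conventional(title)}")
```

The type prefix is matched case-sensitively in this sketch, so a title such as `Fix: typo` would be rejected.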
--- a/.github/scripts/release.sh
+++ b/.github/scripts/release.sh
@@ -0,0 +1,40 @@
+#!/bin/bash
+
+set -e  # trigger failure on error - do not remove!
+set -x  # display command on output
+
+if [ -z "${TARGET_VERSION}" ]; then
+    >&2 echo "No TARGET_VERSION specified"
+    exit 1
+fi
+CHGLOG_FILE="${CHGLOG_FILE:-CHANGELOG.md}"
+
+# update package version
+uvx --from=toml-cli toml set --toml-path=pyproject.toml project.version "${TARGET_VERSION}"
+uv lock --upgrade-package docling-serve
+
+# collect release notes
+REL_NOTES=$(mktemp)
+uv run --no-sync semantic-release changelog --unreleased >> "${REL_NOTES}"
+
+# update changelog
+TMP_CHGLOG=$(mktemp)
+TARGET_TAG_NAME="v${TARGET_VERSION}"
+RELEASE_URL="$(gh repo view --json url -q ".url")/releases/tag/${TARGET_TAG_NAME}"
+printf "## [%s](%s) - %s\n\n" "${TARGET_TAG_NAME}" "${RELEASE_URL}" "$(date -Idate)" >> "${TMP_CHGLOG}"
+cat "${REL_NOTES}" >> "${TMP_CHGLOG}"
+if [ -f "${CHGLOG_FILE}" ]; then
+    printf "\n" | cat - "${CHGLOG_FILE}" >> "${TMP_CHGLOG}"
+fi
+mv "${TMP_CHGLOG}" "${CHGLOG_FILE}"
+
+# push changes
+git config --global user.name 'github-actions[bot]'
+git config --global user.email 'github-actions[bot]@users.noreply.github.com'
+git add pyproject.toml uv.lock "${CHGLOG_FILE}"
+COMMIT_MSG="chore: bump version to ${TARGET_VERSION} [skip ci]"
+git commit -m "${COMMIT_MSG}"
+git push origin main
+
+# create GitHub release (incl. Git tag)
+gh release create "${TARGET_TAG_NAME}" -F "${REL_NOTES}"
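The changelog update in `release.sh` above writes a new heading, appends the release notes, and then appends the previous changelog after a blank line if the file exists. A minimal Python sketch of that string assembly follows. The function name `prepend_release` and its parameters are hypothetical; the script itself does this with `printf`, `cat`, and `mv`.

```python
from typing import Optional


def prepend_release(
    existing: Optional[str],
    tag: str,
    release_url: str,
    date: str,
    notes: str,
) -> str:
    """Build the new changelog text with the latest release on top.

    `existing` is the previous changelog content, or None when the file
    does not exist yet (the `[ -f "${CHGLOG_FILE}" ]` branch in the script).
    """
    # Heading line followed by a blank line, as in the printf call.
    text = f"## [{tag}]({release_url}) - {date}\n\n"
    # Release notes collected from semantic-release.
    text += notes
    # Old changelog goes below, separated by one newline.
    if existing is not None:
        text += "\n" + existing
    return text


if __name__ == "__main__":
    print(
        prepend_release(
            existing="## [v0.5.0](https://example.invalid/v0.5.0) - 2025-03-07\n",
            tag="v0.5.1",
            release_url="https://example.invalid/v0.5.1",  # placeholder URL
            date="2025-03-10",
            notes="### Fix\n\n* Submodules in wheels\n",
        )
    )
```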
--- a/.github/workflows/cd.yml
+++ b/.github/workflows/cd.yml
@@ -0,0 +1,59 @@
+name: "Run CD"
+
+on:
+  workflow_dispatch:
+
+jobs:
+  code-checks:
+    uses: ./.github/workflows/job-checks.yml
+  pre-release-check:
+    runs-on: ubuntu-latest
+    outputs:
+      TARGET_TAG_V: ${{ steps.version_check.outputs.TRGT_VERSION }}
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0  # for fetching tags, required for semantic-release
+      - name: Install uv and set the python version
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+      - name: Install dependencies
+        run: uv sync --only-dev
+      - name: Check version of potential release
+        id: version_check
+        run: |
+          TRGT_VERSION=$(uv run --no-sync semantic-release print-version)
+          echo "TRGT_VERSION=${TRGT_VERSION}" >> "$GITHUB_OUTPUT"
+          echo "${TRGT_VERSION}"
+      - name: Check notes of potential release
+        run: uv run --no-sync semantic-release changelog --unreleased
+  release:
+    needs: [code-checks, pre-release-check]
+    if: needs.pre-release-check.outputs.TARGET_TAG_V != ''
+    environment: auto-release
+    runs-on: ubuntu-latest
+    concurrency: release
+    steps:
+      - uses: actions/create-github-app-token@v1
+        id: app-token
+        with:
+          app-id: ${{ vars.CI_APP_ID }}
+          private-key: ${{ secrets.CI_PRIVATE_KEY }}
+      - uses: actions/checkout@v4
+        with:
+          token: ${{ steps.app-token.outputs.token }}
+          fetch-depth: 0  # for fetching tags, required for semantic-release
+      - name: Install uv and set the python version
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+      - name: Install dependencies
+        run: uv sync --only-dev
+      - name: Run release script
+        env:
+          GH_TOKEN: ${{ steps.app-token.outputs.token }}
+          TARGET_VERSION: ${{ needs.pre-release-check.outputs.TARGET_TAG_V }}
+          CHGLOG_FILE: CHANGELOG.md
+        run: ./.github/scripts/release.sh
+        shell: bash
--- a/.github/workflows/checks.yml
+++ b/.github/workflows/checks.yml
@@ -1,34 +0,0 @@
-name: Run linter checks
-on:
-  push:
-    branches: ["main"]
-  pull_request:
-    branches: ["main"]
-
-concurrency:
-  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
-  cancel-in-progress: true
-
-jobs:
-  py-lint:
-    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        python-version: ['3.11']
-    steps:
-      - uses: actions/checkout@v4
-      - uses: ./.github/actions/setup-poetry
-        with:
-          python-version: ${{ matrix.python-version }}
-      - name: Run styling check
-        run: poetry run pre-commit run --all-files
-
-  markdown-lint:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-      - name: markdownlint-cli2-action
-        uses: DavidAnson/markdownlint-cli2-action@v16
-        with:
-          globs: "**/*.md"
-
--- a/.github/workflows/ci-images-dryrun.yml
+++ b/.github/workflows/ci-images-dryrun.yml
@@ -0,0 +1,41 @@
+name: Dry run docling-serve image building
+
+on:
+  workflow_call:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  build_image:
+    name: Build ${{ matrix.spec.name }} container image
+    strategy:
+      matrix:
+        spec:
+          - name: ds4sd/docling-serve
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra cpu
+            platforms: linux/amd64, linux/arm64
+          - name: ds4sd/docling-serve-cpu
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-extra cu124
+            platforms: linux/amd64, linux/arm64
+          - name: ds4sd/docling-serve-cu124
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-extra cpu
+            platforms: linux/amd64
+
+    permissions:
+      packages: write
+      contents: read
+      attestations: write
+      id-token: write
+
+    uses: ./.github/workflows/job-image.yml
+    with:
+      publish: false
+      build_args: ${{ matrix.spec.build_args }}
+      ghcr_image_name: ${{ matrix.spec.name }}
+      quay_image_name: ""
+      platforms: ${{ matrix.spec.platforms }}
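Each matrix entry above differs only in which extras `UV_SYNC_EXTRA_ARGS` excludes. The sketch below shows how the three argument strings map to the extras that remain. It assumes the Containerfile (not shown in this diff) passes these arguments to a `uv sync` call that otherwise enables all extras, and it assumes `cpu` and `cu124` are the only extras of interest. The helper name `remaining_extras` is hypothetical.

```python
import shlex
from typing import FrozenSet, Iterable

# Assumption: only the two extras named in the workflow matrix are modelled.
ALL_EXTRAS = frozenset({"cpu", "cu124"})


def remaining_extras(
    uv_sync_extra_args: str,
    all_extras: Iterable[str] = ALL_EXTRAS,
) -> FrozenSet[str]:
    """Return the extras left after applying every `--no-extra NAME` pair."""
    tokens = shlex.split(uv_sync_extra_args)
    excluded = set()
    i = 0
    while i < len(tokens):
        if tokens[i] == "--no-extra" and i + 1 < len(tokens):
            excluded.add(tokens[i + 1])
            i += 2  # skip the flag and its value
        else:
            i += 1  # ignore anything else
    return frozenset(all_extras) - excluded


if __name__ == "__main__":
    specs = {
        "ds4sd/docling-serve": "--no-extra cu124 --no-extra cpu",
        "ds4sd/docling-serve-cpu": "--no-extra cu124",
        "ds4sd/docling-serve-cu124": "--no-extra cpu",
    }
    for name, args in specs.items():
        print(name, sorted(remaining_extras(args)))
```

Under these assumptions the default image keeps neither extra, the `-cpu` image keeps `cpu`, and the `-cu124` image keeps `cu124`.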
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,25 @@
+name: "Run CI"
+
+on:
+  push:
+    branches: ["main"]
+  pull_request:
+    branches: ["main"]
+
+jobs:
+  code-checks:
+    # if: ${{ github.event_name == 'push' || (github.event.pull_request.head.repo.full_name != 'DS4SD/docling-serve' && github.event.pull_request.head.repo.full_name != 'ds4sd/docling-serve') }}
+    uses: ./.github/workflows/job-checks.yml
+    permissions:
+      packages: write
+      contents: read
+      attestations: write
+      id-token: write
+
+  build-images:
+    uses: ./.github/workflows/ci-images-dryrun.yml
+    permissions:
+      packages: write
+      contents: read
+      attestations: write
+      id-token: write
--- a/.github/workflows/images-dryrun.yml
+++ b/.github/workflows/images-dryrun.yml
@@ -1,105 +0,0 @@
-name: Dry run docling-serve image building
-
-on:
-  pull_request:
-    branches: ["main"]
-
-env:
-  GHCR_REGISTRY: ghcr.io
-  GHCR_DOCLING_SERVE_CPU_IMAGE_NAME: ds4sd/docling-serve-cpu
-  GHCR_DOCLING_SERVE_GPU_IMAGE_NAME: ds4sd/docling-serve
-
-jobs:
-  build_cpu_image:
-    name: Build docling-serve "CPU only" container image
-    runs-on: ubuntu-latest
-    permissions:
-      packages: write
-      contents: read
-      attestations: write
-      id-token: write
-
-    steps:
-      - name: Check out the repo
-        uses: actions/checkout@v4
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-
-      - name: Cache Docker layers
-        uses: actions/cache@v4
-        with:
-          path: /tmp/.buildx-cache
-          key: ${{ runner.os }}-buildx-${{ github.sha }}
-          restore-keys: |
-            ${{ runner.os }}-buildx-
-
-      - name: Extract metadata (tags, labels) for docling-serve (CPU only) ghcr image
-        id: ghcr_serve_cpu_meta
-        uses: docker/metadata-action@v5
-        with:
-          images: ${{ env.GHCR_REGISTRY }}/${{ env.GHCR_DOCLING_SERVE_CPU_IMAGE_NAME }}
-
-      - name: Build docling-serve-cpu image
-        id: build-serve-cpu-ghcr
-        uses: docker/build-push-action@v5
-        with:
-          context: .
-          push: false
-          tags: ${{ steps.ghcr_serve_cpu_meta.outputs.tags }}
-          labels: ${{ steps.ghcr_serve_cpu_meta.outputs.labels }}
-          platforms: linux/amd64, linux/arm64
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-          file: Containerfile
-          build-args: |
-            --build-arg CPU_ONLY=true
-
-      - name: Remove Local Docker Images
-        run: |
-          docker image prune -af
-
-  build_gpu_image:
-    name: Build docling-serve (with GPU support) container image
-    runs-on: ubuntu-latest
-    permissions:
-      packages: write
-      contents: read
-      attestations: write
-      id-token: write
-
-    steps:
-      - name: Check out the repo
-        uses: actions/checkout@v4
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-
-      - name: Cache Docker layers
-        uses: actions/cache@v4
-        with:
-          path: /tmp/.buildx-cache
-          key: ${{ runner.os }}-buildx-${{ github.sha }}
-          restore-keys: |
-            ${{ runner.os }}-buildx-
-
-      - name: Extract metadata (tags, labels) for docling-serve (GPU) ghcr image
-        id: ghcr_serve_gpu_meta
-        uses: docker/metadata-action@v5
-        with:
-          images: ${{ env.GHCR_REGISTRY }}/${{ env.GHCR_DOCLING_SERVE_GPU_IMAGE_NAME }}
-
-      - name: Build docling-serve (GPU) image
-        id: build-serve-gpu-ghcr
-        uses: docker/build-push-action@v5
-        with:
-          context: .
-          push: false
-          tags: ${{ steps.ghcr_serve_gpu_meta.outputs.tags }}
-          labels: ${{ steps.ghcr_serve_gpu_meta.outputs.labels }}
-          platforms: linux/amd64,linux/arm64
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-          file: Containerfile
-          build-args: |
-            --build-arg CPU_ONLY=false
--- a/.github/workflows/images.yml
+++ b/.github/workflows/images.yml
@@ -4,193 +4,44 @@ on:
  push:
    branches:
      - main
-    tags:
-      - 'v*'
+  release:
+    types: [published]

-env:
-  GHCR_REGISTRY: ghcr.io
-  GHCR_DOCLING_SERVE_CPU_IMAGE_NAME: ds4sd/docling-serve-cpu
-  GHCR_DOCLING_SERVE_GPU_IMAGE_NAME: ds4sd/docling-serve
-  QUAY_REGISTRY: quay.io
-  QUAY_DOCLING_SERVE_CPU_IMAGE_NAME: ds4sd/docling-serve-cpu
-  QUAY_DOCLING_SERVE_GPU_IMAGE_NAME: ds4sd/docling-serve
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true

 jobs:
-  build_and_publish_cpu_images:
-    name: Push docling-serve "CPU only" container image to GHCR and QUAY
-    runs-on: ubuntu-latest
-    environment: registry-creds
+  build_and_publish_images:
+    name: Build and push ${{ matrix.spec.name }} container image to GHCR and QUAY
+    strategy:
+      matrix:
+        spec:
+          - name: ds4sd/docling-serve
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra cpu
+            platforms: linux/amd64, linux/arm64
+          - name: ds4sd/docling-serve-cpu
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-extra cu124
+            platforms: linux/amd64, linux/arm64
+          - name: ds4sd/docling-serve-cu124
+            build_args: |
+              UV_SYNC_EXTRA_ARGS=--no-extra cpu
+            platforms: linux/amd64
+
    permissions:
      packages: write
      contents: read
      attestations: write
      id-token: write
+    secrets: inherit

-    steps:
-      - name: Check out the repo
-        uses: actions/checkout@v4
-
-      - name: Log in to the GHCR container image registry
-        uses: docker/login-action@v3
-        with:
-          registry: ${{ env.GHCR_REGISTRY }}
-          username: ${{ github.actor }}
-          password: ${{ secrets.GITHUB_TOKEN }}
-
-      - name: Log in to the Quay container image registry
-        uses: docker/login-action@v3
-        with:
-          registry: ${{ env.QUAY_REGISTRY }}
-          username: ${{ secrets.QUAY_USERNAME }}
-          password: ${{ secrets.QUAY_TOKEN }}
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-
-      - name: Cache Docker layers
-        uses: actions/cache@v4
-        with:
-          path: /tmp/.buildx-cache
-          key: ${{ runner.os }}-buildx-${{ github.sha }}
-          restore-keys: |
-            ${{ runner.os }}-buildx-
-
-      - name: Extract metadata (tags, labels) for docling-serve (CPU only) ghcr image
-        id: ghcr_serve_cpu_meta
-        uses: docker/metadata-action@v5
-        with:
-          images: ${{ env.GHCR_REGISTRY }}/${{ env.GHCR_DOCLING_SERVE_CPU_IMAGE_NAME }}
-
-      - name: Build and push docling-serve-cpu image to ghcr.io
-        id: push-serve-cpu-ghcr
-        uses: docker/build-push-action@v5
-        with:
-          context: .
-          push: true
-          tags: ${{ steps.ghcr_serve_cpu_meta.outputs.tags }}
-          labels: ${{ steps.ghcr_serve_cpu_meta.outputs.labels }}
-          platforms: linux/amd64, linux/arm64
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-          file: Containerfile
-          build-args: |
-            --build-arg CPU_ONLY=true
-
-      - name: Generate artifact attestation
-        uses: actions/attest-build-provenance@v1
-        with:
-          subject-name: ${{ env.GHCR_REGISTRY }}/${{ env.GHCR_DOCLING_SERVE_CPU_IMAGE_NAME}}
-          subject-digest: ${{ steps.push-serve-cpu-ghcr.outputs.digest }}
-          push-to-registry: true
-
-      - name: Extract metadata (tags, labels) for docling-serve (CPU only) quay image
-        id: quay_serve_cpu_meta
-        uses: docker/metadata-action@v5
-        with:
-          images: ${{ env.QUAY_REGISTRY }}/${{ env.QUAY_DOCLING_SERVE_CPU_IMAGE_NAME }}
-
-      - name: Build and push docling-serve-cpu image to quay.io
-        id: push-serve-cpu-quay
-        uses: docker/build-push-action@v5
-        with:
-          context: .
-          push: true
-          tags: ${{ steps.quay_serve_cpu_meta.outputs.tags }}
-          labels: ${{ steps.quay_serve_cpu_meta.outputs.labels }}
-          platforms: linux/amd64, linux/arm64
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-          file: Containerfile
-          build-args: |
-            --build-arg CPU_ONLY=true
-      - name: Remove Local Docker Images
-        run: |
-          docker image prune -af
-
-  build_and_publish_gpu_images:
-    name: Push docling-serve (with GPU support) container image to GHCR and QUAY
-    runs-on: ubuntu-latest
-    environment: registry-creds
-    permissions:
-      packages: write
-      contents: read
-      attestations: write
-      id-token: write
-
-    steps:
-      - name: Check out the repo
-        uses: actions/checkout@v4
-
-      - name: Log in to the GHCR container image registry
-        uses: docker/login-action@v3
-        with:
-          registry: ${{ env.GHCR_REGISTRY }}
-          username: ${{ github.actor }}
-          password: ${{ secrets.GITHUB_TOKEN }}
-
-      - name: Log in to the Quay container image registry
-        uses: docker/login-action@v3
-        with:
-          registry: ${{ env.QUAY_REGISTRY }}
-          username: ${{ secrets.QUAY_USERNAME }}
-          password: ${{ secrets.QUAY_TOKEN }}
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-
-      - name: Cache Docker layers
-        uses: actions/cache@v4
-        with:
-          path: /tmp/.buildx-cache
-          key: ${{ runner.os }}-buildx-${{ github.sha }}
-          restore-keys: |
-            ${{ runner.os }}-buildx-
-
-      - name: Extract metadata (tags, labels) for docling-serve (GPU) ghcr image
-        id: ghcr_serve_gpu_meta
-        uses: docker/metadata-action@v5
-        with:
-          images: ${{ env.GHCR_REGISTRY }}/${{ env.GHCR_DOCLING_SERVE_GPU_IMAGE_NAME }}
-
-      - name: Build and push docling-serve (GPU) image to ghcr.io
-        id: push-serve-gpu-ghcr
-        uses: docker/build-push-action@v5
-        with:
-          context: .
-          push: true
-          tags: ${{ steps.ghcr_serve_gpu_meta.outputs.tags }}
-          labels: ${{ steps.ghcr_serve_gpu_meta.outputs.labels }}
-          platforms: linux/amd64,linux/arm64
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-          file: Containerfile
-          build-args: |
-            --build-arg CPU_ONLY=false
-
-      - name: Generate artifact attestation
-        uses: actions/attest-build-provenance@v1
-        with:
-          subject-name: ${{ env.GHCR_REGISTRY }}/${{ env.GHCR_DOCLING_SERVE_GPU_IMAGE_NAME}}
-          subject-digest: ${{ steps.push-serve-gpu-ghcr.outputs.digest }}
-          push-to-registry: true
-
-      - name: Extract metadata (tags, labels) for docling-serve (GPU) quay image
-        id: quay_serve_gpu_meta
-        uses: docker/metadata-action@v5
-        with:
-          images: ${{ env.QUAY_REGISTRY }}/${{ env.QUAY_DOCLING_SERVE_GPU_IMAGE_NAME }}
-
-      - name: Build and push docling-serve (GPU) image to quay.io
-        id: push-serve-gpu-quay
-        uses: docker/build-push-action@v5
-        with:
-          context: .
-          push: true
-          tags: ${{ steps.quay_serve_gpu_meta.outputs.tags }}
-          labels: ${{ steps.quay_serve_gpu_meta.outputs.labels }}
-          platforms: linux/amd64,linux/arm64
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-          file: Containerfile
-          build-args: |
-            --build-arg CPU_ONLY=false
+    uses: ./.github/workflows/job-image.yml
+    with:
+      publish: true
+      environment: registry-creds
+      build_args: ${{ matrix.spec.build_args }}
+      ghcr_image_name: ${{ matrix.spec.name }}
+      quay_image_name: ${{ matrix.spec.name }}
+      platforms: ${{ matrix.spec.platforms }}
--- a/.github/workflows/job-build.yml
+++ b/.github/workflows/job-build.yml
@@ -0,0 +1,29 @@
+name: Run checks
+
+on:
+  workflow_call:
+
+jobs:
+  build-package:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.12']
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv and set the python version
+        uses: astral-sh/setup-uv@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          enable-cache: true
+      - name: Install dependencies
+        run: uv sync --all-extras --no-extra cu124
+      - name: Build package
+        run: uv build
+      - name: Check content of wheel
+        run: unzip -l dist/*.whl
+      - name: Store the distribution packages
+        uses: actions/upload-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
--- a/.github/workflows/job-checks.yml
+++ b/.github/workflows/job-checks.yml
@@ -0,0 +1,67 @@
+name: Run checks
+
+on:
+  workflow_call:
+
+jobs:
+  py-lint:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.12']
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv and set the python version
+        uses: astral-sh/setup-uv@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          enable-cache: true
+
+      - name: pre-commit cache key
+        run: echo "PY=$(python -VV | sha256sum | cut -d' ' -f1)" >> "$GITHUB_ENV"
+      - uses: actions/cache@v4
+        with:
+          path: ~/.cache/pre-commit
+          key: pre-commit|${{ env.PY }}|${{ hashFiles('.pre-commit-config.yaml') }}
+
+      - name: Install dependencies
+        run: uv sync --frozen --all-extras --no-extra cu124
+
+      - name: Run styling check
+        run: pre-commit run --all-files
+
+  build-package:
+    uses: ./.github/workflows/job-build.yml
+
+  test-package:
+    needs:
+      - build-package
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.12']
+    steps:
+      - name: Download all the dists
+        uses: actions/download-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+      - name: Install uv and set the python version
+        uses: astral-sh/setup-uv@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          enable-cache: true
+      - name: Install package
+        run: uv pip install dist/*.whl
+      - name: Create the server
+        run: python -c 'from docling_serve.app import create_app; create_app()'
+
+  markdown-lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: markdownlint-cli2-action
+        uses: DavidAnson/markdownlint-cli2-action@v16
+        with:
+          globs: "**/*.md"
+
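The `py-lint` job above keys the pre-commit cache on a SHA-256 digest of the `python -VV` output plus a hash of `.pre-commit-config.yaml`. A minimal sketch of how such a key is composed follows. The function names are hypothetical, and the value returned by GitHub's `hashFiles` is treated as an opaque string rather than recomputed.

```python
import hashlib


def python_fingerprint(version_output: str) -> str:
    """Hex SHA-256 of the given text, analogous to `python -VV | sha256sum | cut -d' ' -f1`."""
    return hashlib.sha256(version_output.encode("utf-8")).hexdigest()


def precommit_cache_key(version_output: str, config_hash: str) -> str:
    """Compose a key of the form `pre-commit|<python digest>|<config hash>`.

    `config_hash` stands in for the result of hashFiles('.pre-commit-config.yaml').
    """
    return f"pre-commit|{python_fingerprint(version_output)}|{config_hash}"


if __name__ == "__main__":
    # Illustrative inputs only; real values come from the runner.
    print(precommit_cache_key("Python 3.12.0 (example build)\n", "abc123"))
```

Because the interpreter's full version string feeds the key, a different Python build yields a different key and therefore a fresh pre-commit environment cache.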
--- a/.github/workflows/job-image.yml
+++ b/.github/workflows/job-image.yml
@@ -0,0 +1,141 @@
+name: Build docling-serve container image
+
+on:
+  workflow_call:
+    inputs:
+      build_args:
+        type: string
+        description: "Extra build arguments for the build."
+        default: ""
+      ghcr_image_name:
+        type: string
+        description: "Name of the image for GHCR."
+      quay_image_name:
+        type: string
+        description: "Name of the image for Quay."
+      platforms:
+        type: string
+        description: "Platform argument for building images."
+        default: linux/amd64, linux/arm64
+      publish:
+        type: boolean
+        description: "If true, the images will be published."
+        default: false
+      environment:
+        type: string
+        description: "GH Action environment"
+        default: ""
+
+env:
+  GHCR_REGISTRY: ghcr.io
+  QUAY_REGISTRY: quay.io
+
+jobs:
+  image:
+    runs-on: ubuntu-latest
+    permissions:
+      packages: write
+      contents: read
+      attestations: write
+      id-token: write
+    environment: ${{ inputs.environment }}
+
+    steps:
+      - name: Free up space in github runner
+        # Free space as indicated here : https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
+        run: |
+            df -h
+            sudo rm -rf "/usr/local/share/boost"
+            sudo rm -rf "$AGENT_TOOLSDIRECTORY"
+            sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/powershell /usr/share/swift /usr/local/.ghcup
+            # shellcheck disable=SC2046
+            sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
+            df -h
+
+      - name: Check out the repo
+        uses: actions/checkout@v4
+
+      - name: Log in to the GHCR container image registry
+        if: ${{ inputs.publish }}
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.GHCR_REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Log in to the Quay container image registry
+        if: ${{ inputs.publish }}
+        uses: docker/login-action@v3
+        with:
+          registry: ${{ env.QUAY_REGISTRY }}
+          username: ${{ secrets.QUAY_USERNAME }}
+          password: ${{ secrets.QUAY_TOKEN }}
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Cache Docker layers
+        uses: actions/cache@v4
+        with:
+          path: /tmp/.buildx-cache
+          key: ${{ runner.os }}-buildx-${{ github.sha }}
+          restore-keys: |
+            ${{ runner.os }}-buildx-
+
+      - name: Extract metadata (tags, labels) for docling-serve ghcr image
+        id: ghcr_meta
+        uses: docker/metadata-action@v5
+        with:
+          images: ${{ env.GHCR_REGISTRY }}/${{ inputs.ghcr_image_name }}
+
+      - name: Build and push image to ghcr.io
+        id: ghcr_push
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          push: ${{ inputs.publish }}
+          tags: ${{ steps.ghcr_meta.outputs.tags }}
+          labels: ${{ steps.ghcr_meta.outputs.labels }}
+          platforms: ${{ inputs.platforms}}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+          file: Containerfile
+          build-args: ${{ inputs.build_args }}
+
+      - name: Generate artifact attestation
+        if: ${{ inputs.publish }}
+        uses: actions/attest-build-provenance@v1
+        with:
+          subject-name: ${{ env.GHCR_REGISTRY }}/${{ inputs.ghcr_image_name }}
+          subject-digest: ${{ steps.ghcr_push.outputs.digest }}
+          push-to-registry: true
+
+      - name: Extract metadata (tags, labels) for docling-serve quay image
+        if: ${{ inputs.publish }}
+        id: quay_meta
+        uses: docker/metadata-action@v5
+        with:
+          images: ${{ env.QUAY_REGISTRY }}/${{ inputs.quay_image_name }}
+
+      - name: Build and push image to quay.io
+        if: ${{ inputs.publish }}
+        # id: push-serve-cpu-quay
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          push: ${{ inputs.publish }}
+          tags: ${{ steps.quay_meta.outputs.tags }}
+          labels: ${{ steps.quay_meta.outputs.labels }}
+          platforms: ${{ inputs.platforms}}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+          file: Containerfile
+          build-args: ${{ inputs.build_args }}
+
+      # - name: Inspect the image details
+      #   run: |
+      #     echo "${{ steps.ghcr_push.outputs.metadata }}"
+
+      - name: Remove Local Docker Images
+        run: |
+          docker image prune -af
--- a/.github/workflows/pypi.yml
+++ b/.github/workflows/pypi.yml
@@ -0,0 +1,34 @@
+name: "Build and publish package"
+
+on:
+  release:
+    types: [published]
+
+permissions:
+  contents: read
+
+jobs:
+
+  build-package:
+    uses: ./.github/workflows/job-build.yml
+
+  build-and-publish:
+    needs:
+      - build-package
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/docling-serve
+    permissions:
+      id-token: write  # IMPORTANT: mandatory for trusted publishing
+    steps:
+      - name: Download all the dists
+        uses: actions/download-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+      - name: Publish distribution 📦 to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          # currently not working with reusable workflows
+          attestations: false
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,7 @@
 model_artifacts/
 scratch/
+.md-lint
+actionlint

 # Created by https://www.toptal.com/developers/gitignore/api/python,macos,virtualenv,pycharm,visualstudiocode,emacs,vim,jupyternotebooks
 # Edit at https://www.toptal.com/developers/gitignore?templates=python,macos,virtualenv,pycharm,visualstudiocode,emacs,vim,jupyternotebooks
--- a/.markdownlint-cli2.yaml
+++ b/.markdownlint-cli2.yaml
@@ -4,5 +4,7 @@ config:
  first-line-heading: false
  MD033:
    allowed_elements: ["details", "summary"]
+  MD024:
+    siblings_only: true
 globs:
  - "**/*.md"
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,49 +1,24 @@
 fail_fast: true
 repos:
-  - repo: local
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.9.6
    hooks:
-      - id: system
-        name: Black
-        entry: poetry run black docling_serve tests
-        pass_filenames: false
-        language: system
-        files: '\.py$'
-  - repo: local
-    hooks:
-      - id: system
-        name: isort
-        entry: poetry run isort docling_serve tests
-        pass_filenames: false
-        language: system
-        files: '\.py$'
-  - repo: local
-    hooks:
-      - id: autoflake
-        name: autoflake
-        entry: poetry run autoflake docling_serve tests
-        pass_filenames: false
-        language: system
-        files: '\.py$'
-  - repo: local
-    hooks:
-      - id: system
-        name: flake8
-        entry: poetry run flake8 docling_serve
-        pass_filenames: false
-        language: system
-        files: '\.py$'
+      # Run the Ruff formatter.
+      - id: ruff-format
+        args: [--config=pyproject.toml]
+      # Run the Ruff linter.
+      - id: ruff
+        args: [--exit-non-zero-on-fix, --fix, --config=pyproject.toml]
  - repo: local
    hooks:
      - id: system
        name: MyPy
-        entry: poetry run mypy docling_serve
+        entry: uv run --no-sync mypy docling_serve
        pass_filenames: false
        language: system
        files: '\.py$'
-  - repo: local
+  - repo: https://github.com/astral-sh/uv-pre-commit
+    # uv version.
+    rev: 0.6.1
    hooks:
-      - id: system
-        name: Poetry check
-        entry: poetry check --lock
-        pass_filenames: false
-        language: system
+      - id: uv-lock
--- a/.python-version
+++ b/.python-version
@@ -0,0 +1 @@
+3.12
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -0,0 +1,36 @@
+## [v0.5.1](https://github.com/DS4SD/docling-serve/releases/tag/v0.5.1) - 2025-03-10
+
+### Fix
+
+* Submodules in wheels ([#85](https://github.com/DS4SD/docling-serve/issues/85)) ([`a92ad48`](https://github.com/DS4SD/docling-serve/commit/a92ad48b287bfcb134011dc0fc3f91ee04e067ee))
+
+## [v0.5.0](https://github.com/DS4SD/docling-serve/releases/tag/v0.5.0) - 2025-03-07
+
+### Feature
+
+* Async api ([#60](https://github.com/DS4SD/docling-serve/issues/60)) ([`82f8900`](https://github.com/DS4SD/docling-serve/commit/82f890019745859699c1b01f9ccfb64cb7e37906))
+* Display version in fastapi docs ([#78](https://github.com/DS4SD/docling-serve/issues/78)) ([`ed851c9`](https://github.com/DS4SD/docling-serve/commit/ed851c95fee5f59305ddc3dcd5c09efce618470b))
+
+### Fix
+
+* Remove uv from image, merge ARG and ENV declarations ([#57](https://github.com/DS4SD/docling-serve/issues/57)) ([`c95db36`](https://github.com/DS4SD/docling-serve/commit/c95db3643807a4dfb96d93c8e10d6eb486c49a30))
+* **docs:** Remove comma in convert/source curl example ([#73](https://github.com/DS4SD/docling-serve/issues/73)) ([`05df073`](https://github.com/DS4SD/docling-serve/commit/05df0735d35a589bdc2a11fcdd764a10f700cb6f))
+
+## [v0.4.0](https://github.com/DS4SD/docling-serve/releases/tag/v0.4.0) - 2025-02-26
+
+### Feature
+
+* New container images ([#68](https://github.com/DS4SD/docling-serve/issues/68)) ([`7e6d9cd`](https://github.com/DS4SD/docling-serve/commit/7e6d9cdef398df70a5b4d626aeb523c428c10d56))
+* Render DoclingDocument with npm docling-components in the example UI ([#65](https://github.com/DS4SD/docling-serve/issues/65)) ([`c430d9b`](https://github.com/DS4SD/docling-serve/commit/c430d9b1a162ab29104d86ebaa1ac5a5488b1f09))
+
+## [v0.3.0](https://github.com/DS4SD/docling-serve/releases/tag/v0.3.0) - 2025-02-19
+
+### Feature
+
+* Add new docling-serve cli ([#50](https://github.com/DS4SD/docling-serve/issues/50)) ([`ec33a61`](https://github.com/DS4SD/docling-serve/commit/ec33a61faa7846b9b7998fbf557ebe39a3b800f6))
+
+### Fix
+
+* Set DOCLING_SERVE_ARTIFACTS_PATH in images ([#53](https://github.com/DS4SD/docling-serve/issues/53)) ([`4877248`](https://github.com/DS4SD/docling-serve/commit/487724836896576ca4f98e84abf15fd1c383bec8))
+* Set root UI path when behind proxy ([#38](https://github.com/DS4SD/docling-serve/issues/38)) ([`c64a450`](https://github.com/DS4SD/docling-serve/commit/c64a450bf9ba9947ab180e92bef2763ff710b210))
+* Support python 3.13 and docling updates and switch to uv ([#48](https://github.com/DS4SD/docling-serve/issues/48)) ([`ae3b490`](https://github.com/DS4SD/docling-serve/commit/ae3b4906f1c0829b1331ea491f3518741cabff71))
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -142,8 +142,7 @@ poetry add NAME

 We use the following tools to enforce code style:

- iSort, to sort imports
- Black, to format code
+- ruff, to sort imports and format code

 We run a series of checks on the code base on every commit, using `pre-commit`. To install the hooks, run:

@@ -157,4 +156,4 @@ To run the checks on-demand, run:
 pre-commit run --all-files
 ```

-Note: Checks like `Black` and `isort` will "fail" if they modify files. This is because `pre-commit` doesn't like to see files modified by their Hooks. In these cases, `git add` the modified files and `git commit` again.
+Note: Formatting checks like `ruff` will "fail" if they modify files. This is because `pre-commit` treats any file modified by its hooks as a failed check. In these cases, `git add` the modified files and `git commit` again.
--- a/Containerfile
+++ b/Containerfile
@@ -2,7 +2,8 @@ ARG BASE_IMAGE=quay.io/sclorg/python-312-c9s:c9s

 FROM ${BASE_IMAGE}

-ARG CPU_ONLY=false
+ARG MODELS_LIST="layout tableformer picture_classifier easyocr" \
+    UV_SYNC_EXTRA_ARGS=""

 USER 0

@@ -29,33 +30,37 @@ USER 1001

 WORKDIR /opt/app-root/src

-# On container environments, always set a thread budget to avoid undesired thread congestion.
-ENV OMP_NUM_THREADS=4
+ENV \
+    # On container environments, always set a thread budget to avoid undesired thread congestion.
+    OMP_NUM_THREADS=4 \
+    LANG=en_US.UTF-8 \
+    LC_ALL=en_US.UTF-8 \
+    PYTHONIOENCODING=utf-8 \
+    UV_COMPILE_BYTECODE=1 \
+    UV_LINK_MODE=copy \
+    UV_PROJECT_ENVIRONMENT=/opt/app-root \
+    DOCLING_SERVE_ARTIFACTS_PATH=/opt/app-root/src/.cache/docling/models

-ENV LANG=en_US.UTF-8
-ENV LC_ALL=en_US.UTF-8
-ENV PYTHONIOENCODING=utf-8
+RUN --mount=from=ghcr.io/astral-sh/uv:0.6.1,source=/uv,target=/bin/uv \
+    --mount=type=cache,target=/opt/app-root/src/.cache/uv,uid=1001 \
+    --mount=type=bind,source=uv.lock,target=uv.lock \
+    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
+    uv sync --frozen --no-install-project --no-dev --all-extras ${UV_SYNC_EXTRA_ARGS}

-ENV WITH_UI=True
+RUN echo "Downloading models..." && \
+    HF_HUB_DOWNLOAD_TIMEOUT="90" \
+    HF_HUB_ETAG_TIMEOUT="90" \
+    docling-tools models download -o "${DOCLING_SERVE_ARTIFACTS_PATH}" ${MODELS_LIST} && \
+    chown -R 1001:0 /opt/app-root/src/.cache && \
+    chmod -R g=u /opt/app-root/src/.cache

-COPY --chown=1001:0 pyproject.toml poetry.lock models_download.py README.md ./
-
-RUN pip install --no-cache-dir poetry && \
-    # We already are in a virtual environment, so we don't need to create a new one, only activate it.
-    poetry config virtualenvs.create false && \
-    source /opt/app-root/bin/activate && \
-    if [ "$CPU_ONLY" = "true" ]; then \
-        poetry install --no-root --no-cache --no-interaction --all-extras --with cpu --without dev; \
-    else \
-        poetry install --no-root --no-cache --no-interaction --all-extras --without dev; \
-    fi && \
-    echo "Downloading models..." && \
-    python models_download.py && \
-    chown -R 1001:0 /opt/app-root/src && \
-    chmod -R g=u /opt/app-root/src
-
-COPY --chown=1001:0 --chmod=664 ./docling_serve ./docling_serve
+COPY --chown=1001:0 ./docling_serve ./docling_serve
+RUN --mount=from=ghcr.io/astral-sh/uv:0.6.1,source=/uv,target=/bin/uv \
+    --mount=type=cache,target=/opt/app-root/src/.cache/uv,uid=1001 \
+    --mount=type=bind,source=uv.lock,target=uv.lock \
+    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
+    uv sync --frozen --no-dev --all-extras ${UV_SYNC_EXTRA_ARGS}

 EXPOSE 5001

-CMD ["python", "-m", "docling_serve"]
+CMD ["docling-serve", "run"]
--- a/Makefile
+++ b/Makefile
@@ -24,19 +24,26 @@ action-lint-file:
 md-lint-file:
 	$(CMD_PREFIX) touch .markdown-lint

+.PHONY: docling-serve-image
+docling-serve-image: Containerfile
+	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve]"
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cu124 --no-extra cpu" -f Containerfile -t ghcr.io/ds4sd/docling-serve:$(TAG) .
+	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve:$(TAG) ghcr.io/ds4sd/docling-serve:main
+	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve:$(TAG) quay.io/ds4sd/docling-serve:main
+
 .PHONY: docling-serve-cpu-image
 docling-serve-cpu-image: Containerfile ## Build docling-serve "cpu only" container image
-	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve CPU ONLY]"
-	$(CMD_PREFIX) docker build --build-arg CPU_ONLY=true -f Containerfile --platform linux/amd64 -t ghcr.io/ds4sd/docling-serve-cpu:$(TAG) .
+	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve CPU]"
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cu124" -f Containerfile -t ghcr.io/ds4sd/docling-serve-cpu:$(TAG) .
 	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve-cpu:$(TAG) ghcr.io/ds4sd/docling-serve-cpu:main
 	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve-cpu:$(TAG) quay.io/ds4sd/docling-serve-cpu:main

-.PHONY: docling-serve-gpu-image
-docling-serve-gpu-image: Containerfile ## Build docling-serve container image with GPU support
-	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with GPU]"
-	$(CMD_PREFIX) docker build --build-arg CPU_ONLY=false -f Containerfile --platform linux/amd64 -t ghcr.io/ds4sd/docling-serve:$(TAG) .
-	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve:$(TAG) ghcr.io/ds4sd/docling-serve:main
-	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve:$(TAG) quay.io/ds4sd/docling-serve:main
+.PHONY: docling-serve-cu124-image
+docling-serve-cu124-image: Containerfile ## Build docling-serve container image with GPU support
+	$(ECHO_PREFIX) printf "  %-12s Containerfile\n" "[docling-serve with Cuda 12.4]"
+	$(CMD_PREFIX) docker build --load --build-arg "UV_SYNC_EXTRA_ARGS=--no-extra cpu" -f Containerfile --platform linux/amd64 -t ghcr.io/ds4sd/docling-serve-cu124:$(TAG) .
+	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve-cu124:$(TAG) ghcr.io/ds4sd/docling-serve-cu124:main
+	$(CMD_PREFIX) docker tag ghcr.io/ds4sd/docling-serve-cu124:$(TAG) quay.io/ds4sd/docling-serve-cu124:main

 .PHONY: action-lint
 action-lint: .action-lint ##      Lint GitHub Action workflows
@@ -65,13 +72,12 @@ md-lint: .md-lint ##      Lint markdown files
 .PHONY: py-lint
 py-lint: ##      Lint Python files
 	$(ECHO_PREFIX) printf "  %-12s ./...\n" "[PY LINT]"
-	$(CMD_PREFIX) if ! which poetry $(PIPE_DEV_NULL) ; then \
-		echo "Please install poetry." ; \
-		echo "pip install poetry" ; \
+	$(CMD_PREFIX) if ! which uv $(PIPE_DEV_NULL) ; then \
+		echo "Please install uv." ; \
 		exit 1 ; \
 	fi
-	$(CMD_PREFIX) poetry install --all-extras
-	$(CMD_PREFIX) poetry run pre-commit run --all-files
+	$(CMD_PREFIX) uv sync --extra ui
+	$(CMD_PREFIX) uv run pre-commit run --all-files

 .PHONY: run-docling-cpu
 run-docling-cpu: ## Run the docling-serve container with CPU support and assign a container name
--- a/README.md
+++ b/README.md
@@ -104,7 +104,7 @@ curl -X 'POST' \
    "return_as_file": false,
    "do_table_structure": true,
    "include_images": true,
-    "images_scale": 2,
+    "images_scale": 2
  },
  "http_sources": [{"url": "https://arxiv.org/pdf/2206.01062"}]
 }'
@@ -276,6 +276,17 @@ The response can be a JSON Document or a File.
 - If you set the parameter `return_as_file` to True, the response will be a zip file.
 - If multiple files are generated (multiple inputs, or one input but multiple outputs with `return_as_file` True), the response will be a zip file.

+## Run docling-serve
+
+Clone the repository and run the following from within the cloned directory root.
+
+```bash
+python -m venv venv
+source venv/bin/activate
+pip install "docling-serve[ui]"
+docling-serve run --enable-ui
+```
+
 ## Helpers

 - A full Swagger UI is available at the `/docs` endpoint.
@@ -293,11 +304,11 @@ The response can be a JSON Document or a File.
 ### CPU only

 ```sh
-# Install poetry if not already available
-curl -sSL https://install.python-poetry.org | python3 -
+# Install uv if not already available
+curl -LsSf https://astral.sh/uv/install.sh | sh

 # Install dependencies
-poetry install --with cpu
+uv sync --extra cpu
 ```

 ### Cuda GPU
@@ -306,29 +317,107 @@ For GPU support use the following command:

 ```sh
 # Install dependencies
-poetry install
+uv sync
 ```

+### Gradio UI and different OCR backends
+
+The `/ui` endpoint (built with `gradio`) and additional OCR backends can be enabled via package extras:
+
+```sh
+# Enable ui and rapidocr
+uv sync --extra ui --extra rapidocr
+```
+
+```sh
+# Enable tesserocr
+uv sync --extra tesserocr
+```
+
+See the `[project.optional-dependencies]` section in `pyproject.toml` for the full list of extras, and run `uv run docling-serve --help` for the runtime options.
+
 ### Run the server

-The [start_server.sh](./start_server.sh) executable is a convenient script for launching the local webserver.
+The `docling-serve` executable is a convenient script for launching the webserver in both
+development and production mode.

 ```sh
-# Run the server
-bash start_server.sh
+# Run the server in development mode
+# - reload is enabled by default
+# - listening on the 127.0.0.1 address
+# - ui is enabled by default
+docling-serve dev

-# Run the server with live reload
-RELOAD=true bash start_server.sh
+# Run the server in production mode
+# - reload is disabled by default
+# - listening on the 0.0.0.0 address
+# - ui is disabled by default
+docling-serve run
 ```

-### Environment variables
+### Options

-The following variables are available:
+The `docling-serve` executable is controlled with both command line
+options and environment variables.

-`TESSDATA_PREFIX`: Tesseract data location, example `/usr/share/tesseract/tessdata/`.
-`UVICORN_WORKERS`: Number of workers to use.
-`RELOAD`: If `True`, this will enable auto-reload when you modify files, useful for development.
-`WITH_UI`: If `True`, The Gradio UI will be available at `/ui`.
+<details>
+<summary>`docling-serve` help message</summary>
+
+```sh
+$ docling-serve dev --help
+                                                                                                              
+ Usage: docling-serve dev [OPTIONS]                                                                           
+                                                                                                              
+ Run a Docling Serve app in development mode. 🧪                                                              
+ This is equivalent to docling-serve run but with reload                                                      
+ enabled and listening on the 127.0.0.1 address.                                                              
+                                                                                                              
+ Options can be set also with the corresponding ENV variable, with the exception                              
+ of --enable-ui, --host and --reload.                                                                         
+                                                                                                              
+╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────╮
+│ --host                                   TEXT     The host to serve on. For local development in localhost │
+│                                                   use 127.0.0.1. To enable public access, e.g. in a        │
+│                                                   container, use all the IP addresses available with       │
+│                                                   0.0.0.0.                                                 │
+│                                                   [default: 127.0.0.1]                                     │
+│ --port                                   INTEGER  The port to serve on. [default: 5001]                    │
+│ --reload           --no-reload                    Enable auto-reload of the server when (code) files       │
+│                                                   change. This is resource intensive, use it only during   │
+│                                                   development.                                             │
+│                                                   [default: reload]                                        │
+│ --root-path                              TEXT     The root path is used to tell your app that it is being  │
+│                                                   served to the outside world with some path prefix set up │
+│                                                   in some termination proxy or similar.                    │
+│ --proxy-headers    --no-proxy-headers             Enable/Disable X-Forwarded-Proto, X-Forwarded-For,       │
+│                                                   X-Forwarded-Port to populate remote address info.        │
+│                                                   [default: proxy-headers]                                 │
+│ --artifacts-path                          PATH     If set to a valid directory, the model weights will be  │
+│                                                    loaded from this path.                                  │
+│                                                    [default: None]                                         │
+│ --enable-ui        --no-enable-ui                 Enable the development UI. [default: enable-ui]          │
+│ --help                                            Show this message and exit.                              │
+╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+```
+
+</details>
+
+#### Environment variables
+
+The environment variables controlling the `uvicorn` execution can be specified with the `UVICORN_` prefix:
+
+- `UVICORN_WORKERS`: Number of workers to use.
+- `UVICORN_RELOAD`: If `True`, this will enable auto-reload when you modify files, useful for development.
+
+The environment variables controlling specifics of the Docling Serve app can be specified with the
+`DOCLING_SERVE_` prefix:
+
+- `DOCLING_SERVE_ARTIFACTS_PATH`: If set, Docling will use only the local model weights found at this path, for example `/opt/app-root/src/.cache/docling/models`.
+- `DOCLING_SERVE_ENABLE_UI`: If `True`, the Gradio UI will be available at `/ui`.
+
+Others:
+
+- `TESSDATA_PREFIX`: Tesseract data location, for example `/usr/share/tesseract/tessdata/`.

 ## Get help and support

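The prefixed variables documented in the README hunk above could be read along these lines. This is a minimal sketch using only the standard library. The actual `docling_serve.settings` module is not shown in this part of the diff, so the class names `UvicornEnv` and `DoclingServeEnv`, the helper `load_env`, the accepted truthy spellings, and the defaults are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


def _to_bool(raw: str) -> bool:
    """Interpret common truthy spellings (assumed set, not taken from the project)."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class UvicornEnv:
    # Hypothetical container for values read from UVICORN_* variables.
    workers: Optional[int] = None
    reload: bool = False


@dataclass
class DoclingServeEnv:
    # Hypothetical container for values read from DOCLING_SERVE_* variables.
    artifacts_path: Optional[str] = None
    enable_ui: bool = False


def load_env(environ: Mapping[str, str] = os.environ) -> Tuple[UvicornEnv, DoclingServeEnv]:
    """Build both settings objects from a mapping of environment variables."""
    uvicorn_env = UvicornEnv()
    serve_env = DoclingServeEnv()

    if "UVICORN_WORKERS" in environ:
        uvicorn_env.workers = int(environ["UVICORN_WORKERS"])
    if "UVICORN_RELOAD" in environ:
        uvicorn_env.reload = _to_bool(environ["UVICORN_RELOAD"])

    if "DOCLING_SERVE_ARTIFACTS_PATH" in environ:
        serve_env.artifacts_path = environ["DOCLING_SERVE_ARTIFACTS_PATH"]
    if "DOCLING_SERVE_ENABLE_UI" in environ:
        serve_env.enable_ui = _to_bool(environ["DOCLING_SERVE_ENABLE_UI"])

    return uvicorn_env, serve_env
```

Passing the mapping explicitly, as in `load_env({"UVICORN_WORKERS": "2"})`, keeps the sketch easy to exercise without touching the real process environment.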
--- a/docling_serve/main.py
+++ b/docling_serve/main.py
@@ -1,20 +1,299 @@
-import os
+import importlib.metadata
+import logging
+import platform
+import sys
+import warnings
+from pathlib import Path
+from typing import Annotated, Any, Optional, Union

-from docling_serve.app import app
-from docling_serve.helper_functions import _str_to_bool
+import typer
+import uvicorn
+from rich.console import Console

-# Launch the FastAPI server
-if __name__ == "__main__":
-    from uvicorn import run
+from docling_serve.settings import docling_serve_settings, uvicorn_settings

-    port = int(os.getenv("PORT", "5001"))
-    workers = int(os.getenv("UVICORN_WORKERS", "1"))
-    reload = _str_to_bool(os.getenv("RELOAD", "False"))
-    run(
-        app,
-        host="0.0.0.0",
-        port=port,
-        workers=workers,
-        timeout_keep_alive=600,
-        reload=reload,
+warnings.filterwarnings(action="ignore", category=UserWarning, module="pydantic|torch")
+warnings.filterwarnings(action="ignore", category=FutureWarning, module="easyocr")
+
+
+err_console = Console(stderr=True)
+console = Console()
+
+app = typer.Typer(
+    no_args_is_help=True,
+    rich_markup_mode="rich",
+)
+
+logger = logging.getLogger(__name__)
+
+
+def version_callback(value: bool) -> None:
+    if value:
+        docling_serve_version = importlib.metadata.version("docling_serve")
+        docling_version = importlib.metadata.version("docling")
+        docling_core_version = importlib.metadata.version("docling-core")
+        docling_ibm_models_version = importlib.metadata.version("docling-ibm-models")
+        docling_parse_version = importlib.metadata.version("docling-parse")
+        platform_str = platform.platform()
+        py_impl_version = sys.implementation.cache_tag
+        py_lang_version = platform.python_version()
+        console.print(f"Docling Serve version: {docling_serve_version}")
+        console.print(f"Docling version: {docling_version}")
+        console.print(f"Docling Core version: {docling_core_version}")
+        console.print(f"Docling IBM Models version: {docling_ibm_models_version}")
+        console.print(f"Docling Parse version: {docling_parse_version}")
+        console.print(f"Python: {py_impl_version} ({py_lang_version})")
+        console.print(f"Platform: {platform_str}")
+        raise typer.Exit()
+
+
+@app.callback()
+def callback(
+    version: Annotated[
+        Union[bool, None],
+        typer.Option(help="Show the version and exit.", callback=version_callback),
+    ] = None,
+    verbose: Annotated[
+        int,
+        typer.Option(
+            "--verbose",
+            "-v",
+            count=True,
+            help="Set the verbosity level. -v for info logging, -vv for debug logging.",
+        ),
+    ] = 0,
+) -> None:
+    if verbose == 0:
+        logging.basicConfig(level=logging.WARNING)
+    elif verbose == 1:
+        logging.basicConfig(level=logging.INFO)
+    elif verbose == 2:
+        logging.basicConfig(level=logging.DEBUG)
+
+
+def _run(
+    *,
+    command: str,
+) -> None:
+    server_type = "development" if command == "dev" else "production"
+
+    console.print(f"Starting {server_type} server 🚀")
+
+    url = f"http://{uvicorn_settings.host}:{uvicorn_settings.port}"
+    url_docs = f"{url}/docs"
+    url_ui = f"{url}/ui"
+
+    console.print("")
+    console.print(f"Server started at [link={url}]{url}[/]")
+    console.print(f"Documentation at [link={url_docs}]{url_docs}[/]")
+    if docling_serve_settings.enable_ui:
+        console.print(f"UI at [link={url_ui}]{url_ui}[/]")
+
+    if command == "dev":
+        console.print("")
+        console.print(
+            "Running in development mode, for production use: "
+            "[bold]docling-serve run[/]",
+        )
+
+    console.print("")
+    console.print("Logs:")
+
+    uvicorn.run(
+        app="docling_serve.app:create_app",
+        factory=True,
+        host=uvicorn_settings.host,
+        port=uvicorn_settings.port,
+        reload=uvicorn_settings.reload,
+        workers=uvicorn_settings.workers,
+        root_path=uvicorn_settings.root_path,
+        proxy_headers=uvicorn_settings.proxy_headers,
    )
+
+
+@app.command()
+def dev(
+    *,
+    # uvicorn options
+    host: Annotated[
+        str,
+        typer.Option(
+            help=(
+                "The host to serve on. For local development in localhost "
+                "use [blue]127.0.0.1[/blue]. To enable public access, "
+                "e.g. in a container, use all the IP addresses "
+                "available with [blue]0.0.0.0[/blue]."
+            )
+        ),
+    ] = "127.0.0.1",
+    port: Annotated[
+        int,
+        typer.Option(help="The port to serve on."),
+    ] = uvicorn_settings.port,
+    reload: Annotated[
+        bool,
+        typer.Option(
+            help=(
+                "Enable auto-reload of the server when (code) files change. "
+                "This is [bold]resource intensive[/bold], "
+                "use it only during development."
+            )
+        ),
+    ] = True,
+    root_path: Annotated[
+        str,
+        typer.Option(
+            help=(
+                "The root path is used to tell your app that it is being served "
+                "to the outside world with some [bold]path prefix[/bold] "
+                "set up in some termination proxy or similar."
+            )
+        ),
+    ] = uvicorn_settings.root_path,
+    proxy_headers: Annotated[
+        bool,
+        typer.Option(
+            help=(
+                "Enable/Disable X-Forwarded-Proto, X-Forwarded-For, "
+                "X-Forwarded-Port to populate remote address info."
+            )
+        ),
+    ] = uvicorn_settings.proxy_headers,
+    # docling options
+    artifacts_path: Annotated[
+        Optional[Path],
+        typer.Option(
+            help=(
+                "If set to a valid directory, "
+                "the model weights will be loaded from this path."
+            )
+        ),
+    ] = docling_serve_settings.artifacts_path,
+    enable_ui: Annotated[bool, typer.Option(help="Enable the development UI.")] = True,
+) -> Any:
+    """
+    Run a [bold]Docling Serve[/bold] app in [yellow]development[/yellow] mode. 🧪
+
+    This is equivalent to [bold]docling-serve run[/bold] but with [bold]reload[/bold]
+    enabled and listening on the [blue]127.0.0.1[/blue] address.
+
+    Options can be set also with the corresponding ENV variable, with the exception
+    of --enable-ui, --host and --reload.
+    """
+
+    uvicorn_settings.host = host
+    uvicorn_settings.port = port
+    uvicorn_settings.reload = reload
+    uvicorn_settings.root_path = root_path
+    uvicorn_settings.proxy_headers = proxy_headers
+
+    docling_serve_settings.artifacts_path = artifacts_path
+    docling_serve_settings.enable_ui = enable_ui
+
+    _run(
+        command="dev",
+    )
+
+
+@app.command()
+def run(
+    *,
+    host: Annotated[
+        str,
+        typer.Option(
+            help=(
+                "The host to serve on. For local development in localhost "
+                "use [blue]127.0.0.1[/blue]. To enable public access, "
+                "e.g. in a container, use all the IP addresses "
+                "available with [blue]0.0.0.0[/blue]."
+            )
+        ),
+    ] = uvicorn_settings.host,
+    port: Annotated[
+        int,
+        typer.Option(help="The port to serve on."),
+    ] = uvicorn_settings.port,
+    reload: Annotated[
+        bool,
+        typer.Option(
+            help=(
+                "Enable auto-reload of the server when (code) files change. "
+                "This is [bold]resource intensive[/bold], "
+                "use it only during development."
+            )
+        ),
+    ] = uvicorn_settings.reload,
+    workers: Annotated[
+        Union[int, None],
+        typer.Option(
+            help=(
+                "Use multiple worker processes. "
+                "Mutually exclusive with the --reload flag."
+            )
+        ),
+    ] = uvicorn_settings.workers,
+    root_path: Annotated[
+        str,
+        typer.Option(
+            help=(
+                "The root path is used to tell your app that it is being served "
+                "to the outside world with some [bold]path prefix[/bold] "
+                "set up in some termination proxy or similar."
+            )
+        ),
+    ] = uvicorn_settings.root_path,
+    proxy_headers: Annotated[
+        bool,
+        typer.Option(
+            help=(
+                "Enable/Disable X-Forwarded-Proto, X-Forwarded-For, "
+                "X-Forwarded-Port to populate remote address info."
+            )
+        ),
+    ] = uvicorn_settings.proxy_headers,
+    # docling options
+    artifacts_path: Annotated[
+        Optional[Path],
+        typer.Option(
+            help=(
+                "If set to a valid directory, "
+                "the model weights will be loaded from this path."
+            )
+        ),
+    ] = docling_serve_settings.artifacts_path,
+    enable_ui: Annotated[
+        bool, typer.Option(help="Enable the development UI.")
+    ] = docling_serve_settings.enable_ui,
+) -> Any:
+    """
+    Run a [bold]Docling Serve[/bold] app in [green]production[/green] mode. 🚀
+
+    This is equivalent to [bold]docling-serve dev[/bold] but with [bold]reload[/bold]
+    disabled and listening on the [blue]0.0.0.0[/blue] address.
+
+    Options can be set also with the corresponding ENV variable, e.g. UVICORN_PORT
+    or DOCLING_SERVE_ENABLE_UI.
+    """
+
+    uvicorn_settings.host = host
+    uvicorn_settings.port = port
+    uvicorn_settings.reload = reload
+    uvicorn_settings.workers = workers
+    uvicorn_settings.root_path = root_path
+    uvicorn_settings.proxy_headers = proxy_headers
+
+    docling_serve_settings.artifacts_path = artifacts_path
+    docling_serve_settings.enable_ui = enable_ui
+
+    _run(
+        command="run",
+    )
+
+
+def main() -> None:
+    app()
+
+
+# Launch the CLI when calling python -m docling_serve
+if __name__ == "__main__":
+    main()
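The `dev` and `run` commands above share one launch path and differ only in their defaults, and the `-v` count is mapped to a logging level. The following sketch shows that shape with the standard library's `argparse` rather than `typer`, so it is not the project's implementation. The names `COMMAND_DEFAULTS`, `verbosity_to_level`, `build_parser`, and the program name `serve-sketch` are hypothetical, and the `run` defaults are the ones documented in the README (in the real CLI they come from settings objects and may be overridden through the environment).

```python
import argparse
import logging
from typing import Optional

# Defaults per subcommand, as described in the README text of this change.
COMMAND_DEFAULTS = {
    "dev": {"host": "127.0.0.1", "reload": True, "enable_ui": True},
    "run": {"host": "0.0.0.0", "reload": False, "enable_ui": False},
}


def verbosity_to_level(count: int) -> Optional[int]:
    """Map the number of -v flags to a logging level.

    Mirrors the diff: 0 -> WARNING, 1 -> INFO, 2 -> DEBUG. Any other count
    returns None, since the original callback configures nothing in that case.
    """
    return {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}.get(count)


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with two subcommands that differ only in defaults."""
    parser = argparse.ArgumentParser(prog="serve-sketch")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    subcommands = parser.add_subparsers(dest="command", required=True)
    for name, defaults in COMMAND_DEFAULTS.items():
        cmd = subcommands.add_parser(name)
        cmd.add_argument("--host", default=defaults["host"])
        cmd.add_argument("--port", type=int, default=5001)
        # BooleanOptionalAction generates the paired --flag / --no-flag options.
        cmd.add_argument(
            "--reload",
            action=argparse.BooleanOptionalAction,
            default=defaults["reload"],
        )
        cmd.add_argument(
            "--enable-ui",
            action=argparse.BooleanOptionalAction,
            default=defaults["enable_ui"],
        )
    return parser
```

For example, `build_parser().parse_args(["-vv", "dev", "--no-reload"])` keeps the development host and UI defaults while switching reload off.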
--- a/docling_serve/app.py
+++ b/docling_serve/app.py
@@ -1,38 +1,53 @@
+import asyncio
+import importlib.metadata
 import logging
-import os
 import tempfile
 from contextlib import asynccontextmanager
 from io import BytesIO
 from pathlib import Path
-from typing import Annotated, Any, Dict, List, Optional, Union
+from typing import Annotated, Any, Optional, Union
+
+from fastapi import (
+    BackgroundTasks,
+    Depends,
+    FastAPI,
+    HTTPException,
+    Query,
+    UploadFile,
+    WebSocket,
+    WebSocketDisconnect,
+)
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import RedirectResponse

 from docling.datamodel.base_models import DocumentStream, InputFormat
 from docling.document_converter import DocumentConverter
-from dotenv import load_dotenv
-from fastapi import BackgroundTasks, FastAPI, UploadFile
-from fastapi.middleware.cors import CORSMiddleware
-from fastapi.responses import RedirectResponse
-from pydantic import BaseModel

-from docling_serve.docling_conversion import (
+from docling_serve.datamodel.convert import ConvertDocumentsOptions
+from docling_serve.datamodel.requests import (
    ConvertDocumentFileSourcesRequest,
-    ConvertDocumentsOptions,
    ConvertDocumentsRequest,
+)
+from docling_serve.datamodel.responses import (
+    ConvertDocumentResponse,
+    HealthCheckResponse,
+    MessageKind,
+    TaskStatusResponse,
+    WebsocketMessage,
+)
+from docling_serve.docling_conversion import (
    convert_documents,
    converters,
    get_pdf_pipeline_opts,
 )
-from docling_serve.helper_functions import FormDepends, _str_to_bool
-from docling_serve.response_preparation import ConvertDocumentResponse, process_results
-
-# Load local env vars if present
-load_dotenv()
-
-WITH_UI = _str_to_bool(os.getenv("WITH_UI", "False"))
-if WITH_UI:
-    import gradio as gr
-
-    from docling_serve.gradio_ui import ui as gradio_ui
+from docling_serve.engines import get_orchestrator
+from docling_serve.engines.async_local.orchestrator import (
+    AsyncLocalOrchestrator,
+    TaskNotFoundError,
+)
+from docling_serve.helper_functions import FormDepends
+from docling_serve.response_preparation import process_results
+from docling_serve.settings import docling_serve_settings


 # Set up custom logging as we'll be intermixed with FastAPI/Uvicorn's logging
@@ -70,8 +85,6 @@ _log = logging.getLogger(__name__)
 # Context manager to initialize and clean up the lifespan of the FastAPI app
 @asynccontextmanager
 async def lifespan(app: FastAPI):
-    # settings = Settings()
-
    # Converter with default options
    pdf_format_option, options_hash = get_pdf_pipeline_opts(ConvertDocumentsOptions())
    converters[options_hash] = DocumentConverter(
@@ -83,142 +96,299 @@ async def lifespan(app: FastAPI):

    converters[options_hash].initialize_pipeline(InputFormat.PDF)

+    orchestrator = get_orchestrator()
+
+    # Start the background queue processor
+    queue_task = asyncio.create_task(orchestrator.process_queue())
+
    yield

+    # Cancel the background queue processor on shutdown
+    queue_task.cancel()
+    try:
+        await queue_task
+    except asyncio.CancelledError:
+        _log.info("Queue processor cancelled.")
+
    converters.clear()
-    if WITH_UI:
-        gradio_ui.close()
+
+    # if WITH_UI:
+    #     gradio_ui.close()


 ##################################
 # App creation and configuration #
 ##################################

-app = FastAPI(
-    title="Docling Serve",
-    lifespan=lifespan,
-)

-origins = ["*"]
-methods = ["*"]
-headers = ["*"]
+def create_app():  # noqa: C901
+    try:
+        version = importlib.metadata.version("docling_serve")
+    except importlib.metadata.PackageNotFoundError:
+        _log.warning("Unable to get docling_serve version, falling back to 0.0.0")

-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=origins,
-    allow_credentials=True,
-    allow_methods=methods,
-    allow_headers=headers,
-)
+        version = "0.0.0"

-# Mount the Gradio app
-if WITH_UI:
-    tmp_output_dir = Path(tempfile.mkdtemp())
-    gradio_ui.gradio_output_dir = tmp_output_dir
-    app = gr.mount_gradio_app(
-        app, gradio_ui, path="/ui", allowed_paths=["./logo.png", tmp_output_dir]
+    app = FastAPI(
+        title="Docling Serve",
+        lifespan=lifespan,
+        version=version,
    )

+    origins = ["*"]
+    methods = ["*"]
+    headers = ["*"]

-#############################
-# API Endpoints definitions #
-#############################
-
-
-# Favicon
-@app.get("/favicon.ico", include_in_schema=False)
-async def favicon():
-    response = RedirectResponse(url="https://ds4sd.github.io/docling/assets/logo.png")
-    return response
-
-
-# Status
-class HealthCheckResponse(BaseModel):
-    status: str = "ok"
-
-
-@app.get("/health")
-def health() -> HealthCheckResponse:
-    return HealthCheckResponse()
-
-
-# API readiness compatibility for OpenShift AI Workbench
-@app.get("/api", include_in_schema=False)
-def api_check() -> HealthCheckResponse:
-    return HealthCheckResponse()
-
-
-# Convert a document from URL(s)
-@app.post(
-    "/v1alpha/convert/source",
-    response_model=ConvertDocumentResponse,
-    responses={
-        200: {
-            "content": {"application/zip": {}},
-            # "description": "Return the JSON item or an image.",
-        }
-    },
-)
-def process_url(
-    background_tasks: BackgroundTasks, conversion_request: ConvertDocumentsRequest
-):
-    sources: List[Union[str, DocumentStream]] = []
-    headers: Optional[Dict[str, Any]] = None
-    if isinstance(conversion_request, ConvertDocumentFileSourcesRequest):
-        for file_source in conversion_request.file_sources:
-            sources.append(file_source.to_document_stream())
-    else:
-        for http_source in conversion_request.http_sources:
-            sources.append(http_source.url)
-            if headers is None and http_source.headers:
-                headers = http_source.headers
-
-    # Note: results are only an iterator->lazy evaluation
-    results = convert_documents(
-        sources=sources, options=conversion_request.options, headers=headers
+    app.add_middleware(
+        CORSMiddleware,
+        allow_origins=origins,
+        allow_credentials=True,
+        allow_methods=methods,
+        allow_headers=headers,
    )

-    # The real processing will happen here
-    response = process_results(
-        background_tasks=background_tasks,
-        conversion_options=conversion_request.options,
-        conv_results=results,
+    # Mount the Gradio app
+    if docling_serve_settings.enable_ui:
+        try:
+            import gradio as gr
+
+            from docling_serve.gradio_ui import ui as gradio_ui
+
+            tmp_output_dir = Path(tempfile.mkdtemp())
+            gradio_ui.gradio_output_dir = tmp_output_dir
+            app = gr.mount_gradio_app(
+                app,
+                gradio_ui,
+                path="/ui",
+                allowed_paths=["./logo.png", tmp_output_dir],
+                root_path="/ui",
+            )
+        except ImportError:
+            _log.warning(
+                "Docling Serve enable_ui is activated, but gradio is not installed. "
+                "Install it with `pip install docling-serve[ui]` "
+                "or `pip install gradio`"
+            )
+
+    #############################
+    # API Endpoints definitions #
+    #############################
+
+    # Favicon
+    @app.get("/favicon.ico", include_in_schema=False)
+    async def favicon():
+        response = RedirectResponse(
+            url="https://ds4sd.github.io/docling/assets/logo.png"
+        )
+        return response
+
+    @app.get("/health")
+    def health() -> HealthCheckResponse:
+        return HealthCheckResponse()
+
+    # API readiness compatibility for OpenShift AI Workbench
+    @app.get("/api", include_in_schema=False)
+    def api_check() -> HealthCheckResponse:
+        return HealthCheckResponse()
+
+    # Convert a document from URL(s)
+    @app.post(
+        "/v1alpha/convert/source",
+        response_model=ConvertDocumentResponse,
+        responses={
+            200: {
+                "content": {"application/zip": {}},
+                # "description": "Return the JSON item or an image.",
+            }
+        },
    )
+    def process_url(
+        background_tasks: BackgroundTasks, conversion_request: ConvertDocumentsRequest
+    ):
+        sources: list[Union[str, DocumentStream]] = []
+        headers: Optional[dict[str, Any]] = None
+        if isinstance(conversion_request, ConvertDocumentFileSourcesRequest):
+            for file_source in conversion_request.file_sources:
+                sources.append(file_source.to_document_stream())
+        else:
+            for http_source in conversion_request.http_sources:
+                sources.append(http_source.url)
+                if headers is None and http_source.headers:
+                    headers = http_source.headers

-    return response
+        # Note: results are only an iterator->lazy evaluation
+        results = convert_documents(
+            sources=sources, options=conversion_request.options, headers=headers
+        )

+        # The real processing will happen here
+        response = process_results(
+            background_tasks=background_tasks,
+            conversion_options=conversion_request.options,
+            conv_results=results,
+        )

-# Convert a document from file(s)
-@app.post(
-    "/v1alpha/convert/file",
-    response_model=ConvertDocumentResponse,
-    responses={
-        200: {
-            "content": {"application/zip": {}},
-        }
-    },
-)
-async def process_file(
-    background_tasks: BackgroundTasks,
-    files: List[UploadFile],
-    options: Annotated[ConvertDocumentsOptions, FormDepends(ConvertDocumentsOptions)],
-):
+        return response

-    _log.info(f"Received {len(files)} files for processing.")
-
-    # Load the uploaded files to Docling DocumentStream
-    file_sources = []
-    for file in files:
-        buf = BytesIO(file.file.read())
-        name = file.filename if file.filename else "file.pdf"
-        file_sources.append(DocumentStream(name=name, stream=buf))
-
-    results = convert_documents(sources=file_sources, options=options)
-
-    response = process_results(
-        background_tasks=background_tasks,
-        conversion_options=options,
-        conv_results=results,
+    # Convert a document from file(s)
+    @app.post(
+        "/v1alpha/convert/file",
+        response_model=ConvertDocumentResponse,
+        responses={
+            200: {
+                "content": {"application/zip": {}},
+            }
+        },
    )
+    async def process_file(
+        background_tasks: BackgroundTasks,
+        files: list[UploadFile],
+        options: Annotated[
+            ConvertDocumentsOptions, FormDepends(ConvertDocumentsOptions)
+        ],
+    ):
+        _log.info(f"Received {len(files)} files for processing.")

-    return response
+        # Load the uploaded files to Docling DocumentStream
+        file_sources = []
+        for file in files:
+            buf = BytesIO(file.file.read())
+            name = file.filename if file.filename else "file.pdf"
+            file_sources.append(DocumentStream(name=name, stream=buf))
+
+        results = convert_documents(sources=file_sources, options=options)
+
+        response = process_results(
+            background_tasks=background_tasks,
+            conversion_options=options,
+            conv_results=results,
+        )
+
+        return response
+
+    # Convert a document from URL(s) using the async api
+    @app.post(
+        "/v1alpha/convert/source/async",
+        response_model=TaskStatusResponse,
+    )
+    async def process_url_async(
+        orchestrator: Annotated[AsyncLocalOrchestrator, Depends(get_orchestrator)],
+        conversion_request: ConvertDocumentsRequest,
+    ):
+        task = await orchestrator.enqueue(request=conversion_request)
+        task_queue_position = await orchestrator.get_queue_position(
+            task_id=task.task_id
+        )
+        return TaskStatusResponse(
+            task_id=task.task_id,
+            task_status=task.task_status,
+            task_position=task_queue_position,
+        )
+
+    # Task status poll
+    @app.get(
+        "/v1alpha/status/poll/{task_id}",
+        response_model=TaskStatusResponse,
+    )
+    async def task_status_poll(
+        orchestrator: Annotated[AsyncLocalOrchestrator, Depends(get_orchestrator)],
+        task_id: str,
+        wait: Annotated[
+            float,
+            Query(description="Number of seconds to wait for a completed status."),
+        ] = 0.0,
+    ):
+        try:
+            task = await orchestrator.task_status(task_id=task_id, wait=wait)
+            task_queue_position = await orchestrator.get_queue_position(task_id=task_id)
+        except TaskNotFoundError:
+            raise HTTPException(status_code=404, detail="Task not found.")
+        return TaskStatusResponse(
+            task_id=task.task_id,
+            task_status=task.task_status,
+            task_position=task_queue_position,
+        )
+
+    # Task status websocket
+    @app.websocket(
+        "/v1alpha/status/ws/{task_id}",
+    )
+    async def task_status_ws(
+        websocket: WebSocket,
+        orchestrator: Annotated[AsyncLocalOrchestrator, Depends(get_orchestrator)],
+        task_id: str,
+    ):
+        await websocket.accept()
+
+        if task_id not in orchestrator.tasks:
+            await websocket.send_text(
+                WebsocketMessage(
+                    message=MessageKind.ERROR, error="Task not found."
+                ).model_dump_json()
+            )
+            await websocket.close()
+            return
+
+        task = orchestrator.tasks[task_id]
+
+        # Track active WebSocket connections for this job
+        orchestrator.task_subscribers[task_id].add(websocket)
+
+        try:
+            task_queue_position = await orchestrator.get_queue_position(task_id=task_id)
+            task_response = TaskStatusResponse(
+                task_id=task.task_id,
+                task_status=task.task_status,
+                task_position=task_queue_position,
+            )
+            await websocket.send_text(
+                WebsocketMessage(
+                    message=MessageKind.CONNECTION, task=task_response
+                ).model_dump_json()
+            )
+            while True:
+                task_queue_position = await orchestrator.get_queue_position(
+                    task_id=task_id
+                )
+                task_response = TaskStatusResponse(
+                    task_id=task.task_id,
+                    task_status=task.task_status,
+                    task_position=task_queue_position,
+                )
+                await websocket.send_text(
+                    WebsocketMessage(
+                        message=MessageKind.UPDATE, task=task_response
+                    ).model_dump_json()
+                )
+                # each client message will be interpreted as a request for update
+                msg = await websocket.receive_text()
+                _log.debug(f"Received message: {msg}")
+
+        except WebSocketDisconnect:
+            _log.info(f"WebSocket disconnected for job {task_id}")
+
+        finally:
+            orchestrator.task_subscribers[task_id].remove(websocket)
+
+    # Task result
+    @app.get(
+        "/v1alpha/result/{task_id}",
+        response_model=ConvertDocumentResponse,
+        responses={
+            200: {
+                "content": {"application/zip": {}},
+            }
+        },
+    )
+    async def task_result(
+        orchestrator: Annotated[AsyncLocalOrchestrator, Depends(get_orchestrator)],
+        task_id: str,
+    ):
+        result = await orchestrator.task_result(task_id=task_id)
+        if result is None:
+            raise HTTPException(
+                status_code=404,
+                detail="Task result not found. Please wait for a completion status.",
+            )
+        return result
+
+    return app
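The async endpoints added above suggest a submit, poll, fetch flow on the client side. Below is a minimal sketch of that flow, assuming a JSON body with `http_sources` as in `ConvertDocumentHttpSourcesRequest`. The helper names `http_json` and `convert_async`, the `max_polls` limit, and the injectable `call` parameter are illustrative and not part of docling-serve.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

# Terminal states, mirroring TaskStatus.SUCCESS / TaskStatus.FAILURE in this diff.
TERMINAL_STATES = {"success", "failure"}


def http_json(method: str, url: str, body: Optional[dict] = None) -> Any:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def convert_async(
    base_url: str,
    source_url: str,
    call: Callable[..., Any] = http_json,  # hypothetical hook, eases testing
    wait: float = 5.0,
    max_polls: int = 120,  # hypothetical safety limit
) -> Any:
    # 1. Enqueue the conversion; the reply carries task_id and task_status.
    task = call(
        "POST",
        f"{base_url}/v1alpha/convert/source/async",
        {"http_sources": [{"url": source_url}]},
    )
    task_id = task["task_id"]

    # 2. Poll until the task reaches a terminal state. The `wait` query
    #    parameter asks the server to hold the request for up to that long.
    polls = 0
    while task["task_status"] not in TERMINAL_STATES:
        if polls >= max_polls:
            raise TimeoutError(f"Task {task_id} did not complete in time.")
        task = call("GET", f"{base_url}/v1alpha/status/poll/{task_id}?wait={wait}")
        polls += 1

    if task["task_status"] != "success":
        raise RuntimeError(f"Task {task_id} ended with {task['task_status']}.")

    # 3. Fetch the converted document.
    return call("GET", f"{base_url}/v1alpha/result/{task_id}")
```

Passing a fake `call` lets the flow be exercised without a running server.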
--- a/docling_serve/datamodel/__init__.py
+++ b/docling_serve/datamodel/__init__.py
--- a/docling_serve/datamodel/convert.py
+++ b/docling_serve/datamodel/convert.py
@@ -0,0 +1,174 @@
+# Define the input options for the API
+from typing import Annotated, Optional
+
+from pydantic import BaseModel, Field
+
+from docling.datamodel.base_models import InputFormat, OutputFormat
+from docling.datamodel.pipeline_options import OcrEngine, PdfBackend, TableFormerMode
+from docling_core.types.doc import ImageRefMode
+
+
+class ConvertDocumentsOptions(BaseModel):
+    from_formats: Annotated[
+        list[InputFormat],
+        Field(
+            description=(
+                "Input format(s) to convert from. String or list of strings. "
+                f"Allowed values: {', '.join([v.value for v in InputFormat])}. "
+                "Optional, defaults to all formats."
+            ),
+            examples=[[v.value for v in InputFormat]],
+        ),
+    ] = list(InputFormat)
+
+    to_formats: Annotated[
+        list[OutputFormat],
+        Field(
+            description=(
+                "Output format(s) to convert to. String or list of strings. "
+                f"Allowed values: {', '.join([v.value for v in OutputFormat])}. "
+                "Optional, defaults to Markdown."
+            ),
+            examples=[[OutputFormat.MARKDOWN]],
+        ),
+    ] = [OutputFormat.MARKDOWN]
+
+    image_export_mode: Annotated[
+        ImageRefMode,
+        Field(
+            description=(
+                "Image export mode for the document (in case of JSON,"
+                " Markdown or HTML). "
+                f"Allowed values: {', '.join([v.value for v in ImageRefMode])}. "
+                "Optional, defaults to Embedded."
+            ),
+            examples=[ImageRefMode.EMBEDDED.value],
+            # pattern="embedded|placeholder|referenced",
+        ),
+    ] = ImageRefMode.EMBEDDED
+
+    do_ocr: Annotated[
+        bool,
+        Field(
+            description=(
+                "If enabled, the bitmap content will be processed using OCR. "
+                "Boolean. Optional, defaults to true"
+            ),
+            # examples=[True],
+        ),
+    ] = True
+
+    force_ocr: Annotated[
+        bool,
+        Field(
+            description=(
+                "If enabled, replace existing text with OCR-generated "
+                "text over content. Boolean. Optional, defaults to false."
+            ),
+            # examples=[False],
+        ),
+    ] = False
+
+    # TODO: use a restricted list based on what is installed on the system
+    ocr_engine: Annotated[
+        OcrEngine,
+        Field(
+            description=(
+                "The OCR engine to use. String. "
+                "Allowed values: easyocr, tesseract, rapidocr. "
+                "Optional, defaults to easyocr."
+            ),
+            examples=[OcrEngine.EASYOCR],
+        ),
+    ] = OcrEngine.EASYOCR
+
+    ocr_lang: Annotated[
+        Optional[list[str]],
+        Field(
+            description=(
+                "List of languages used by the OCR engine. "
+                "Note that each OCR engine has "
+                "different values for the language names. String or list of strings. "
+                "Optional, defaults to empty."
+            ),
+            examples=[["fr", "de", "es", "en"]],
+        ),
+    ] = None
+
+    pdf_backend: Annotated[
+        PdfBackend,
+        Field(
+            description=(
+                "The PDF backend to use. String. "
+                f"Allowed values: {', '.join([v.value for v in PdfBackend])}. "
+                f"Optional, defaults to {PdfBackend.DLPARSE_V2.value}."
+            ),
+            examples=[PdfBackend.DLPARSE_V2],
+        ),
+    ] = PdfBackend.DLPARSE_V2
+
+    table_mode: Annotated[
+        TableFormerMode,
+        Field(
+            TableFormerMode.FAST,
+            description=(
+                "Mode to use for table structure, String. "
+                f"Allowed values: {', '.join([v.value for v in TableFormerMode])}. "
+                "Optional, defaults to fast."
+            ),
+            examples=[TableFormerMode.FAST],
+            # pattern="fast|accurate",
+        ),
+    ] = TableFormerMode.FAST
+
+    abort_on_error: Annotated[
+        bool,
+        Field(
+            description=(
+                "Abort on error if enabled. Boolean. Optional, defaults to false."
+            ),
+            # examples=[False],
+        ),
+    ] = False
+
+    return_as_file: Annotated[
+        bool,
+        Field(
+            description=(
+                "Return the output as a zip file "
+                "(will happen anyway if multiple files are generated). "
+                "Boolean. Optional, defaults to false."
+            ),
+            examples=[False],
+        ),
+    ] = False
+
+    do_table_structure: Annotated[
+        bool,
+        Field(
+            description=(
+                "If enabled, the table structure will be extracted. "
+                "Boolean. Optional, defaults to true."
+            ),
+            examples=[True],
+        ),
+    ] = True
+
+    include_images: Annotated[
+        bool,
+        Field(
+            description=(
+                "If enabled, images will be extracted from the document. "
+                "Boolean. Optional, defaults to true."
+            ),
+            examples=[True],
+        ),
+    ] = True
+
+    images_scale: Annotated[
+        float,
+        Field(
+            description="Scale factor for images. Float. Optional, defaults to 2.0.",
+            examples=[2.0],
+        ),
+    ] = 2.0
--- a/docling_serve/datamodel/engines.py
+++ b/docling_serve/datamodel/engines.py
@@ -0,0 +1,30 @@
+import enum
+from typing import Optional
+
+from pydantic import BaseModel
+
+from docling_serve.datamodel.requests import ConvertDocumentsRequest
+from docling_serve.datamodel.responses import ConvertDocumentResponse
+
+
+class TaskStatus(str, enum.Enum):
+    SUCCESS = "success"
+    PENDING = "pending"
+    STARTED = "started"
+    FAILURE = "failure"
+
+
+class AsyncEngine(str, enum.Enum):
+    LOCAL = "local"
+
+
+class Task(BaseModel):
+    task_id: str
+    task_status: TaskStatus = TaskStatus.PENDING
+    request: Optional[ConvertDocumentsRequest]
+    result: Optional[ConvertDocumentResponse] = None
+
+    def is_completed(self) -> bool:
+        if self.task_status in [TaskStatus.SUCCESS, TaskStatus.FAILURE]:
+            return True
+        return False
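`TaskStatus` subclasses both `str` and `enum.Enum`, so its members compare equal to plain strings and serialize as their values. A small standard-library sketch of that behaviour follows. The enum is re-declared here only to keep the example self-contained, and the free function `is_completed` is a stand-in for the `Task.is_completed` method.

```python
import enum
import json


class TaskStatus(str, enum.Enum):
    SUCCESS = "success"
    PENDING = "pending"
    STARTED = "started"
    FAILURE = "failure"


def is_completed(status: str) -> bool:
    # Same rule as Task.is_completed: only SUCCESS and FAILURE are terminal.
    return status in (TaskStatus.SUCCESS, TaskStatus.FAILURE)


# A raw string from a JSON payload maps back to the enum member.
parsed = TaskStatus("started")

# Members serialize as their string values.
payload = json.dumps({"task_status": TaskStatus.PENDING})
```

Because members are strings, a field typed as `str` (such as `TaskStatusResponse.task_status`) can be compared against them directly.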
--- a/docling_serve/datamodel/requests.py
+++ b/docling_serve/datamodel/requests.py
@@ -0,0 +1,62 @@
+import base64
+from io import BytesIO
+from typing import Annotated, Any, Union
+
+from pydantic import BaseModel, Field
+
+from docling.datamodel.base_models import DocumentStream
+
+from docling_serve.datamodel.convert import ConvertDocumentsOptions
+
+
+class DocumentsConvertBase(BaseModel):
+    options: ConvertDocumentsOptions = ConvertDocumentsOptions()
+
+
+class HttpSource(BaseModel):
+    url: Annotated[
+        str,
+        Field(
+            description="HTTP url to process",
+            examples=["https://arxiv.org/pdf/2206.01062"],
+        ),
+    ]
+    headers: Annotated[
+        dict[str, Any],
+        Field(
+            description="Additional headers used to fetch the urls, "
+            "e.g. authorization, agent, etc"
+        ),
+    ] = {}
+
+
+class FileSource(BaseModel):
+    base64_string: Annotated[
+        str,
+        Field(
+            description="Content of the file serialized in base64. "
+            "For example it can be obtained via "
+            "`base64 -w 0 /path/to/file/pdf-to-convert.pdf`."
+        ),
+    ]
+    filename: Annotated[
+        str,
+        Field(description="Filename of the uploaded document", examples=["file.pdf"]),
+    ]
+
+    def to_document_stream(self) -> DocumentStream:
+        buf = BytesIO(base64.b64decode(self.base64_string))
+        return DocumentStream(stream=buf, name=self.filename)
+
+
+class ConvertDocumentHttpSourcesRequest(DocumentsConvertBase):
+    http_sources: list[HttpSource]
+
+
+class ConvertDocumentFileSourcesRequest(DocumentsConvertBase):
+    file_sources: list[FileSource]
+
+
+ConvertDocumentsRequest = Union[
+    ConvertDocumentFileSourcesRequest, ConvertDocumentHttpSourcesRequest
+]
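A client can build the body for `ConvertDocumentFileSourcesRequest` by base64-encoding the file bytes, which is what `FileSource.to_document_stream` later reverses. A minimal sketch follows. The helper name `file_sources_body` and the sample bytes are illustrative only.

```python
import base64
import json
from io import BytesIO


def file_sources_body(filename: str, data: bytes) -> dict:
    """Build a request body with one base64-encoded file source."""
    return {
        "file_sources": [
            {
                "base64_string": base64.b64encode(data).decode("ascii"),
                "filename": filename,
            }
        ]
    }


body = file_sources_body("file.pdf", b"%PDF-1.4 example bytes")
encoded_body = json.dumps(body)  # ready to send as the JSON request body

# Server side, to_document_stream decodes the string back into a byte stream.
decoded = BytesIO(base64.b64decode(body["file_sources"][0]["base64_string"]))
```

Omitting `options` from the body falls back to the `ConvertDocumentsOptions` defaults declared on `DocumentsConvertBase`.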
--- a/docling_serve/datamodel/responses.py
+++ b/docling_serve/datamodel/responses.py
@@ -0,0 +1,52 @@
+import enum
+from typing import Optional
+
+from pydantic import BaseModel
+
+from docling.datamodel.document import ConversionStatus, ErrorItem
+from docling.utils.profiling import ProfilingItem
+from docling_core.types.doc import DoclingDocument
+
+
+# Status
+class HealthCheckResponse(BaseModel):
+    status: str = "ok"
+
+
+class DocumentResponse(BaseModel):
+    filename: str
+    md_content: Optional[str] = None
+    json_content: Optional[DoclingDocument] = None
+    html_content: Optional[str] = None
+    text_content: Optional[str] = None
+    doctags_content: Optional[str] = None
+
+
+class ConvertDocumentResponse(BaseModel):
+    document: DocumentResponse
+    status: ConversionStatus
+    errors: list[ErrorItem] = []
+    processing_time: float
+    timings: dict[str, ProfilingItem] = {}
+
+
+class ConvertDocumentErrorResponse(BaseModel):
+    status: ConversionStatus
+
+
+class TaskStatusResponse(BaseModel):
+    task_id: str
+    task_status: str
+    task_position: Optional[int] = None
+
+
+class MessageKind(str, enum.Enum):
+    CONNECTION = "connection"
+    UPDATE = "update"
+    ERROR = "error"
+
+
+class WebsocketMessage(BaseModel):
+    message: MessageKind
+    task: Optional[TaskStatusResponse] = None
+    error: Optional[str] = None
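On the client side, each frame sent by the status websocket is a JSON-serialized `WebsocketMessage`. A minimal sketch of interpreting one frame with the standard library follows. The function name `read_status_message` and the choice to raise `RuntimeError` on an error message are assumptions for illustration.

```python
import json
from typing import Optional


def read_status_message(raw: str) -> Optional[str]:
    """Return the task status carried by a websocket frame, if any."""
    msg = json.loads(raw)
    kind = msg["message"]

    if kind == "error":
        # MessageKind.ERROR carries a description in the `error` field.
        raise RuntimeError(msg.get("error") or "unknown error")

    if kind in ("connection", "update"):
        # MessageKind.CONNECTION and MessageKind.UPDATE carry a TaskStatusResponse.
        task = msg.get("task") or {}
        return task.get("task_status")

    return None
```

Since the server treats every client message as a request for an update, a client would typically send any text frame and pass the next reply through this function.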
--- a/docling_serve/docling_conversion.py
+++ b/docling_serve/docling_conversion.py
@@ -1,27 +1,17 @@
-import base64
 import hashlib
 import json
 import logging
-from io import BytesIO
+from collections.abc import Iterable, Iterator
 from pathlib import Path
-from typing import (
-    Annotated,
-    Any,
-    Dict,
-    Iterable,
-    Iterator,
-    List,
-    Optional,
-    Tuple,
-    Type,
-    Union,
-)
+from typing import Any, Optional, Union
+
+from fastapi import HTTPException

 from docling.backend.docling_parse_backend import DoclingParseDocumentBackend
 from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
 from docling.backend.pdf_backend import PdfDocumentBackend
 from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
-from docling.datamodel.base_models import DocumentStream, InputFormat, OutputFormat
+from docling.datamodel.base_models import DocumentStream, InputFormat
 from docling.datamodel.document import ConversionResult
 from docling.datamodel.pipeline_options import (
    EasyOcrOptions,
@@ -35,236 +25,16 @@ from docling.datamodel.pipeline_options import (
 )
 from docling.document_converter import DocumentConverter, FormatOption, PdfFormatOption
 from docling_core.types.doc import ImageRefMode
-from fastapi import HTTPException
-from pydantic import BaseModel, Field

+from docling_serve.datamodel.convert import ConvertDocumentsOptions
 from docling_serve.helper_functions import _to_list_of_strings
+from docling_serve.settings import docling_serve_settings

 _log = logging.getLogger(__name__)


-# Define the input options for the API
-class ConvertDocumentsOptions(BaseModel):
-    from_formats: Annotated[
-        List[InputFormat],
-        Field(
-            description=(
-                "Input format(s) to convert from. String or list of strings. "
-                f"Allowed values: {', '.join([v.value for v in InputFormat])}. "
-                "Optional, defaults to all formats."
-            ),
-            examples=[[v.value for v in InputFormat]],
-        ),
-    ] = [v for v in InputFormat]
-
-    to_formats: Annotated[
-        List[OutputFormat],
-        Field(
-            description=(
-                "Output format(s) to convert to. String or list of strings. "
-                f"Allowed values: {', '.join([v.value for v in OutputFormat])}. "
-                "Optional, defaults to Markdown."
-            ),
-            examples=[[OutputFormat.MARKDOWN]],
-        ),
-    ] = [OutputFormat.MARKDOWN]
-
-    image_export_mode: Annotated[
-        ImageRefMode,
-        Field(
-            description=(
-                "Image export mode for the document (in case of JSON,"
-                " Markdown or HTML). "
-                f"Allowed values: {', '.join([v.value for v in ImageRefMode])}. "
-                "Optional, defaults to Embedded."
-            ),
-            examples=[ImageRefMode.EMBEDDED.value],
-            # pattern="embedded|placeholder|referenced",
-        ),
-    ] = ImageRefMode.EMBEDDED
-
-    do_ocr: Annotated[
-        bool,
-        Field(
-            description=(
-                "If enabled, the bitmap content will be processed using OCR. "
-                "Boolean. Optional, defaults to true"
-            ),
-            # examples=[True],
-        ),
-    ] = True
-
-    force_ocr: Annotated[
-        bool,
-        Field(
-            description=(
-                "If enabled, replace existing text with OCR-generated "
-                "text over content. Boolean. Optional, defaults to false."
-            ),
-            # examples=[False],
-        ),
-    ] = False
-
-    # TODO: use a restricted list based on what is installed on the system
-    ocr_engine: Annotated[
-        OcrEngine,
-        Field(
-            description=(
-                "The OCR engine to use. String. "
-                "Allowed values: easyocr, tesseract, rapidocr. "
-                "Optional, defaults to easyocr."
-            ),
-            examples=[OcrEngine.EASYOCR],
-        ),
-    ] = OcrEngine.EASYOCR
-
-    ocr_lang: Annotated[
-        Optional[List[str]],
-        Field(
-            description=(
-                "List of languages used by the OCR engine. "
-                "Note that each OCR engine has "
-                "different values for the language names. String or list of strings. "
-                "Optional, defaults to empty."
-            ),
-            examples=[["fr", "de", "es", "en"]],
-        ),
-    ] = None
-
-    pdf_backend: Annotated[
-        PdfBackend,
-        Field(
-            description=(
-                "The PDF backend to use. String. "
-                f"Allowed values: {', '.join([v.value for v in PdfBackend])}. "
-                f"Optional, defaults to {PdfBackend.DLPARSE_V2.value}."
-            ),
-            examples=[PdfBackend.DLPARSE_V2],
-        ),
-    ] = PdfBackend.DLPARSE_V2
-
-    table_mode: Annotated[
-        TableFormerMode,
-        Field(
-            TableFormerMode.FAST,
-            description=(
-                "Mode to use for table structure, String. "
-                f"Allowed values: {', '.join([v.value for v in TableFormerMode])}. "
-                "Optional, defaults to fast."
-            ),
-            examples=[TableFormerMode.FAST],
-            # pattern="fast|accurate",
-        ),
-    ] = TableFormerMode.FAST
-
-    abort_on_error: Annotated[
-        bool,
-        Field(
-            description=(
-                "Abort on error if enabled. " "Boolean. Optional, defaults to false."
-            ),
-            # examples=[False],
-        ),
-    ] = False
-
-    return_as_file: Annotated[
-        bool,
-        Field(
-            description=(
-                "Return the output as a zip file "
-                "(will happen anyway if multiple files are generated). "
-                "Boolean. Optional, defaults to false."
-            ),
-            examples=[False],
-        ),
-    ] = False
-
-    do_table_structure: Annotated[
-        bool,
-        Field(
-            description=(
-                "If enabled, the table structure will be extracted. "
-                "Boolean. Optional, defaults to true."
-            ),
-            examples=[True],
-        ),
-    ] = True
-
-    include_images: Annotated[
-        bool,
-        Field(
-            description=(
-                "If enabled, images will be extracted from the document. "
-                "Boolean. Optional, defaults to true."
-            ),
-            examples=[True],
-        ),
-    ] = True
-
-    images_scale: Annotated[
-        float,
-        Field(
-            description="Scale factor for images. Float. Optional, defaults to 2.0.",
-            examples=[2.0],
-        ),
-    ] = 2.0
-
-
-class DocumentsConvertBase(BaseModel):
-    options: ConvertDocumentsOptions = ConvertDocumentsOptions()
-
-
-class HttpSource(BaseModel):
-    url: Annotated[
-        str,
-        Field(
-            description="HTTP url to process",
-            examples=["https://arxiv.org/pdf/2206.01062"],
-        ),
-    ]
-    headers: Annotated[
-        Dict[str, Any],
-        Field(
-            description="Additional headers used to fetch the urls, "
-            "e.g. authorization, agent, etc"
-        ),
-    ] = {}
-
-
-class FileSource(BaseModel):
-    base64_string: Annotated[
-        str,
-        Field(
-            description="Content of the file serialized in base64. "
-            "For example it can be obtained via "
-            "`base64 -w 0 /path/to/file/pdf-to-convert.pdf`."
-        ),
-    ]
-    filename: Annotated[
-        str,
-        Field(description="Filename of the uploaded document", examples=["file.pdf"]),
-    ]
-
-    def to_document_stream(self) -> DocumentStream:
-        buf = BytesIO(base64.b64decode(self.base64_string))
-        return DocumentStream(stream=buf, name=self.filename)
-
-
-class ConvertDocumentHttpSourcesRequest(DocumentsConvertBase):
-    http_sources: List[HttpSource]
-
-
-class ConvertDocumentFileSourcesRequest(DocumentsConvertBase):
-    file_sources: List[FileSource]
-
-
-ConvertDocumentsRequest = Union[
-    ConvertDocumentFileSourcesRequest, ConvertDocumentHttpSourcesRequest
-]
-
-
 # Document converters will be preloaded and stored in a dictionary
-converters: Dict[str, DocumentConverter] = {}
+converters: dict[bytes, DocumentConverter] = {}


 # Custom serializer for PdfFormatOption
@@ -276,6 +46,11 @@ def _serialize_pdf_format_option(pdf_format_option: PdfFormatOption) -> str:
    if pdf_format_option.pipeline_options:
        data["pipeline_options"] = pdf_format_option.pipeline_options.model_dump()

+        # Replace `artifacts_path` with a string representation
+        data["pipeline_options"]["artifacts_path"] = repr(
+            data["pipeline_options"]["artifacts_path"]
+        )
+
    # Replace `pipeline_cls` with a string representation
    data["pipeline_cls"] = repr(data["pipeline_cls"])

@@ -293,10 +68,9 @@ def _serialize_pdf_format_option(pdf_format_option: PdfFormatOption) -> str:


 # Computes the PDF pipeline options and returns the PdfFormatOption and its hash
-def get_pdf_pipeline_opts(
+def get_pdf_pipeline_opts(  # noqa: C901
    request: ConvertDocumentsOptions,
-) -> Tuple[PdfFormatOption, str]:
-
+) -> tuple[PdfFormatOption, bytes]:
    if request.ocr_engine == OcrEngine.EASYOCR:
        try:
            import easyocr  # noqa: F401
@@ -356,7 +130,7 @@ def get_pdf_pipeline_opts(
            pipeline_options.images_scale = request.images_scale

    if request.pdf_backend == PdfBackend.DLPARSE_V1:
-        backend: Type[PdfDocumentBackend] = DoclingParseDocumentBackend
+        backend: type[PdfDocumentBackend] = DoclingParseDocumentBackend
    elif request.pdf_backend == PdfBackend.DLPARSE_V2:
        backend = DoclingParseV2DocumentBackend
    elif request.pdf_backend == PdfBackend.PYPDFIUM2:
@@ -364,6 +138,31 @@ def get_pdf_pipeline_opts(
    else:
        raise RuntimeError(f"Unexpected PDF backend type {request.pdf_backend}")

+    if docling_serve_settings.artifacts_path is not None:
+        if str(docling_serve_settings.artifacts_path.absolute()) == "":
+            _log.info(
+                "artifacts_path is an empty path, model weights will be downloaded "
+                "at runtime."
+            )
+            pipeline_options.artifacts_path = None
+        elif docling_serve_settings.artifacts_path.is_dir():
+            _log.info(
+                "artifacts_path is set to a valid directory. "
+                "No model weights will be downloaded at runtime."
+            )
+            pipeline_options.artifacts_path = docling_serve_settings.artifacts_path
+        else:
+            _log.warning(
+                "artifacts_path is set to an invalid directory. "
+                "The system will download the model weights at runtime."
+            )
+            pipeline_options.artifacts_path = None
+    else:
+        _log.info(
+            "artifacts_path is unset. "
+            "The system will download the model weights at runtime."
+        )
+
    pdf_format_option = PdfFormatOption(
        pipeline_options=pipeline_options,
        backend=backend,
@@ -371,7 +170,7 @@ def get_pdf_pipeline_opts(

    serialized_data = _serialize_pdf_format_option(pdf_format_option)

-    options_hash = hashlib.sha1(serialized_data.encode()).hexdigest()
+    options_hash = hashlib.sha1(serialized_data.encode()).digest()

    return pdf_format_option, options_hash

@@ -379,12 +178,12 @@ def get_pdf_pipeline_opts(
 def convert_documents(
    sources: Iterable[Union[Path, str, DocumentStream]],
    options: ConvertDocumentsOptions,
-    headers: Optional[Dict[str, Any]] = None,
+    headers: Optional[dict[str, Any]] = None,
 ):
    pdf_format_option, options_hash = get_pdf_pipeline_opts(options)

    if options_hash not in converters:
-        format_options: Dict[InputFormat, FormatOption] = {
+        format_options: dict[InputFormat, FormatOption] = {
            InputFormat.PDF: pdf_format_option,
            InputFormat.IMAGE: pdf_format_option,
        }
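Note on the hunks above: the converter cache is now keyed by the raw SHA-1 digest (`bytes`) of the serialized PDF options rather than its hex string. A minimal sketch of that caching pattern follows, under stated assumptions: `options_key`, `get_or_create`, `cache`, and the `factory` argument are hypothetical names for illustration, and `json.dumps` stands in for the project's own `_serialize_pdf_format_option`.

```python
import hashlib
import json
from typing import Any, Callable

# Hypothetical stand-in for the module-level `converters` dict; in the real code
# the values would be DocumentConverter instances.
cache: dict[bytes, Any] = {}


def options_key(options: dict[str, Any]) -> bytes:
    # Serialize deterministically (sorted keys); non-JSON values fall back to repr(),
    # similar in spirit to replacing `pipeline_cls` with a string representation.
    serialized = json.dumps(options, sort_keys=True, default=repr)
    # .digest() returns 20 raw bytes; .hexdigest() would return a 40-char str.
    return hashlib.sha1(serialized.encode()).digest()


def get_or_create(options: dict[str, Any], factory: Callable[[dict[str, Any]], Any]) -> Any:
    # Build the expensive object only once per distinct option set.
    key = options_key(options)
    if key not in cache:
        cache[key] = factory(options)
    return cache[key]
```

Two requests with equal options map to the same key, so the second call reuses the already initialized object instead of constructing a new one.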
--- a/docling_serve/engines/init.py
+++ b/docling_serve/engines/init.py
@@ -0,0 +1,8 @@
+from functools import lru_cache
+
+from docling_serve.engines.async_local.orchestrator import AsyncLocalOrchestrator
+
+
+@lru_cache
+def get_orchestrator() -> AsyncLocalOrchestrator:
+    return AsyncLocalOrchestrator()
--- a/docling_serve/engines/async_local/init.py
+++ b/docling_serve/engines/async_local/init.py
--- a/docling_serve/engines/async_local/orchestrator.py
+++ b/docling_serve/engines/async_local/orchestrator.py
@@ -0,0 +1,101 @@
+import asyncio
+import logging
+import uuid
+from typing import Optional
+
+from fastapi import WebSocket
+
+from docling_serve.datamodel.engines import Task, TaskStatus
+from docling_serve.datamodel.requests import ConvertDocumentsRequest
+from docling_serve.datamodel.responses import (
+    MessageKind,
+    TaskStatusResponse,
+    WebsocketMessage,
+)
+from docling_serve.engines.async_local.worker import AsyncLocalWorker
+from docling_serve.engines.base_orchestrator import BaseOrchestrator
+from docling_serve.settings import docling_serve_settings
+
+_log = logging.getLogger(__name__)
+
+
+class OrchestratorError(Exception):
+    pass
+
+
+class TaskNotFoundError(OrchestratorError):
+    pass
+
+
+class AsyncLocalOrchestrator(BaseOrchestrator):
+    def __init__(self):
+        self.task_queue = asyncio.Queue()
+        self.tasks: dict[str, Task] = {}
+        self.queue_list: list[str] = []
+        self.task_subscribers: dict[str, set[WebSocket]] = {}
+
+    async def enqueue(self, request: ConvertDocumentsRequest) -> Task:
+        task_id = str(uuid.uuid4())
+        task = Task(task_id=task_id, request=request)
+        self.tasks[task_id] = task
+        self.queue_list.append(task_id)
+        self.task_subscribers[task_id] = set()
+        await self.task_queue.put(task_id)
+        return task
+
+    async def queue_size(self) -> int:
+        return self.task_queue.qsize()
+
+    async def get_queue_position(self, task_id: str) -> Optional[int]:
+        return (
+            self.queue_list.index(task_id) + 1 if task_id in self.queue_list else None
+        )
+
+    async def task_status(self, task_id: str, wait: float = 0.0) -> Task:
+        if task_id not in self.tasks:
+            raise TaskNotFoundError()
+        return self.tasks[task_id]
+
+    async def task_result(self, task_id: str):
+        if task_id not in self.tasks:
+            raise TaskNotFoundError()
+        return self.tasks[task_id].result
+
+    async def process_queue(self):
+        # Create a pool of workers
+        workers = []
+        for i in range(docling_serve_settings.eng_loc_num_workers):
+            _log.debug(f"Starting worker {i}")
+            w = AsyncLocalWorker(i, self)
+            worker_task = asyncio.create_task(w.loop())
+            workers.append(worker_task)
+
+        # Wait for all workers to complete (they won't, as they run indefinitely)
+        await asyncio.gather(*workers)
+        _log.debug("All workers completed.")
+
+    async def notify_task_subscribers(self, task_id: str):
+        if task_id not in self.task_subscribers:
+            raise RuntimeError(f"Task {task_id} does not have a subscribers list.")
+
+        task = self.tasks[task_id]
+        task_queue_position = await self.get_queue_position(task_id)
+        msg = TaskStatusResponse(
+            task_id=task.task_id,
+            task_status=task.task_status,
+            task_position=task_queue_position,
+        )
+        for websocket in self.task_subscribers[task_id]:
+            await websocket.send_text(
+                WebsocketMessage(message=MessageKind.UPDATE, task=msg).model_dump_json()
+            )
+            if task.is_completed():
+                await websocket.close()
+
+    async def notify_queue_positions(self):
+        for task_id in self.task_subscribers.keys():
+            # notify only pending tasks
+            if self.tasks[task_id].task_status != TaskStatus.PENDING:
+                continue
+
+            await self.notify_task_subscribers(task_id)
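Note on the orchestrator above: it combines an `asyncio.Queue` of task ids with a fixed pool of worker coroutines. The queue-plus-pool pattern can be sketched in isolation as below. This is an illustrative sketch only, not the project's code: `worker`, `run_demo`, and the string results are hypothetical, and unlike `process_queue` the demo cancels its workers once the queue is drained so that it terminates.

```python
import asyncio


async def worker(worker_id: int, queue: asyncio.Queue, results: dict[str, str]) -> None:
    # Each worker loops forever, pulling one task id at a time.
    while True:
        task_id = await queue.get()
        try:
            results[task_id] = f"done by worker {worker_id}"
        finally:
            # Always mark the item as processed so queue.join() can return.
            queue.task_done()


async def run_demo(num_workers: int = 2, num_tasks: int = 5) -> dict[str, str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: dict[str, str] = {}
    workers = [
        asyncio.create_task(worker(i, queue, results)) for i in range(num_workers)
    ]
    for n in range(num_tasks):
        await queue.put(f"task-{n}")
    # Wait until every queued item has been marked done, then stop the workers.
    await queue.join()
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling `asyncio.run(run_demo())` returns one result entry per enqueued task id.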
--- a/docling_serve/engines/async_local/worker.py
+++ b/docling_serve/engines/async_local/worker.py
@@ -0,0 +1,116 @@
+import asyncio
+import logging
+import time
+from typing import TYPE_CHECKING, Any, Optional, Union
+
+from fastapi import BackgroundTasks
+
+from docling.datamodel.base_models import DocumentStream
+
+from docling_serve.datamodel.engines import TaskStatus
+from docling_serve.datamodel.requests import ConvertDocumentFileSourcesRequest
+from docling_serve.datamodel.responses import ConvertDocumentResponse
+from docling_serve.docling_conversion import convert_documents
+from docling_serve.response_preparation import process_results
+
+if TYPE_CHECKING:
+    from docling_serve.engines.async_local.orchestrator import AsyncLocalOrchestrator
+
+_log = logging.getLogger(__name__)
+
+
+class AsyncLocalWorker:
+    def __init__(self, worker_id: int, orchestrator: "AsyncLocalOrchestrator"):
+        self.worker_id = worker_id
+        self.orchestrator = orchestrator
+
+    async def loop(self):
+        _log.debug(f"Starting loop for worker {self.worker_id}")
+        while True:
+            task_id: str = await self.orchestrator.task_queue.get()
+            self.orchestrator.queue_list.remove(task_id)
+
+            if task_id not in self.orchestrator.tasks:
+                raise RuntimeError(f"Task {task_id} not found.")
+            task = self.orchestrator.tasks[task_id]
+
+            try:
+                task.task_status = TaskStatus.STARTED
+                _log.info(f"Worker {self.worker_id} processing task {task_id}")
+
+                # Notify clients about task updates
+                await self.orchestrator.notify_task_subscribers(task_id)
+
+                # Notify clients about queue updates
+                await self.orchestrator.notify_queue_positions()
+
+                # Get the current event loop
+                asyncio.get_event_loop()
+
+                # Define the blocking conversion job that is run in a thread below.
+                # TODO: send partial updates, e.g. when a document in the batch is done
+                def run_conversion():
+                    sources: list[Union[str, DocumentStream]] = []
+                    headers: Optional[dict[str, Any]] = None
+                    if isinstance(task.request, ConvertDocumentFileSourcesRequest):
+                        for file_source in task.request.file_sources:
+                            sources.append(file_source.to_document_stream())
+                    else:
+                        for http_source in task.request.http_sources:
+                            sources.append(http_source.url)
+                            if headers is None and http_source.headers:
+                                headers = http_source.headers
+
+                    # Note: results are only an iterator->lazy evaluation
+                    results = convert_documents(
+                        sources=sources,
+                        options=task.request.options,
+                        headers=headers,
+                    )
+
+                    # The real processing will happen here
+                    response = process_results(
+                        background_tasks=BackgroundTasks(),
+                        conversion_options=task.request.options,
+                        conv_results=results,
+                    )
+
+                    return response
+
+                # Run the prediction in a thread to avoid blocking the event loop.
+                start_time = time.monotonic()
+                # future = asyncio.run_coroutine_threadsafe(
+                #     run_conversion(),
+                #     loop=loop
+                # )
+                # response = future.result()
+
+                response = await asyncio.to_thread(
+                    run_conversion,
+                )
+                processing_time = time.monotonic() - start_time
+
+                if not isinstance(response, ConvertDocumentResponse):
+                    _log.error(
+                        f"Worker {self.worker_id} got un-processable "
+                        f"result for {task_id}: {type(response)}"
+                    )
+                task.result = response
+                task.request = None
+
+                task.task_status = TaskStatus.SUCCESS
+                _log.info(
+                    f"Worker {self.worker_id} completed job {task_id} "
+                    f"in {processing_time:.2f} seconds"
+                )
+
+            except Exception as e:
+                _log.error(
+                    f"Worker {self.worker_id} failed to process job {task_id}: {e}"
+                )
+                task.task_status = TaskStatus.FAILURE
+
+            finally:
+                await self.orchestrator.notify_task_subscribers(task_id)
+                self.orchestrator.task_queue.task_done()
+                _log.debug(f"Worker {self.worker_id} completely done with {task_id}")
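Note on the worker above: the synchronous conversion is handed to `asyncio.to_thread` so the event loop stays free to serve other coroutines (for example websocket notifications) while a document is processed. A small self-contained sketch of that behaviour follows; `blocking_conversion`, `heartbeat`, and `main` are hypothetical names, and `time.sleep` only simulates the blocking work.

```python
import asyncio
import time


def blocking_conversion(seconds: float) -> str:
    # Simulates CPU- or IO-bound synchronous work that would block the event loop.
    time.sleep(seconds)
    return "converted"


async def heartbeat(ticks: list[int]) -> None:
    # Keeps running on the event loop while the thread is busy.
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0.01)


async def main() -> tuple[str, list[int]]:
    ticks: list[int] = []
    # to_thread runs the function in the default thread pool and returns an awaitable.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_conversion, 0.05),
        heartbeat(ticks),
    )
    return result, ticks
```

If `blocking_conversion` were called directly inside a coroutine, `heartbeat` could not make progress until it returned.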
--- a/docling_serve/engines/base_orchestrator.py
+++ b/docling_serve/engines/base_orchestrator.py
@@ -0,0 +1,21 @@
+from abc import ABC, abstractmethod
+
+from docling_serve.datamodel.engines import Task
+
+
+class BaseOrchestrator(ABC):
+    @abstractmethod
+    async def enqueue(self, task) -> Task:
+        pass
+
+    @abstractmethod
+    async def queue_size(self) -> int:
+        pass
+
+    @abstractmethod
+    async def task_status(self, task_id: str) -> Task:
+        pass
+
+    @abstractmethod
+    async def task_result(self, task_id: str):
+        pass
--- a/docling_serve/engines/block_local/init.py
+++ b/docling_serve/engines/block_local/init.py
--- a/docling_serve/gradio_ui.py
+++ b/docling_serve/gradio_ui.py
@@ -12,6 +12,13 @@ from docling_serve.helper_functions import _to_list_of_strings

 logger = logging.getLogger(__name__)

+##############################
+# Head JS for web components #
+##############################
+head = """
+    <script src="https://unpkg.com/@docling/docling-components@0.0.3" type="module"></script>
+"""
+
 #################
 # CSS and theme #
 #################
@@ -49,6 +56,14 @@ css = """
 #file_input_zone {
    height: 140px;
 }
+
+docling-img::part(pages) {
+    gap: 1rem;
+}
+
+docling-img::part(page) {
+    box-shadow: 0 0.5rem 1rem 0 rgba(0, 0, 0, 0.2);
+}
 """

 theme = gr.themes.Default(
@@ -110,6 +125,7 @@ def set_download_button_label(label_text: gr.State):
 def clear_outputs():
    markdown_content = ""
    json_content = ""
+    json_rendered_content = ""
    html_content = ""
    text_content = ""
    doctags_content = ""
@@ -118,6 +134,7 @@ def clear_outputs():
        markdown_content,
        markdown_content,
        json_content,
+        json_rendered_content,
        html_content,
        html_content,
        text_content,
@@ -260,6 +277,7 @@ def process_file(
 def response_to_output(response, return_as_file):
    markdown_content = ""
    json_content = ""
+    json_rendered_content = ""
    html_content = ""
    text_content = ""
    doctags_content = ""
@@ -282,6 +300,12 @@ def response_to_output(response, return_as_file):
        json_content = json.dumps(
            full_content.get("document").get("json_content"), indent=2
        )
+        # Embed document JSON and trigger load at client via an image.
+        json_rendered_content = f"""
+            <docling-img id="dclimg" pagenumbers tooltip="parsed"></docling-img>
+            <script id="dcljson" type="application/json" onload="document.getElementById('dclimg').src = JSON.parse(document.getElementById('dcljson').textContent);">{json_content}</script>
+            <img src onerror="document.getElementById('dclimg').src = JSON.parse(document.getElementById('dcljson').textContent);" />
+            """
        html_content = full_content.get("document").get("html_content")
        text_content = full_content.get("document").get("text_content")
        doctags_content = full_content.get("document").get("doctags_content")
@@ -289,6 +313,7 @@ def response_to_output(response, return_as_file):
        markdown_content,
        markdown_content,
        json_content,
+        json_rendered_content,
        html_content,
        html_content,
        text_content,
@@ -302,12 +327,12 @@ def response_to_output(response, return_as_file):
 ############

 with gr.Blocks(
+    head=head,
    css=css,
    theme=theme,
    title="Docling Serve",
    delete_cache=(3600, 3600),  # Delete all files older than 1 hour every hour
 ) as ui:
-
    # Constants stored in states to be able to pass them as inputs to functions
    processing_text = gr.State("Processing your document(s), please wait...")
    true_bool = gr.State(True)
@@ -464,6 +489,8 @@ with gr.Blocks(
            output_markdown_rendered = gr.Markdown(label="Response")
        with gr.Tab("Docling (JSON)"):
            output_json = gr.Code(language="json", wrap_lines=True, show_label=False)
+        with gr.Tab("Docling-Rendered"):
+            output_json_rendered = gr.HTML()
        with gr.Tab("HTML"):
            output_html = gr.Code(language="html", wrap_lines=True, show_label=False)
        with gr.Tab("HTML-Rendered"):
@@ -514,6 +541,7 @@ with gr.Blocks(
            output_markdown,
            output_markdown_rendered,
            output_json,
+            output_json_rendered,
            output_html,
            output_html_rendered,
            output_text,
@@ -538,6 +566,7 @@ with gr.Blocks(
            output_markdown,
            output_markdown_rendered,
            output_json,
+            output_json_rendered,
            output_html,
            output_html_rendered,
            output_text,
@@ -553,6 +582,7 @@ with gr.Blocks(
            output_markdown,
            output_markdown_rendered,
            output_json,
+            output_json_rendered,
            output_html,
            output_html_rendered,
            output_text,
@@ -562,9 +592,7 @@ with gr.Blocks(
        set_outputs_visibility_direct,
        inputs=[false_bool, false_bool],
        outputs=[content_output, file_output],
-    ).then(
-        clear_url_input, inputs=None, outputs=[url_input]
-    )
+    ).then(clear_url_input, inputs=None, outputs=[url_input])

    # File processing
    file_process_btn.click(
@@ -582,6 +610,7 @@ with gr.Blocks(
            output_markdown,
            output_markdown_rendered,
            output_json,
+            output_json_rendered,
            output_html,
            output_html_rendered,
            output_text,
@@ -606,6 +635,7 @@ with gr.Blocks(
            output_markdown,
            output_markdown_rendered,
            output_json,
+            output_json_rendered,
            output_html,
            output_html_rendered,
            output_text,
@@ -621,6 +651,7 @@ with gr.Blocks(
            output_markdown,
            output_markdown_rendered,
            output_json,
+            output_json_rendered,
            output_html,
            output_html_rendered,
            output_text,
@@ -630,6 +661,4 @@ with gr.Blocks(
        set_outputs_visibility_direct,
        inputs=[false_bool, false_bool],
        outputs=[content_output, file_output],
-    ).then(
-        clear_file_input, inputs=None, outputs=[file_input]
-    )
+    ).then(clear_file_input, inputs=None, outputs=[file_input])
--- a/docling_serve/helper_functions.py
+++ b/docling_serve/helper_functions.py
@@ -1,6 +1,6 @@
 import inspect
 import re
-from typing import List, Type, Union
+from typing import Union

 from fastapi import Depends, Form
 from pydantic import BaseModel
@@ -8,7 +8,7 @@ from pydantic import BaseModel

 # Adapted from
 # https://github.com/fastapi/fastapi/discussions/8971#discussioncomment-7892972
-def FormDepends(cls: Type[BaseModel]):
+def FormDepends(cls: type[BaseModel]):
    new_parameters = []

    for field_name, model_field in cls.model_fields.items():
@@ -34,8 +34,8 @@ def FormDepends(cls: Type[BaseModel]):
    return Depends(as_form_func)


-def _to_list_of_strings(input_value: Union[str, List[str]]) -> List[str]:
-    def split_and_strip(value: str) -> List[str]:
+def _to_list_of_strings(input_value: Union[str, list[str]]) -> list[str]:
+    def split_and_strip(value: str) -> list[str]:
        if re.search(r"[;,]", value):
            return [item.strip() for item in re.split(r"[;,]", value)]
        else:
--- a/docling_serve/response_preparation.py
+++ b/docling_serve/response_preparation.py
@@ -3,43 +3,23 @@ import os
 import shutil
 import tempfile
 import time
+from collections.abc import Iterable
 from pathlib import Path
-from typing import Dict, Iterable, List, Optional, Union
+from typing import Union

-from docling.datamodel.base_models import OutputFormat
-from docling.datamodel.document import ConversionResult, ConversionStatus, ErrorItem
-from docling.utils.profiling import ProfilingItem
-from docling_core.types.doc import DoclingDocument, ImageRefMode
 from fastapi import BackgroundTasks, HTTPException
 from fastapi.responses import FileResponse
-from pydantic import BaseModel

-from docling_serve.docling_conversion import ConvertDocumentsOptions
+from docling.datamodel.base_models import OutputFormat
+from docling.datamodel.document import ConversionResult, ConversionStatus
+from docling_core.types.doc import ImageRefMode
+
+from docling_serve.datamodel.convert import ConvertDocumentsOptions
+from docling_serve.datamodel.responses import ConvertDocumentResponse, DocumentResponse

 _log = logging.getLogger(__name__)


-class DocumentResponse(BaseModel):
-    filename: str
-    md_content: Optional[str] = None
-    json_content: Optional[DoclingDocument] = None
-    html_content: Optional[str] = None
-    text_content: Optional[str] = None
-    doctags_content: Optional[str] = None
-
-
-class ConvertDocumentResponse(BaseModel):
-    document: DocumentResponse
-    status: ConversionStatus
-    errors: List[ErrorItem] = []
-    processing_time: float
-    timings: Dict[str, ProfilingItem] = {}
-
-
-class ConvertDocumentErrorResponse(BaseModel):
-    status: ConversionStatus
-
-
 def _export_document_as_content(
    conv_res: ConversionResult,
    export_json: bool,
@@ -49,7 +29,6 @@ def _export_document_as_content(
    export_doctags: bool,
    image_mode: ImageRefMode,
 ):
-
    document = DocumentResponse(filename=conv_res.input.file.name)

    if conv_res.status == ConversionStatus.SUCCESS:
@@ -86,7 +65,6 @@ def _export_documents_as_files(
    export_doctags: bool,
    image_export_mode: ImageRefMode,
 ):
-
    success_count = 0
    failure_count = 0

@@ -150,7 +128,6 @@ def process_results(
    conversion_options: ConvertDocumentsOptions,
    conv_results: Iterable[ConversionResult],
 ) -> Union[ConvertDocumentResponse, FileResponse]:
-
    # Let's start by processing the documents
    try:
        start_time = time.monotonic()
--- a/docling_serve/settings.py
+++ b/docling_serve/settings.py
@@ -1,6 +1,38 @@
+from pathlib import Path
+from typing import Optional, Union
+
 from pydantic_settings import BaseSettings, SettingsConfigDict

+from docling_serve.datamodel.engines import AsyncEngine

-class Settings(BaseSettings):

-    model_config = SettingsConfigDict(env_prefix="DOCLING_")
+class UvicornSettings(BaseSettings):
+    model_config = SettingsConfigDict(
+        env_prefix="UVICORN_", env_file=".env", extra="allow"
+    )
+
+    host: str = "0.0.0.0"
+    port: int = 5001
+    reload: bool = False
+    root_path: str = ""
+    proxy_headers: bool = True
+    workers: Union[int, None] = None
+
+
+class DoclingServeSettings(BaseSettings):
+    model_config = SettingsConfigDict(
+        env_prefix="DOCLING_SERVE_",
+        env_file=".env",
+        env_parse_none_str="",
+        extra="allow",
+    )
+
+    enable_ui: bool = False
+    artifacts_path: Optional[Path] = None
+
+    eng_kind: AsyncEngine = AsyncEngine.LOCAL
+    eng_loc_num_workers: int = 2
+
+
+uvicorn_settings = UvicornSettings()
+docling_serve_settings = DoclingServeSettings()
--- a/models_download.py
+++ b/models_download.py
@@ -1,36 +0,0 @@
-import os
-import zipfile
-
-import requests
-from deepsearch_glm.utils.load_pretrained_models import load_pretrained_nlp_models
-from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline
-
-# Download Docling models
-StandardPdfPipeline.download_models_hf(force=True)
-load_pretrained_nlp_models(verbose=True)
-
-# Download EasyOCR models
-urls = [
-    "https://github.com/JaidedAI/EasyOCR/releases/download/v1.3/latin_g2.zip",
-    "https://github.com/JaidedAI/EasyOCR/releases/download/pre-v1.1.6/craft_mlt_25k.zip"
-]
-
-local_zip_paths = [
-    "/opt/app-root/src/latin_g2.zip",
-    "/opt/app-root/src/craft_mlt_25k.zip"
-]
-
-extract_path = "/opt/app-root/src/.EasyOCR/model/"
-
-for url, local_zip_path in zip(urls, local_zip_paths):
-    # Download the file
-    response = requests.get(url)
-    with open(local_zip_path, "wb") as file:
-        file.write(response.content)
-
-    # Unzip the file
-    with zipfile.ZipFile(local_zip_path, "r") as zip_ref:
-        zip_ref.extractall(extract_path)
-
-    # Clean up the zip file
-    os.remove(local_zip_path)
--- a/os-packages.txt
+++ b/os-packages.txt
@@ -4,5 +4,3 @@ tesseract-langpack-eng
 leptonica-devel
 libglvnd-glx
 glib2
-wget
-git
--- a/poetry.lock
+++ b/poetry.lock
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,25 +1,25 @@
-[tool.poetry]
+[project]
 name = "docling-serve"
-version = "0.2.0"
+version = "0.5.1"  # DO NOT EDIT, updated automatically
 description = "Running Docling as a service"
-license = "MIT"
+license = {text = "MIT"}
 authors = [
-    "Michele Dolfi <dol@zurich.ibm.com>",
-    "Christoph Auer <cau@zurich.ibm.com>",
-    "Panos Vagenas <pva@zurich.ibm.com>",
-    "Cesar Berrospi Ramis <ceb@zurich.ibm.com>",
-   "Peter Staar <taa@zurich.ibm.com>",
+    {name="Michele Dolfi", email="dol@zurich.ibm.com"},
+    {name="Guillaume Moutier", email="gmoutier@redhat.com"},
+    {name="Anil Vishnoi", email="avishnoi@redhat.com"},
+    {name="Panos Vagenas", email="pva@zurich.ibm.com"},
+    {name="Panos Vagenas", email="pva@zurich.ibm.com"},
+    {name="Christoph Auer", email="cau@zurich.ibm.com"},
+    {name="Peter Staar", email="taa@zurich.ibm.com"},
 ]
 maintainers = [
-    "Peter Staar <taa@zurich.ibm.com>",
-    "Christoph Auer <cau@zurich.ibm.com>",
-    "Michele Dolfi <dol@zurich.ibm.com>",
-    "Cesar Berrospi Ramis <ceb@zurich.ibm.com>",
-    "Panos Vagenas <pva@zurich.ibm.com>",
+    {name="Michele Dolfi", email="dol@zurich.ibm.com"},
+    {name="Anil Vishnoi", email="avishnoi@redhat.com"},
+    {name="Panos Vagenas", email="pva@zurich.ibm.com"},
+    {name="Christoph Auer", email="cau@zurich.ibm.com"},
+    {name="Peter Staar", email="taa@zurich.ibm.com"},
 ]
 readme = "README.md"
-repository = "https://github.com/DS4SD/docling-serve"
-homepage = "https://github.com/DS4SD/docling-serve"
 classifiers = [
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
@@ -28,96 +28,159 @@ classifiers = [
    "Typing :: Typed",
    "Programming Language :: Python :: 3"
 ]
-
-[tool.poetry.dependencies]
-python = ">=3.10,<3.13" # 3.10 needed for Gradio, and no torchvision build for 3.13 yet
-docling = "^2.14.0"
-fastapi = {version = "^0.115.6", extras = ["standard"]}
-gradio = { version = "^5.9.1", optional = true }
-uvicorn = "~0.29.0"
-pydantic = "^2.10.3"
-pydantic-settings = "^2.4.0"
-python-multipart = "^0.0.19"
-httpx = "^0.28.1"
-tesserocr = { version = "^2.7.1", optional = true }
-rapidocr-onnxruntime = { version = "^1.4.0", optional = true, markers = "python_version < '3.13'" }
-onnxruntime = [
-  # 1.19.2 is the last version with python3.9 support,
-  # see https://github.com/microsoft/onnxruntime/releases/tag/v1.20.0
-  { version = ">=1.7.0,<1.20.0", optional = true, markers = "python_version < '3.10'" },
-  { version = "^1.7.0", optional = true, markers = "python_version >= '3.10'" }
+requires-python = ">=3.10"
+dependencies = [
+    "docling~=2.25.1",
+    "fastapi[standard]~=0.115",
+    "httpx~=0.28",
+    "pydantic~=2.10",
+    "pydantic-settings~=2.4",
+    "python-multipart>=0.0.14,<0.1.0",
+    "typer~=0.12",
+    "uvicorn[standard]>=0.29.0,<1.0.0",
+    "websockets~=14.0",
 ]

+[project.optional-dependencies]
+ui = [
+    "gradio~=5.9"
+]
+tesserocr = [
+    "tesserocr~=2.7"
+]
+rapidocr = [
+    "rapidocr-onnxruntime~=1.4; python_version<'3.13'",
+    "onnxruntime~=1.7",
+]
+cpu = [
+  "torch>=2.6.0",
+  "torchvision>=0.21.0",
+]
+cu124 = [
+  "torch>=2.6.0",
+  "torchvision>=0.21.0",
+]

-[tool.poetry.extras]
-ui = ["gradio"]
-tesserocr = ["tesserocr"]
-rapidocr = ["rapidocr-onnxruntime", "onnxruntime"]
+[dependency-groups]
+dev = [
+    "mypy~=1.11",
+    "pre-commit-uv~=4.1",
+    "pytest~=8.3",
+    "pytest-asyncio~=0.24",
+    "pytest-check~=2.4",
+    "python-semantic-release~=7.32",
+    "ruff>=0.9.6",
+]

+[tool.uv]
+package = true
+conflicts = [
+  [
+    { extra = "cpu" },
+    { extra = "cu124" },
+  ],
+]

-[tool.poetry.group.pypi-torch]
-optional = false
-
-[tool.poetry.group.pypi-torch.dependencies]
+[tool.uv.sources]
 torch = [
-  {version = "!=2.4.1+cpu" },
+  { index = "pytorch-cpu", extra = "cpu" },
+  { index = "pytorch-cu124", extra = "cu124" },
 ]
 torchvision = [
-  {version = "!=0.19.1+cpu" },
+  { index = "pytorch-cpu", extra = "cpu" },
+  { index = "pytorch-cu124", extra = "cu124" },
 ]

-[tool.poetry.group.cpu]
-optional = true
+[[tool.uv.index]]
+name = "pytorch-cpu"
+url = "https://download.pytorch.org/whl/cpu"
+explicit = true

-[tool.poetry.group.cpu.dependencies]
-torch = [
-    {markers = 'platform_machine=="x86_64" and sys_platform=="linux" and python_version == "3.10"', url="https://download.pytorch.org/whl/cpu/torch-2.4.1%2Bcpu-cp310-cp310-linux_x86_64.whl"},
-    {markers = 'platform_machine=="x86_64" and sys_platform=="linux" and python_version == "3.11"', url="https://download.pytorch.org/whl/cpu/torch-2.4.1%2Bcpu-cp311-cp311-linux_x86_64.whl"},
-    {markers = 'platform_machine=="x86_64" and sys_platform=="linux" and python_version == "3.12"', url="https://download.pytorch.org/whl/cpu/torch-2.4.1%2Bcpu-cp312-cp312-linux_x86_64.whl"},
-]
-torchvision = [
-    {markers = 'platform_machine=="x86_64" and sys_platform=="linux" and python_version == "3.10"', url="https://download.pytorch.org/whl/cpu/torchvision-0.19.1%2Bcpu-cp310-cp310-linux_x86_64.whl"},
-    {markers = 'platform_machine=="x86_64" and sys_platform=="linux" and python_version == "3.11"', url="https://download.pytorch.org/whl/cpu/torchvision-0.19.1%2Bcpu-cp311-cp311-linux_x86_64.whl"},
-    {markers = 'platform_machine=="x86_64" and sys_platform=="linux" and python_version == "3.12"', url="https://download.pytorch.org/whl/cpu/torchvision-0.19.1%2Bcpu-cp312-cp312-linux_x86_64.whl"},
-]
+[[tool.uv.index]]
+name = "pytorch-cu124"
+url = "https://download.pytorch.org/whl/cu124"
+explicit = true

-[tool.poetry.group.constraints.dependencies]
-numpy = [
-    { version = "^2.1.0", markers = 'python_version >= "3.13"' },
-    { version = "^1.24.4", markers = 'python_version < "3.13"' },
-]
+[tool.setuptools.packages.find]
+include = ["docling_serve*"]
+namespaces = true

-[tool.poetry.group.dev.dependencies]
-black = "^24.8.0"
-isort = "^5.13.2"
-pre-commit = "^3.8.0"
-autoflake = "^2.3.1"
-flake8 = "^7.1.1"
-pytest = "^8.3.4"
-pytest-asyncio = "^0.24.0"
-pytest-check = "^2.4.1"
-mypy = "^1.11.2"
+[project.scripts]
+docling-serve = "docling_serve.__main__:main"

-[build-system]
-requires = ["poetry-core"]
-build-backend = "poetry.core.masonry.api"
+[project.urls]
+Homepage = "https://github.com/DS4SD/docling-serve"
+# Documentation = "https://ds4sd.github.io/docling"
+Repository = "https://github.com/DS4SD/docling-serve"
+Issues = "https://github.com/DS4SD/docling-serve/issues"
+Changelog = "https://github.com/DS4SD/docling-serve/blob/main/CHANGELOG.md"

-[tool.black]
+[tool.ruff]
+target-version = "py310"
 line-length = 88
-target-version = ["py310"]
-include = '\.pyi?$'
+respect-gitignore = true

-[tool.isort]
-profile = "black"
-line_length = 88
-py_version=311
+# extend-exclude = [
+#     "tests",
+# ]

-[tool.autoflake]
-in-place = true
-remove-all-unused-imports = true
-remove-unused-variables = true
-expand-star-imports = true
-recursive = true
+[tool.ruff.format]
+skip-magic-trailing-comma = false
+
+[tool.ruff.lint]
+select = [
+    # "B",  # flake8-bugbear
+    "C",  # flake8-comprehensions
+    "C9",  # mccabe
+    # "D",  # flake8-docstrings
+    "E",  # pycodestyle errors (default)
+    "F",  # pyflakes (default)
+    "I",  # isort
+    "PD", # pandas-vet
+    "PIE", # pie
+    # "PTH", # pathlib
+    "Q",  # flake8-quotes
+    # "RET", # return
+    "RUF", # Enable all ruff-specific checks
+    # "SIM", # simplify
+    "S307", # eval
+    # "T20",  # (disallow print statements) keep debugging statements out of the codebase
+    "W",  # pycodestyle warnings
+    "ASYNC", # async
+    "UP", # pyupgrade
+]
+
+ignore = [
+    "E501",  # Line too long, handled by ruff formatter
+    "D107", # "Missing docstring in __init__",
+    "F811", # "redefinition of the same function"
+    "PL", # Pylint
+    "RUF012", # Mutable Class Attributes
+    "UP007", # Option and Union
+]
+
+#extend-select = []
+
+[tool.ruff.lint.per-file-ignores]
+"__init__.py" = ["E402", "F401"]
+"tests/*.py" = ["ASYNC"] # Disable ASYNC check for tests
+
+[tool.ruff.lint.mccabe]
+max-complexity = 15
+
+[tool.ruff.lint.isort.sections]
+"docling" = ["docling", "docling_core"]
+
+[tool.ruff.lint.isort]
+combine-as-imports = true
+section-order = [
+  "future",
+  "standard-library",
+  "third-party",
+  "docling",
+  "first-party",
+  "local-folder",
+]

 [tool.mypy]
 pretty = true
@@ -131,10 +194,6 @@ module = [
    "easyocr.*",
    "tesserocr.*",
    "rapidocr_onnxruntime.*",
-    "docling_conversion.*",
-    "gradio_ui.*",
-    "response_preparation.*",
-    "helper_functions.*",
    "requests.*",
 ]
 ignore_missing_imports = true
@@ -150,3 +209,16 @@ addopts = "-rA --color=yes --tb=short --maxfail=5"
 markers = [
 "asyncio",
 ]
+
+[tool.semantic_release]
+# for default values check:
+# https://github.com/python-semantic-release/python-semantic-release/blob/v7.32.2/semantic_release/defaults.cfg
+
+version_source = "tag_only"
+branch = "main"
+
+# configure types which should trigger minor and patch version bumps respectively
+# (note that they must be a subset of the configured allowed types):
+parser_angular_allowed_types = "build,chore,ci,docs,feat,fix,perf,style,refactor,test"
+parser_angular_minor_types = "feat"
+parser_angular_patch_types = "fix,perf"
--- a/start_server.sh
+++ b/start_server.sh
@@ -1,30 +0,0 @@
-#!/bin/bash
-set -Eeuo pipefail
-
-# Network settings
-export PORT="${PORT:-5001}"
-export HOST="${HOST:-"0.0.0.0"}"
-
-# Performance settings
-UVICORN_WORKERS="${UVICORN_WORKERS:-1}"
-
-# Development settings
-export WITH_UI="${WITH_UI:-"true"}"
-export RELOAD=${RELOAD:-"false"}
-
-# --------------------------------------
-# Process env settings
-
-EXTRA_ARGS=""
-if [ "$RELOAD" == "true" ]; then
-  EXTRA_ARGS="$EXTRA_ARGS --reload"
-fi
-
-# Launch
-exec poetry run uvicorn \
-    docling_serve.app:app \
-    --host=${HOST} \
-    --port=${PORT} \
-    --timeout-keep-alive=600 \
-    ${EXTRA_ARGS} \
-    --workers=${UVICORN_WORKERS}
--- a/tests/test_1-file-all-outputs.py
+++ b/tests/test_1-file-all-outputs.py
@@ -89,7 +89,7 @@ async def test_convert_file(async_client):
        check.is_in(
            '{"schema_name": "DoclingDocument"',
            json.dumps(data["document"]["json_content"]),
-            msg=f"JSON document should contain '{{\\n  \"schema_name\": \"DoclingDocument'\". Received: {safe_slice(data['document']['json_content'])}",
+            msg=f'JSON document should contain \'{{\\n  "schema_name": "DoclingDocument\'". Received: {safe_slice(data["document"]["json_content"])}',
        )
    # HTML check
    check.is_in(
--- a/tests/test_1-url-all-outputs.py
+++ b/tests/test_1-url-all-outputs.py
@@ -83,7 +83,7 @@ async def test_convert_url(async_client):
        check.is_in(
            '{"schema_name": "DoclingDocument"',
            json.dumps(data["document"]["json_content"]),
-            msg=f"JSON document should contain '{{\\n  \"schema_name\": \"DoclingDocument'\". Received: {safe_slice(data['document']['json_content'])}",
+            msg=f'JSON document should contain \'{{\\n  "schema_name": "DoclingDocument\'". Received: {safe_slice(data["document"]["json_content"])}',
        )
    # HTML check
    check.is_in(
--- a/tests/test_1-url-async-ws.py
+++ b/tests/test_1-url-async-ws.py
@@ -0,0 +1,48 @@
+import base64
+from pathlib import Path
+
+import httpx
+import pytest
+import pytest_asyncio
+from websockets.sync.client import connect
+
+
+@pytest_asyncio.fixture
+async def async_client():
+    async with httpx.AsyncClient(timeout=60.0) as client:
+        yield client
+
+
+@pytest.mark.asyncio
+async def test_convert_url(async_client: httpx.AsyncClient):
+    """Test convert URL to all outputs"""
+
+    doc_filename = Path("tests/2408.09869v5.pdf")
+    encoded_doc = base64.b64encode(doc_filename.read_bytes()).decode()
+
+    base_url = "http://localhost:5001/v1alpha"
+    payload = {
+        "options": {
+            "to_formats": ["md", "json"],
+            "image_export_mode": "placeholder",
+            "ocr": True,
+            "abort_on_error": False,
+            "return_as_file": False,
+        },
+        # "http_sources": [{"url": "https://arxiv.org/pdf/2501.17887"}],
+        "file_sources": [{"base64_string": encoded_doc, "filename": doc_filename.name}],
+    }
+    # print(json.dumps(payload, indent=2))
+
+    for n in range(5):
+        response = await async_client.post(
+            f"{base_url}/convert/source/async", json=payload
+        )
+        assert response.status_code == 200, "Response should be 200 OK"
+
+    task = response.json()
+
+    uri = f"ws://localhost:5001/v1alpha/status/ws/{task['task_id']}"
+    with connect(uri) as websocket:
+        for message in websocket:
+            print(message)
--- a/tests/test_1-url-async.py
+++ b/tests/test_1-url-async.py
@@ -0,0 +1,60 @@
+import json
+import random
+import time
+
+import httpx
+import pytest
+import pytest_asyncio
+
+
+@pytest_asyncio.fixture
+async def async_client():
+    async with httpx.AsyncClient(timeout=60.0) as client:
+        yield client
+
+
+@pytest.mark.asyncio
+async def test_convert_url(async_client):
+    """Test convert URL to all outputs"""
+
+    example_docs = [
+        "https://arxiv.org/pdf/2411.19710",
+        "https://arxiv.org/pdf/2501.17887",
+        "https://www.nature.com/articles/s41467-024-50779-y.pdf",
+        "https://arxiv.org/pdf/2306.12802",
+        "https://arxiv.org/pdf/2311.18481",
+    ]
+
+    base_url = "http://localhost:5001/v1alpha"
+    payload = {
+        "options": {
+            "to_formats": ["md", "json"],
+            "image_export_mode": "placeholder",
+            "ocr": True,
+            "abort_on_error": False,
+            "return_as_file": False,
+        },
+        "http_sources": [{"url": random.choice(example_docs)}],
+    }
+    print(json.dumps(payload, indent=2))
+
+    for n in range(5):
+        response = await async_client.post(
+            f"{base_url}/convert/source/async", json=payload
+        )
+        assert response.status_code == 200, "Response should be 200 OK"
+
+    task = response.json()
+
+    print(json.dumps(task, indent=2))
+
+    while task["task_status"] not in ("success", "failure"):
+        response = await async_client.get(f"{base_url}/status/poll/{task['task_id']}")
+        assert response.status_code == 200, "Response should be 200 OK"
+        task = response.json()
+        print(f"{task['task_status']=}")
+        print(f"{task['task_position']=}")
+
+        time.sleep(2)
+
+    assert task["task_status"] == "success"
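Review note: the polling loop in `tests/test_1-url-async.py` calls `time.sleep(2)` inside a coroutine, which pauses the whole event loop between polls. A non-blocking variant could look like the sketch below. The helper name `poll_until_done`, its parameters, and the injected `fetch_status` callable are hypothetical and not part of this change; only the `task_status` values `success` and `failure` are taken from the test.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Terminal states as used by the test's while-loop condition.
TERMINAL_STATES = ("success", "failure")


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[dict]],
    interval: float = 2.0,
    max_polls: int = 100,
) -> dict:
    """Poll a status coroutine until the task reaches a terminal state.

    `fetch_status` is a hypothetical callable that returns the decoded
    JSON body of the status endpoint, e.g. a thin wrapper around
    `await client.get(...)` followed by `.json()`.
    """
    for _ in range(max_polls):
        task = await fetch_status()
        if task["task_status"] in TERMINAL_STATES:
            return task
        # asyncio.sleep yields to the event loop instead of blocking it.
        await asyncio.sleep(interval)
    raise TimeoutError(f"task not finished after {max_polls} polls")
```

Bounding the loop with `max_polls` is a design choice of this sketch, so that a task stuck in a non-terminal state fails the test instead of hanging it.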
--- a/tests/test_2-files-all-outputs.py
+++ b/tests/test_2-files-all-outputs.py
@@ -57,18 +57,18 @@ async def test_convert_file(async_client):
    content_disposition = response.headers.get("content-disposition")

    with check:
-        assert (
-            content_disposition is not None
-        ), "Content-Disposition header should be present"
+        assert content_disposition is not None, (
+            "Content-Disposition header should be present"
+        )
    with check:
        assert "attachment" in content_disposition, "Response should be an attachment"
    with check:
-        assert (
-            'filename="converted_docs.zip"' in content_disposition
-        ), "Attachment filename should be 'converted_docs.zip'"
+        assert 'filename="converted_docs.zip"' in content_disposition, (
+            "Attachment filename should be 'converted_docs.zip'"
+        )

    content_type = response.headers.get("content-type")
    with check:
-        assert (
-            content_type == "application/zip"
-        ), "Content-Type should be 'application/zip'"
+        assert content_type == "application/zip", (
+            "Content-Type should be 'application/zip'"
+        )
--- a/tests/test_2-urls-all-outputs.py
+++ b/tests/test_2-urls-all-outputs.py
@@ -50,18 +50,18 @@ async def test_convert_url(async_client):
    content_disposition = response.headers.get("content-disposition")

    with check:
-        assert (
-            content_disposition is not None
-        ), "Content-Disposition header should be present"
+        assert content_disposition is not None, (
+            "Content-Disposition header should be present"
+        )
    with check:
        assert "attachment" in content_disposition, "Response should be an attachment"
    with check:
-        assert (
-            'filename="converted_docs.zip"' in content_disposition
-        ), "Attachment filename should be 'converted_docs.zip'"
+        assert 'filename="converted_docs.zip"' in content_disposition, (
+            "Attachment filename should be 'converted_docs.zip'"
+        )

    content_type = response.headers.get("content-type")
    with check:
-        assert (
-            content_type == "application/zip"
-        ), "Content-Type should be 'application/zip'"
+        assert content_type == "application/zip", (
+            "Content-Type should be 'application/zip'"
+        )
--- a/uv.lock
+++ b/uv.lock
Author	SHA1	Message	Date
github-actions[bot]	b92c5d8899	chore: bump version to 0.5.1 [skip ci]	2025-03-10 17:31:51 +00:00
Eugene	3c9825df30	ci: Speed up python linting (#64) Signed-off-by: Eugene <fogaprod@gmail.com>	2025-03-10 18:05:33 +01:00
Michele Dolfi	8dd0e216fd	chore: extend timeout for downloading the model artifacts (#90) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-10 16:58:10 +01:00
Michele Dolfi	d406802f9d	chore: update uv.lock with new release version (#89) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-10 16:57:48 +01:00
Michele Dolfi	a92ad48b28	fix: submodules in wheels (#85) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-10 16:19:34 +01:00
Eugene	da2b26099d	chore: Remove unused OS deps (#80) Signed-off-by: Eugene <fogaprod@gmail.com>	2025-03-10 08:53:25 +01:00
github-actions[bot]	98b46eda50	chore: bump version to 0.5.0 [skip ci]	2025-03-07 17:24:16 +00:00
Michele Dolfi	7e75919ae8	chore: Remove deprecated type aliases and run as pre-commit (#79) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-07 15:46:52 +01:00
Eugene	c95db36438	fix: Remove uv from image, merge ARG and ENV declarations (#57) Signed-off-by: Eugene <fogaprod@gmail.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-07 15:33:21 +01:00
Michele Dolfi	82f8900197	feat: Async api (#60) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-07 11:26:50 +01:00
Eugene	ed851c95fe	feat: display version in fastapi docs (#78) Signed-off-by: Eugene <fogaprod@gmail.com>	2025-03-07 09:28:05 +01:00
Steffen Röcker	05df0735d3	fix(docs): Remove comma in convert/source curl example (#73) Signed-off-by: Steffen Röcker <sroecker@redhat.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-03-06 08:12:09 +01:00
github-actions[bot]	cad1053e36	chore: bump version to 0.4.0 [skip ci]	2025-02-26 13:05:03 +00:00
Michele Dolfi	7e6d9cdef3	feat: New container images (#68) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-26 12:49:20 +01:00
Brent Salisbury	343b985287	Readme additions for running Readme additions for a quickstart running of docling-serve Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>	2025-02-25 14:49:50 -08:00
Kasper Dinkla	c430d9b1a1	feat: Render DoclingDocument with npm docling-components in the example UI (#65) Signed-off-by: DKL <dkl@zurich.ibm.com>	2025-02-25 11:27:42 +01:00
Anil Vishnoi	63141f1cc7	ci: Use release event to trigger the image publishing job for releases (#63) Signed-off-by: Anil Vishnoi <vishnoianil@gmail.com>	2025-02-24 08:21:17 +01:00
Eugene	d5557fad9f	refactor: Use bytes as options key (#58) Signed-off-by: Eugene <fogaprod@gmail.com>	2025-02-21 18:03:27 +01:00
İlker SIĞIRCI	36967f7f61	chore(config): replace black,isort,flake and autoflake with ruff (#55) Signed-off-by: ilker.sigirci <ilker.sigirci@data-boss.com.tr> Signed-off-by: ilkersigirci <sigirci.ilker@mgail.com> Co-authored-by: ilker.sigirci <ilker.sigirci@data-boss.com.tr>	2025-02-20 13:29:41 +01:00
github-actions[bot]	3b54d9b6ef	chore: bump version to 0.3.0 [skip ci]	2025-02-19 21:22:27 +00:00
Michele Dolfi	4877248368	fix: set DOCLING_SERVE_ARTIFACTS_PATH in images (#53) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-19 22:03:56 +01:00
Michele Dolfi	ec33a61faa	feat: Add new docling-serve cli (#50) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-19 20:54:13 +01:00
Eugene	663e03303a	chore: use uv in start_server.sh and update docs (#49) Signed-off-by: Eugene <fogaprod@gmail.com>	2025-02-19 19:25:00 +01:00
Guillaume Moutier	c64a450bf9	fix: Set root UI path when behind proxy (#38) Signed-off-by: Guillaume Moutier <3944034+guimou@users.noreply.github.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Guillaume Moutier <3944034+guimou@users.noreply.github.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-19 10:32:43 +01:00
Michele Dolfi	ae3b4906f1	fix: support python 3.13 and docling updates and switch to uv (#48) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-19 09:53:07 +01:00
Michele Dolfi	7a351fcdea	fix missing secrets inherit Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-13 17:02:01 +00:00
Michele Dolfi	1615f977a2	ci: add semantic release and build/publish python wheel (#41) Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>	2025-02-13 16:49:43 +01:00
Guillaume Moutier	1bf487b18e	Fix main when workers > 1 (#35) Always load the app by using an import string Signed-off-by: Guillaume Moutier <3944034+guimou@users.noreply.github.com> Co-authored-by: Guillaume Moutier <3944034+guimou@users.noreply.github.com>	2025-02-12 09:54:49 +01:00
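
Note on the version bumps in the log above: the `[tool.semantic_release]` block in `pyproject.toml` maps `feat` commits to a minor bump and `fix`/`perf` commits to a patch bump. The sketch below is a simplified model of that mapping, not python-semantic-release's actual implementation. The function names `bump_level` and `next_version` are hypothetical, and breaking-change handling (major bumps) is deliberately left out.

```python
import re

# Mirrors parser_angular_minor_types / parser_angular_patch_types
# from the [tool.semantic_release] section in the diff.
MINOR_TYPES = {"feat"}
PATCH_TYPES = {"fix", "perf"}

# Matches an Angular-style subject such as "fix(docs): ..." or "feat: ...".
_SUBJECT = re.compile(r"^(?P<type>\w+)(\([^)]*\))?:")


def bump_level(subjects):
    """Return 'minor', 'patch', or None for a list of commit subjects."""
    level = None
    for subject in subjects:
        match = _SUBJECT.match(subject)
        if match is None:
            continue  # e.g. "Fix main when workers > 1" has no type prefix
        commit_type = match.group("type")
        if commit_type in MINOR_TYPES:
            return "minor"  # highest level this sketch models
        if commit_type in PATCH_TYPES:
            level = "patch"
    return level


def next_version(current, level):
    """Apply a bump level to a 'MAJOR.MINOR.PATCH' version string."""
    major, minor, patch = (int(part) for part in current.split("."))
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current
```

Under this model, the commits between the 0.5.0 and 0.5.1 bumps contain one `fix:` and no `feat:`, which gives a patch bump, while the commits between 0.4.0 and 0.5.0 include `feat: Async api`, which gives a minor bump.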