mirror of https://github.com/docling-project/docling-serve.git synced 2025-11-29 08:33:50 +00:00


Viktor Kuropiatnyk 71edf41849 docs: example of docling-serve deployment in the RQ engine mode (#321)

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>

2025-08-14 16:10:39 +02:00


Deployment Examples

This document provides deployment examples for running the application in different environments.

Choose the deployment option that best fits your setup.

Local GPU NVIDIA: For deploying the application locally on a machine with a supported NVIDIA GPU (using Docker Compose).
Local GPU AMD: For deploying the application locally on a machine with a supported AMD GPU (using Docker Compose).
OpenShift: For deploying the application on an OpenShift cluster, designed for cloud-native environments.

Local GPU NVIDIA

Docker Compose

Manifest example: compose-nvidia.yaml

This deployment has the following features:

NVIDIA CUDA enabled

Install the app with:

docker compose -f docs/deploy-examples/compose-nvidia.yaml up -d

To use the API:

# Make a test query
curl -X 'POST' \
  "localhost:5001/v1/convert/source/async" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
  }'

Requirements

Debian/Ubuntu/RHEL/Fedora/openSUSE
Docker
NVIDIA drivers >= 550.54.14
nvidia-container-toolkit
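
The driver requirement above can be checked from a script. The following is a minimal sketch, assuming GNU coreutils (`sort -V`) is available; the helper name `version_ge` is hypothetical and not part of docling-serve.

```shell
#!/usr/bin/env bash
# Minimum driver version, taken from the requirements list above.
MIN_DRIVER="550.54.14"

# version_ge A B: succeed when version A >= version B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  # nvidia-smi prints one line per GPU; the first line is enough
  # because all GPUs on a host share the same driver.
  installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if version_ge "$installed" "$MIN_DRIVER"; then
    echo "driver $installed meets the minimum $MIN_DRIVER"
  else
    echo "driver $installed is older than $MIN_DRIVER" >&2
  fi
fi
```

The comparison sorts the two versions and checks which one comes first, which avoids comparing dotted version strings as plain text.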

Docs:

Steps

Check the driver version and decide which GPU to use (0/1/2/n), then update the compose-nvidia.yaml file accordingly or use count: all.
```
nvidia-smi
```
Check whether the NVIDIA Container Toolkit is installed and up to date
```
# debian
dpkg -l | grep nvidia-container-toolkit
```
```
# rhel
rpm -q nvidia-container-toolkit
```
NVIDIA Container Toolkit install steps can be found here:

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Check which runtime is being used by Docker
```
# docker
docker info | grep -i runtime
```

(Optional) If Docker's default runtime reverts from 'nvidia' to the stock default after the Docker service restarts, set it permanently:

Back up the daemon.json file:

sudo cp /etc/docker/daemon.json /etc/docker/daemon.json.bak

Update the daemon.json file (this command overwrites the whole file, so merge in any other settings from the backup):

echo '{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime"
    }
  },
  "default-runtime": "nvidia"
}' | sudo tee /etc/docker/daemon.json > /dev/null

Restart the Docker service:

sudo systemctl restart docker

Confirm 'nvidia' is the default runtime used by Docker by repeating the runtime check above (docker info | grep -i runtime).

Run the container:

docker compose -f docs/deploy-examples/compose-nvidia.yaml up -d
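
The runtime check in the steps above can be sketched as a small script. It assumes `docker info` prints a `Default Runtime:` line in its plain-text output; the helper name `default_runtime` is hypothetical.

```shell
#!/usr/bin/env bash
# default_runtime: read `docker info` text on stdin and print the value
# of its "Default Runtime:" line (prints nothing if the line is absent).
default_runtime() {
  sed -n 's/^[[:space:]]*Default Runtime:[[:space:]]*//p'
}

# Only call Docker when the CLI is installed.
if command -v docker >/dev/null 2>&1; then
  runtime="$(docker info 2>/dev/null | default_runtime)"
  if [ "$runtime" = "nvidia" ]; then
    echo "nvidia is the default Docker runtime"
  else
    echo "default Docker runtime is '${runtime:-unknown}', not nvidia" >&2
  fi
fi
```

Running this after `sudo systemctl restart docker` shows whether the daemon.json change survived the restart.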

Local GPU AMD

Docker Compose

Manifest example: compose-amd.yaml

This deployment has the following features:

AMD ROCm enabled

Install the app with:

docker compose -f docs/deploy-examples/compose-amd.yaml up -d

To use the API:

# Make a test query
curl -X 'POST' \
  "localhost:5001/v1/convert/source/async" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
  }'

Requirements

Debian/Ubuntu/RHEL/Fedora/openSUSE
Docker
AMDGPU driver >= 6.3
AMD ROCm >= 6.3

Docs:

AMD ROCm installation

Steps

Check the driver version and decide which GPU to use (0/1/2/n), then update the compose-amd.yaml file accordingly.
```
rocm-smi --showdriverversion
rocminfo | grep -i "ROCm version"
```
Find the GIDs of the video and render groups on the host (and update the compose-amd.yaml file with them)
```
getent group video
getent group render
```
Build the image locally, then update the compose-amd.yaml file
```
make docling-serve-rocm-image
```
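
The group lookup in the steps above returns full group entries, while only the numeric GID is needed. A minimal sketch for extracting it, assuming the standard `name:password:GID:members` group format; the helper name `gid_of_entry` is hypothetical.

```shell
#!/usr/bin/env bash
# gid_of_entry ENTRY: print the GID (third colon-separated field)
# of a group database entry such as "video:x:44:alice".
gid_of_entry() {
  printf '%s\n' "$1" | cut -d: -f3
}

# Look up both groups on the host and print their GIDs.
for group in video render; do
  if entry="$(getent group "$group" 2>/dev/null)" && [ -n "$entry" ]; then
    echo "$group GID: $(gid_of_entry "$entry")"
  else
    echo "group '$group' not found on this host" >&2
  fi
done
```

The printed values are the ones to copy into compose-amd.yaml wherever it expects the video and render group IDs.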

OpenShift

Simple deployment

Manifest example: docling-serve-simple.yaml

This deployment example has the following features:

Deployment configuration
Service configuration
NVIDIA CUDA enabled

Install the app with:

oc apply -f docs/deploy-examples/docling-serve-simple.yaml

To use the API:

# Port-forward the service
oc port-forward svc/docling-serve 5001:5001

# Make a test query
curl -X 'POST' \
  "localhost:5001/v1/convert/source/async" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
  }'

Multiple workers with RQ

Manifest example: docling-serve-rq-workers.yaml

This deployment example has the following features:

Deployment configuration
Service configuration
Redis deployment
Multiple (2 by default) worker Pods

Install the app in two steps.

Create the Kubernetes secret (replace myredispassword with your own password in both values):

kubectl create secret generic docling-serve-rq-secrets --from-literal=REDIS_PASSWORD=myredispassword --from-literal=RQ_REDIS_URL=redis://:myredispassword@docling-serve-redis-service:6373/

Apply the deployment manifest:

oc apply -f docs/deploy-examples/docling-serve-rq-workers.yaml
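
The secret above repeats the Redis password inside RQ_REDIS_URL, so the two values are easy to get out of sync. The following sketch builds the URL from a single password variable. The host and port are copied from the command above; the helper name `rq_redis_url` is hypothetical, and the script only prints the `kubectl` command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# rq_redis_url PASSWORD HOST PORT: print a Redis URL in the same
# shape as the one used in the secret above.
rq_redis_url() {
  printf 'redis://:%s@%s:%s/' "$1" "$2" "$3"
}

# Take the password from the environment, falling back to the example value.
REDIS_PASSWORD="${REDIS_PASSWORD:-myredispassword}"
RQ_REDIS_URL="$(rq_redis_url "$REDIS_PASSWORD" docling-serve-redis-service 6373)"

# Print the command instead of executing it.
echo kubectl create secret generic docling-serve-rq-secrets \
  "--from-literal=REDIS_PASSWORD=$REDIS_PASSWORD" \
  "--from-literal=RQ_REDIS_URL=$RQ_REDIS_URL"
```

This sketch does not percent-encode the password, so passwords containing characters such as `@`, `:` or `/` would need encoding before being placed in the URL.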

Secure deployment with `oauth-proxy`

Manifest example: docling-serve-oauth.yaml

This deployment has the following features:

TLS encryption between all components (using the cluster-internal certificate authority).
Authentication via a secure oauth-proxy sidecar.
Service exposed through a secure OpenShift Route.

Install the app with:

oc apply -f docs/deploy-examples/docling-serve-oauth.yaml

To use the API:

# Retrieve the endpoint
DOCLING_NAME=docling-serve
DOCLING_ROUTE="https://$(oc get routes ${DOCLING_NAME} --template={{.spec.host}})"

# Retrieve the authentication token
OCP_AUTH_TOKEN=$(oc whoami --show-token)

# Make a test query
curl -X 'POST' \
  "${DOCLING_ROUTE}/v1/convert/source/async" \
  -H "Authorization: Bearer ${OCP_AUTH_TOKEN}" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
  }'

ReplicaSets with `sticky sessions`

Manifest example: docling-serve-replicas-w-sticky-sessions.yaml

This deployment has the following features:

Deployment configuration with 3 replicas
Service configuration
Service exposed through an OpenShift Route with sticky sessions enabled

Install the app with:

oc apply -f docs/deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml

To use the API:

# Retrieve the endpoint
DOCLING_NAME=docling-serve
DOCLING_ROUTE="https://$(oc get routes $DOCLING_NAME --template={{.spec.host}})"

# Make a test query and store the cookie and task ID
task_id=$(curl -s -X 'POST' \
    "${DOCLING_ROUTE}/v1/convert/source/async" \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -d '{
      "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
    }' \
    -c cookies.txt | grep -oP '"task_id":"\K[^"]+')

# Use the task ID and cookie to check the task status
curl -v -X 'GET' \
  "${DOCLING_ROUTE}/v1/status/poll/$task_id?wait=0" \
  -H "accept: application/json" \
  -b "cookies.txt"
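
The status check above runs once; in practice it is usually repeated until the task finishes. The following is a minimal polling sketch that reuses cookies.txt so every request reaches the same replica. It assumes the status response contains a flat `task_status` string field whose final values are `success` or `failure`; the field name, the values, and both helper names are assumptions that should be checked against the API responses of your docling-serve version.

```shell
#!/usr/bin/env bash
# json_string_field NAME: read JSON on stdin and print the value of a flat
# string field. Handles simple responses only; use jq for anything more.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# poll_task BASE_URL TASK_ID: poll the status endpoint until the task
# reaches a final state, giving up after 30 attempts.
poll_task() {
  attempts=0
  while [ "$attempts" -lt 30 ]; do
    status="$(curl -s -b cookies.txt -H "accept: application/json" \
      "$1/v1/status/poll/$2?wait=0" | json_string_field task_status)"
    case "$status" in
      success|failure) echo "$status"; return 0 ;;  # assumed final states
    esac
    attempts=$((attempts + 1))
    sleep 2
  done
  echo "timeout"
  return 1
}

# Usage, with the variables set in the commands above:
# poll_task "${DOCLING_ROUTE}" "$task_id"
```

Unlike the `grep -oP` extraction above, the `sed` expression also tolerates whitespace around the colon and does not need GNU grep.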
