mirror of
https://github.com/docling-project/docling-serve.git
synced 2025-11-29 08:33:50 +00:00
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com> Signed-off-by: Michele Dolfi <dol@zurich.ibm.com> Co-authored-by: Michele Dolfi <dol@zurich.ibm.com>
331 lines
7.9 KiB
Markdown
331 lines
7.9 KiB
Markdown
# Deployment Examples
|
|
|
|
This document provides deployment examples for running the application in different environments.
|
|
|
|
Choose the deployment option that best fits your setup.
|
|
|
|
- **[Local GPU NVIDIA](#local-gpu-nvidia)**: For deploying the application locally on a machine with a supported NVIDIA GPU (using Docker Compose).
|
|
- **[Local GPU AMD](#local-gpu-amd)**: For deploying the application locally on a machine with a supported AMD GPU (using Docker Compose).
|
|
- **[OpenShift](#openshift)**: For deploying the application on an OpenShift cluster, designed for cloud-native environments.
|
|
|
|
---
|
|
|
|
## Local GPU NVIDIA
|
|
|
|
### Docker compose
|
|
|
|
Manifest example: [compose-nvidia.yaml](./deploy-examples/compose-nvidia.yaml)
|
|
|
|
This deployment has the following features:
|
|
|
|
- NVIDIA cuda enabled
|
|
|
|
Install the app with:
|
|
|
|
```sh
|
|
docker compose -f docs/deploy-examples/compose-nvidia.yaml up -d
|
|
```
|
|
|
|
For using the API:
|
|
|
|
```sh
|
|
# Make a test query
|
|
curl -X 'POST' \
|
|
"localhost:5001/v1/convert/source/async" \
|
|
-H "accept: application/json" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
|
|
}'
|
|
```
|
|
|
|
<details>
|
|
<summary><b>Requirements</b></summary>
|
|
|
|
- debian/ubuntu/rhel/fedora/opensuse
|
|
- docker
|
|
- nvidia drivers >=550.54.14
|
|
- nvidia-container-toolkit
|
|
|
|
Docs:
|
|
|
|
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/supported-platforms.html)
|
|
- [CUDA Toolkit Release Notes](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id6)
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><b>Steps</b></summary>
|
|
|
|
1. Check driver version and which GPU you want to use 0/1/2/n (and update [compose-nvidia.yaml](./deploy-examples/compose-nvidia.yaml) file or use `count: all`)
|
|
|
|
```sh
|
|
nvidia-smi
|
|
```
|
|
|
|
2. Check if the NVIDIA Container Toolkit is installed/updated
|
|
|
|
```sh
|
|
# debian
|
|
dpkg -l | grep nvidia-container-toolkit
|
|
```
|
|
|
|
```sh
|
|
# rhel
|
|
rpm -q nvidia-container-toolkit
|
|
```
|
|
|
|
NVIDIA Container Toolkit install steps can be found here:
|
|
|
|
<https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html>
|
|
|
|
3. Check which runtime is being used by Docker
|
|
|
|
```sh
|
|
# docker
|
|
docker info | grep -i runtime
|
|
```
|
|
|
|
4. If the default Docker runtime changes back from 'nvidia' to 'default' after restarting the Docker service (optional):
|
|
|
|
Backup the daemon.json file:
|
|
|
|
```sh
|
|
sudo cp /etc/docker/daemon.json /etc/docker/daemon.json.bak
|
|
```
|
|
|
|
Update the daemon.json file:
|
|
|
|
```sh
|
|
echo '{
|
|
"runtimes": {
|
|
"nvidia": {
|
|
"path": "nvidia-container-runtime"
|
|
}
|
|
},
|
|
"default-runtime": "nvidia"
|
|
}' | sudo tee /etc/docker/daemon.json > /dev/null
|
|
```
|
|
|
|
Restart the Docker service:
|
|
|
|
```sh
|
|
sudo systemctl restart docker
|
|
```
|
|
|
|
Confirm 'nvidia' is the default runtime used by Docker by repeating step 3.
|
|
|
|
5. Run the container:
|
|
|
|
```sh
|
|
docker compose -f docs/deploy-examples/compose-nvidia.yaml up -d
|
|
```
|
|
|
|
</details>
|
|
|
|
## Local GPU AMD
|
|
|
|
### Docker compose
|
|
|
|
Manifest example: [compose-amd.yaml](./deploy-examples/compose-amd.yaml)
|
|
|
|
This deployment has the following features:
|
|
|
|
- AMD rocm enabled
|
|
|
|
Install the app with:
|
|
|
|
```sh
|
|
docker compose -f docs/deploy-examples/compose-amd.yaml up -d
|
|
```
|
|
|
|
For using the API:
|
|
|
|
```sh
|
|
# Make a test query
|
|
curl -X 'POST' \
|
|
"localhost:5001/v1/convert/source/async" \
|
|
-H "accept: application/json" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
|
|
}'
|
|
```
|
|
|
|
<details>
|
|
<summary><b>Requirements</b></summary>
|
|
|
|
- debian/ubuntu/rhel/fedora/opensuse
|
|
- docker
|
|
- AMDGPU driver >=6.3
|
|
- AMD ROCm >=6.3
|
|
|
|
Docs:
|
|
|
|
- [AMD ROCm installation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html)
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary><b>Steps</b></summary>
|
|
|
|
1. Check driver version and which GPU you want to use 0/1/2/n (and update [compose-amd.yaml](./deploy-examples/compose-amd.yaml) file)
|
|
|
|
```sh
|
|
rocm-smi --showdriverversion
|
|
rocminfo | grep -i "ROCm version"
|
|
```
|
|
|
|
2. Find both video group GID and render group GID from host (and update [compose-amd.yaml](./deploy-examples/compose-amd.yaml) file)
|
|
|
|
```sh
|
|
getent group video
|
|
getent group render
|
|
```
|
|
|
|
3. Build the image locally (and update [compose-amd.yaml](./deploy-examples/compose-amd.yaml) file)
|
|
|
|
```sh
|
|
make docling-serve-rocm-image
|
|
```
|
|
|
|
</details>
|
|
|
|
## OpenShift
|
|
|
|
### Simple deployment
|
|
|
|
Manifest example: [docling-serve-simple.yaml](./deploy-examples/docling-serve-simple.yaml)
|
|
|
|
This deployment example has the following features:
|
|
|
|
- Deployment configuration
|
|
- Service configuration
|
|
- NVIDIA cuda enabled
|
|
|
|
Install the app with:
|
|
|
|
```sh
|
|
oc apply -f docs/deploy-examples/docling-serve-simple.yaml
|
|
```
|
|
|
|
For using the API:
|
|
|
|
```sh
|
|
# Port-forward the service
|
|
oc port-forward svc/docling-serve 5001:5001
|
|
|
|
# Make a test query
|
|
curl -X 'POST' \
|
|
"localhost:5001/v1/convert/source/async" \
|
|
-H "accept: application/json" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
|
|
}'
|
|
```
|
|
|
|
### Multiple workers with RQ
|
|
|
|
Manifest example: [`docling-serve-rq-workers.yaml`](./deploy-examples/docling-serve-rq-workers.yaml)
|
|
|
|
This deployment example has the following features:
|
|
|
|
- Deployment configuration
|
|
- Service configuration
|
|
- Redis deployment
|
|
- Multiple (2 by default) worker Pods
|
|
|
|
Install the app with:
|
|
|
|
- create k8s secret:
|
|
|
|
```sh
|
|
kubectl create secret generic docling-serve-rq-secrets --from-literal=REDIS_PASSWORD=myredispassword --from-literal=RQ_REDIS_URL=redis://:myredispassword@docling-serve-redis-service:6373/
|
|
```
|
|
|
|
- apply deployment manifest:
|
|
|
|
```sh
|
|
oc apply -f docs/deploy-examples/docling-serve-rq-workers.yaml
|
|
```
|
|
|
|
### Secure deployment with `oauth-proxy`
|
|
|
|
Manifest example: [docling-serve-oauth.yaml](./deploy-examples/docling-serve-oauth.yaml)
|
|
|
|
This deployment has the following features:
|
|
|
|
- TLS encryption between all components (using the cluster-internal CA authority).
|
|
- Authentication via a secure `oauth-proxy` sidecar.
|
|
- Expose the service using a secure OpenShift `Route`
|
|
|
|
Install the app with:
|
|
|
|
```sh
|
|
oc apply -f docs/deploy-examples/docling-serve-oauth.yaml
|
|
```
|
|
|
|
For using the API:
|
|
|
|
```sh
|
|
# Retrieve the endpoint
|
|
DOCLING_NAME=docling-serve
|
|
DOCLING_ROUTE="https://$(oc get routes ${DOCLING_NAME} --template={{.spec.host}})"
|
|
|
|
# Retrieve the authentication token
|
|
OCP_AUTH_TOKEN=$(oc whoami --show-token)
|
|
|
|
# Make a test query
|
|
curl -X 'POST' \
|
|
"${DOCLING_ROUTE}/v1/convert/source/async" \
|
|
-H "Authorization: Bearer ${OCP_AUTH_TOKEN}" \
|
|
-H "accept: application/json" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
|
|
}'
|
|
```
|
|
|
|
### ReplicaSets with `sticky sessions`
|
|
|
|
Manifest example: [docling-serve-replicas-w-sticky-sessions.yaml](./deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml)
|
|
|
|
This deployment has the following features:
|
|
|
|
- Deployment configuration with 3 replicas
|
|
- Service configuration
|
|
- Expose the service using a OpenShift `Route` and enables sticky sessions
|
|
|
|
Install the app with:
|
|
|
|
```sh
|
|
oc apply -f docs/deploy-examples/docling-serve-replicas-w-sticky-sessions.yaml
|
|
```
|
|
|
|
For using the API:
|
|
|
|
```sh
|
|
# Retrieve the endpoint
|
|
DOCLING_NAME=docling-serve
|
|
DOCLING_ROUTE="https://$(oc get routes $DOCLING_NAME --template={{.spec.host}})"
|
|
|
|
# Make a test query, store the cookie and taskid
|
|
task_id=$(curl -s -X 'POST' \
|
|
"${DOCLING_ROUTE}/v1/convert/source/async" \
|
|
-H "accept: application/json" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2501.17887"}]
|
|
}' \
|
|
-c cookies.txt | grep -oP '"task_id":"\K[^"]+')
|
|
```
|
|
|
|
```sh
|
|
# Grab the taskid and cookie to check the task status
|
|
curl -v -X 'GET' \
|
|
"${DOCLING_ROUTE}/v1/status/poll/$task_id?wait=0" \
|
|
-H "accept: application/json" \
|
|
-b "cookies.txt"
|
|
```
|