mirror of
https://github.com/docling-project/docling-serve.git
synced 2025-11-29 08:33:50 +00:00
docs: Example and instructions on how to load model weights to persistent volume (#197)
Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>
committed by GitHub
parent 21c1791e42
commit 3f090b7d15
@@ -70,7 +70,7 @@ An easy to use UI is available at the `/ui` endpoint.

 ## Documentation and advance usages

-Visit the [Docling Serve documentation](./docs/README.md) for learning how to [configure the webserver](./docs/configuration.md), use all the [runtime options](./docs/usage.md) of the API and [deployment examples](./docs/deployment.md).
+Visit the [Docling Serve documentation](./docs/README.md) for learning how to [configure the webserver](./docs/configuration.md), use all the [runtime options](./docs/usage.md) of the API and [deployment examples](./docs/deployment.md), pre-load model weights into a persistent volume [model weights on persistent volume](./docs/pre-loading-models.md)

 ## Get help and support

47
docs/deploy-examples/docling-model-cache-deployment.yaml
Normal file
@@ -0,0 +1,47 @@
kind: Deployment
apiVersion: apps/v1
metadata:
  name: docling-serve
  labels:
    app: docling-serve
    component: docling-serve-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docling-serve
      component: docling-serve-api
  template:
    metadata:
      labels:
        app: docling-serve
        component: docling-serve-api
    spec:
      restartPolicy: Always
      containers:
        - name: api
          resources:
            limits:
              cpu: 500m
              memory: 2Gi
            requests:
              cpu: 250m
              memory: 1Gi
          env:
            - name: DOCLING_SERVE_ENABLE_UI
              value: 'true'
            - name: DOCLING_SERVE_ARTIFACTS_PATH
              value: '/modelcache'
          ports:
            - name: http
              containerPort: 5001
              protocol: TCP
          imagePullPolicy: Always
          image: 'ghcr.io/docling-project/docling-serve-cpu'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
33
docs/deploy-examples/docling-model-cache-job.yaml
Normal file
@@ -0,0 +1,33 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: docling-model-cache-load
spec:
  selector: {}
  template:
    metadata:
      name: docling-model-load
    spec:
      containers:
        - name: loader
          image: ghcr.io/docling-project/docling-serve-cpu:main
          command:
            - docling-tools
            - models
            - download
            - '--output-dir=/modelcache'
            - 'layout'
            - 'tableformer'
            - 'code_formula'
            - 'picture_classifier'
            - 'smolvlm'
            - 'granite_vision'
            - 'easyocr'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
      restartPolicy: Never
11
docs/deploy-examples/docling-model-cache-pvc.yaml
Normal file
@@ -0,0 +1,11 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: docling-model-cache-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
103
docs/pre-loading-models.md
Normal file
@@ -0,0 +1,103 @@
# Pre-loading models for docling

This document shows how to pre-load docling models into a persistent volume and reuse that volume in docling-serve deployments.

1. Create a persistent volume claim that will store the model weights:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: docling-model-cache-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
```

If you don't want to use the default storage class, set a custom storage class as follows:

```yaml
spec:
  ...
  storageClassName: <Storage Class Name>
```

Manifest example: [docling-model-cache-pvc.yaml](./deploy-examples/docling-model-cache-pvc.yaml)
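
If the claim is generated by a script rather than written by hand, the optional storage class can be handled as in the following minimal sketch. The function name `build_pvc` and the class name `fast-ssd` are illustrative assumptions; the remaining values are copied from the manifest in step 1.

```python
import json


def build_pvc(name="docling-model-cache-pvc", size="10Gi", storage_class=None):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    spec = {
        "accessModes": ["ReadWriteOnce"],
        "volumeMode": "Filesystem",
        "resources": {"requests": {"storage": size}},
    }
    # Only set storageClassName when a custom class is requested;
    # leaving it out lets the cluster's default storage class apply.
    if storage_class is not None:
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # "fast-ssd" is a placeholder, not a class defined by docling-serve.
    print(json.dumps(build_pvc(storage_class="fast-ssd"), indent=2))
```

Because `kubectl` accepts JSON manifests as well as YAML, the printed output can be piped to `kubectl apply -f -`.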

2. To load the model weights, use `docling-tools` to download them. Because this is a one-time operation, a Kubernetes Job is a good fit:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: docling-model-cache-load
spec:
  selector: {}
  template:
    metadata:
      name: docling-model-load
    spec:
      containers:
        - name: loader
          image: ghcr.io/docling-project/docling-serve-cpu:main
          command:
            - docling-tools
            - models
            - download
            - '--output-dir=/modelcache'
            - 'layout'
            - 'tableformer'
            - 'code_formula'
            - 'picture_classifier'
            - 'smolvlm'
            - 'granite_vision'
            - 'easyocr'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
      restartPolicy: Never
```

The job mounts the previously created persistent volume and runs the same command we would use to download models locally:
`docling-tools models download --output-dir <MOUNT-PATH> [LIST_OF_MODELS]`

In the manifest, we list the desired models individually. Alternatively, use the `--all` parameter to download all models.

Manifest example: [docling-model-cache-job.yaml](./deploy-examples/docling-model-cache-job.yaml)
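
The Job's `command` list follows directly from that command line. As a rough sketch, assuming a hypothetical helper named `download_command`, the list could be assembled like this; it falls back to `--all` when no models are named:

```python
def download_command(output_dir, models=None):
    """Build the argv list for `docling-tools models download`."""
    cmd = ["docling-tools", "models", "download", f"--output-dir={output_dir}"]
    if models:
        # Download only the explicitly listed models.
        cmd.extend(models)
    else:
        # No models given: request everything.
        cmd.append("--all")
    return cmd


if __name__ == "__main__":
    print(download_command("/modelcache", ["layout", "tableformer", "easyocr"]))
    print(download_command("/modelcache"))
```

The returned list can be used as the container `command` in the Job manifest, or passed to `subprocess.run` for a local download.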

3. Now mount the volume in the docling-serve deployment and set the environment variable `DOCLING_SERVE_ARTIFACTS_PATH` to point to it.
The following additions to the deployment are needed:

```yaml
spec:
  template:
    spec:
      containers:
        - name: api
          env:
            ...
            - name: DOCLING_SERVE_ARTIFACTS_PATH
              value: '/modelcache'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
          ...
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
```

Make sure that the value of `DOCLING_SERVE_ARTIFACTS_PATH` matches both the directory the models were downloaded to and the path where the volume is mounted.

Now, when docling-serve executes tasks, the underlying docling installation loads the model weights from the mounted volume.

Manifest example: [docling-model-cache-deployment.yaml](./deploy-examples/docling-model-cache-deployment.yaml)
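
The path agreement described above can be checked mechanically before applying the manifests. The sketch below is one possible approach, not part of docling-serve: it takes the Job and Deployment as already-parsed dicts, looks only at the first container of each, and assumes the download path is given in the `--output-dir=PATH` form used in step 2. The function name `artifacts_path_problems` is hypothetical.

```python
def artifacts_path_problems(job, deployment, env_name="DOCLING_SERVE_ARTIFACTS_PATH"):
    """Return a list of human-readable mismatches (empty if all paths agree)."""
    # Assumption: the relevant container is the first one in each pod spec.
    job_c = job["spec"]["template"]["spec"]["containers"][0]
    dep_c = deployment["spec"]["template"]["spec"]["containers"][0]

    # Directory the Job downloads into, taken from the --output-dir=PATH argument.
    out_dirs = [
        arg.split("=", 1)[1]
        for arg in job_c.get("command", [])
        if arg.startswith("--output-dir=")
    ]
    env = {e["name"]: e.get("value") for e in dep_c.get("env", [])}
    mounts = [m["mountPath"] for m in dep_c.get("volumeMounts", [])]

    artifacts = env.get(env_name)
    if artifacts is None:
        return [f"{env_name} is not set in the deployment"]

    problems = []
    if artifacts not in mounts:
        problems.append(f"{env_name}={artifacts} is not a mountPath in the deployment")
    if out_dirs and out_dirs[0] != artifacts:
        problems.append(f"job downloads to {out_dirs[0]} but {env_name}={artifacts}")
    return problems
```

The dicts can come from any YAML or JSON parser. An empty result means the environment variable, the mount path, and the download directory all agree.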