mirror of https://github.com/docling-project/docling-serve.git synced 2025-11-29 08:33:50 +00:00

VIktor Kuropiantnyk 3f090b7d15 docs: Example and instructions on how to load model weights to persistent volume (#197 )

Signed-off-by: Viktor Kuropiatnyk <vku@zurich.ibm.com>

2025-05-21 13:04:46 +02:00

Pre-loading models for docling

This document provides examples for pre-loading docling models to a persistent volume and re-using it for docling-serve deployments.

First, we need to create a PersistentVolumeClaim that will store the model weights:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: docling-model-cache-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi

If you don't want to use the default storage class, set a custom one with the following:

spec:
  ...
  storageClassName: <Storage Class Name>

Manifest example: docling-model-cache-pvc.yaml
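For reference, here is a sketch of the complete claim with a custom storage class filled in. It is a variant under stated assumptions, not the manifest from the repository: my-rwx-storage-class is a hypothetical name (kubectl get storageclass lists the classes available in your cluster), and it requests ReadWriteMany instead of ReadWriteOnce so that docling-serve pods scheduled on different nodes could share the cache. This only works if the chosen storage class actually supports ReadWriteMany.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: docling-model-cache-pvc
spec:
  accessModes:
    # Assumption: the storage class supports RWX; otherwise keep ReadWriteOnce
    - ReadWriteMany
  volumeMode: Filesystem
  # Hypothetical class name; replace with one that exists in your cluster
  storageClassName: my-rwx-storage-class
  resources:
    requests:
      storage: 10Gi
```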

To load the model weights, we can download them with the docling-tools CLI. Since this is a one-time operation, we can use a Kubernetes Job for it:

apiVersion: batch/v1
kind: Job
metadata:
  name: docling-model-cache-load
spec:
  selector: {}
  template:
    metadata:
      name: docling-model-load
    spec:
      containers:
        - name: loader
          image: ghcr.io/docling-project/docling-serve-cpu:main
          command:
            - docling-tools
            - models
            - download
            - '--output-dir=/modelcache'
            - 'layout'
            - 'tableformer'
            - 'code_formula'
            - 'picture_classifier'
            - 'smolvlm'
            - 'granite_vision'
            - 'easyocr'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
      restartPolicy: Never

The Job mounts the previously created persistent volume and runs the same command we would use to download the models locally: docling-tools models download --output-dir <MOUNT-PATH> [LIST_OF_MODELS]

In the manifest, we list the desired models individually; alternatively, the --all parameter downloads all models.

Manifest example: docling-model-cache-job.yaml
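As a sketch of the --all alternative mentioned above, the same Job can replace the explicit model list with that single parameter; everything else, including the volume wiring, stays unchanged. Because this downloads every model, it is worth checking (an assumption to verify, not a measured figure) that the 10Gi requested by the claim is still enough.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: docling-model-cache-load
spec:
  selector: {}
  template:
    metadata:
      name: docling-model-load
    spec:
      containers:
        - name: loader
          image: ghcr.io/docling-project/docling-serve-cpu:main
          command:
            - docling-tools
            - models
            - download
            - '--output-dir=/modelcache'
            # Download every model instead of listing them one by one
            - '--all'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
      restartPolicy: Never
```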

Now we can mount the volume in the docling-serve Deployment and set the environment variable DOCLING_SERVE_ARTIFACTS_PATH to point to it. Make the following additions to the Deployment:

spec:
  template:
    spec:
      containers:
        - name: api
          env:
          ...
            - name: DOCLING_SERVE_ARTIFACTS_PATH
              value: '/modelcache'
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      ...
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc

Make sure that the value of DOCLING_SERVE_ARTIFACTS_PATH matches both the directory the models were downloaded to (--output-dir in the Job) and the path where the volume is mounted.

Now, when docling-serve executes tasks, the underlying docling installation will load the model weights from the mounted volume.

Manifest example: docling-model-cache-deployment.yaml
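Put together, a minimal Deployment using the cache might look like the sketch below. The container name api, the image, the environment variable, and the volume wiring come from this page; the Deployment name, the app: docling-serve labels, and container port 5001 (assumed to be the docling-serve default) are illustrative assumptions, so adjust them to your setup.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: docling-serve            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: docling-serve         # hypothetical label
  template:
    metadata:
      labels:
        app: docling-serve
    spec:
      containers:
        - name: api
          image: ghcr.io/docling-project/docling-serve-cpu:main
          env:
            # Must equal the Job's --output-dir and the mountPath below
            - name: DOCLING_SERVE_ARTIFACTS_PATH
              value: '/modelcache'
          ports:
            - containerPort: 5001  # assumed default docling-serve port
          volumeMounts:
            - name: docling-model-cache
              mountPath: /modelcache
      volumes:
        - name: docling-model-cache
          persistentVolumeClaim:
            claimName: docling-model-cache-pvc
```

With the ReadWriteOnce claim from the first manifest, the volume can be attached to only one node at a time, so if you scale beyond one replica, all pods that mount it must land on the same node, or the claim needs an access mode such as ReadWriteMany.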