Files
docling-serve/docs/pre-loading-models.md
2025-07-15 14:17:35 +02:00

3.3 KiB

Pre-loading models for docling

This document provides examples for pre-loading docling models to a persistent volume and re-using it for docling-serve deployments.

  1. We need to create a persistent volume that will store models weights:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: docling-model-cache-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      volumeMode: Filesystem
      resources:
        requests:
          storage: 10Gi
    

    If you don't want to use default storage class, set your custom storage class with following:

    spec:
      ...
      storageClassName: <Storage Class Name>
    

    Manifest example: docling-model-cache-pvc.yaml

  2. In order to load model weights, we can use docling-toolkit to download them, as this is a one time operation we can use kubernetes job for this:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: docling-model-cache-load
    spec:
      selector: {}
      template:
        metadata:
          name: docling-model-load
        spec:
          containers:
            - name: loader
              image: ghcr.io/docling-project/docling-serve-cpu:main
              command:
                - docling-tools
                - models
                - download
                - '--output-dir=/modelcache'
                - 'layout'
                - 'tableformer'
                - 'code_formula'
                - 'picture_classifier'
                - 'smolvlm'
                - 'granite_vision'
                - 'easyocr'
              volumeMounts:
                - name: docling-model-cache
                  mountPath: /modelcache
          volumes:
            - name: docling-model-cache
              persistentVolumeClaim:
                claimName: docling-model-cache-pvc
          restartPolicy: Never
    

    The job will mount previously created persistent volume and execute command similar to how we would load models locally: docling-tools models download --output-dir <MOUNT-PATH> [LIST_OF_MODELS]

    In manifest, we specify desired models individually, or we can use --all parameter to download all models.

    Manifest example: docling-model-cache-job.yaml

  3. Now we can mount volume in the docling-serve deployment and set env DOCLING_SERVE_ARTIFACTS_PATH to point to it. Following additions to deployment should be made:

    spec:
      template:
        spec:
          containers:
            - name: api
              env:
              ...
                - name: DOCLING_SERVE_ARTIFACTS_PATH
                  value: '/modelcache'
              volumeMounts:
                - name: docling-model-cache
                  mountPath: /modelcache
          ...
          volumes:
            - name: docling-model-cache
              persistentVolumeClaim:
                claimName: docling-model-cache-pvc
    

    Make sure that value of DOCLING_SERVE_ARTIFACTS_PATH is the same as where models were downloaded and where volume is mounted.

    Now when docling-serve is executing tasks, the underlying docling installation will load model weights from mounted volume.

    Manifest example: docling-model-cache-deployment.yaml