Files
docling-serve/examples/OTEL.md
2026-01-12 13:17:07 +01:00

4.9 KiB

OpenTelemetry Integration for Docling Serve

Docling Serve includes built-in OpenTelemetry instrumentation for metrics and distributed tracing.

Features

  • Metrics: Prometheus-compatible metrics endpoint at /metrics
  • Traces: OTLP trace export to OpenTelemetry collectors
  • FastAPI Auto-instrumentation: HTTP request metrics and traces
  • RQ Metrics: Worker and job queue metrics (when using RQ engine)

Configuration

All settings are controlled via environment variables:

# Enable/disable features
DOCLING_SERVE_OTEL_ENABLE_METRICS=true       # Enable metrics collection
DOCLING_SERVE_OTEL_ENABLE_TRACES=true        # Enable trace collection
DOCLING_SERVE_OTEL_ENABLE_PROMETHEUS=true    # Enable Prometheus /metrics endpoint
DOCLING_SERVE_OTEL_ENABLE_OTLP_METRICS=false # Enable OTLP metrics export

# Service identification
DOCLING_SERVE_OTEL_SERVICE_NAME=docling-serve

# OTLP endpoint (for traces and optional metrics)
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317

Quick Start

Option 1: Direct Prometheus Scraping

  1. Start docling-serve with default settings:

    uv run docling-serve
    
  2. Add to your prometheus.yml:

    scrape_configs:
      - job_name: 'docling-serve'
        static_configs:
          - targets: ['localhost:5001']
    
  3. Access metrics at http://localhost:5001/metrics

Option 2: Full OTEL Stack with Docker Compose

  1. Use the provided compose file:

    cd examples
    mkdit tempo-data
    docker-compose -f docker-compose-otel.yaml up
    
  2. This starts:

    • docling-serve: API server with UI
    • docling-worker: RQ worker for distributed processing (scales independently)
    • redis: Message queue for RQ
    • otel-collector: Receives and routes telemetry
    • prometheus: Metrics storage
    • tempo: Trace storage
    • grafana: Visualization UI
  3. Access:

    • Docling Serve UI: http://localhost:5001/ui
    • Metrics endpoint: http://localhost:5001/metrics
    • Grafana: http://localhost:3000 (pre-configured with Prometheus & Tempo)
    • Prometheus: http://localhost:9090
    • Tempo: http://localhost:3200
  4. Scale workers (optional):

    docker-compose -f docker-compose-otel.yaml up --scale docling-worker=3
    

Available Metrics

HTTP Metrics (from OpenTelemetry FastAPI instrumentation)

  • http_server_request_duration - Request duration histogram
  • http_server_active_requests - Active requests gauge
  • http_server_request_size - Request size histogram
  • http_server_response_size - Response size histogram

RQ Metrics (when using RQ engine)

  • rq_workers - Number of workers by state
  • rq_workers_success - Successful job count per worker
  • rq_workers_failed - Failed job count per worker
  • rq_workers_working_time - Total working time per worker
  • rq_jobs - Job counts by queue and status
  • rq_request_processing_seconds - RQ metrics collection time

Traces

Traces are automatically generated for:

  • All HTTP requests to FastAPI endpoints
  • Document conversion operations
  • RQ job execution (distributed tracing): When using RQ engine, traces propagate from API requests to worker jobs, providing end-to-end visibility across the distributed system

View traces in Grafana Tempo or any OTLP-compatible backend.

Distributed Tracing in RQ Mode

When running with the RQ engine (DOCLING_SERVE_ENG_KIND=rq), traces automatically propagate from the API to RQ workers:

  1. API Request: FastAPI creates a trace when a document conversion request arrives
  2. Job Enqueue: The trace context is injected into the RQ job metadata
  3. Worker Execution: The RQ worker extracts the trace context and continues the trace
  4. End-to-End View: You can see the complete request flow from API to worker in Grafana

This allows you to:

  • Track document processing latency across API and workers
  • Identify bottlenecks in the conversion pipeline
  • Debug distributed processing issues
  • Monitor queue wait times and processing times separately

Example Files

See the examples/ directory:

  • prometheus-scrape.yaml - Prometheus scrape configuration examples
  • docker-compose-otel.yaml - Full observability stack
  • otel-collector-config.yaml - OTEL collector configuration
  • prometheus.yaml - Prometheus configuration
  • tempo.yaml - Tempo trace storage configuration
  • grafana-datasources.yaml - Grafana data source provisioning

Production Considerations

  1. Security: Add authentication to the /metrics endpoint if needed
  2. Performance: Metrics collection has minimal overhead (<1ms per scrape)
  3. Storage: Configure retention policies in Prometheus/Tempo
  4. Sampling: Configure trace sampling for high-volume services
  5. Labels: Keep cardinality low to avoid metric explosion

Disabling OTEL

To disable all OTEL features:

DOCLING_SERVE_OTEL_ENABLE_METRICS=false
DOCLING_SERVE_OTEL_ENABLE_TRACES=false