docling-serve/examples/OTEL.md

# OpenTelemetry Integration for Docling Serve

Docling Serve includes built-in OpenTelemetry instrumentation for metrics and distributed tracing.

## Features

- **Metrics**: Prometheus-compatible metrics endpoint at `/metrics`
- **Traces**: OTLP trace export to OpenTelemetry collectors
- **FastAPI Auto-instrumentation**: HTTP request metrics and traces
- **RQ Metrics**: Worker and job queue metrics (when using RQ engine)

## Configuration

All settings are controlled via environment variables:

```bash
# Enable/disable features
DOCLING_SERVE_OTEL_ENABLE_METRICS=true       # Enable metrics collection
DOCLING_SERVE_OTEL_ENABLE_TRACES=true        # Enable trace collection
DOCLING_SERVE_OTEL_ENABLE_PROMETHEUS=true    # Enable Prometheus /metrics endpoint
DOCLING_SERVE_OTEL_ENABLE_OTLP_METRICS=false # Enable OTLP metrics export

# Service identification
DOCLING_SERVE_OTEL_SERVICE_NAME=docling-serve

# OTLP endpoint (for traces and optional metrics)
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
```

## Quick Start

### Option 1: Direct Prometheus Scraping

1. Start docling-serve with default settings:
   ```bash
   uv run docling-serve
   ```

2. Add to your `prometheus.yml`:
   ```yaml
   scrape_configs:
     - job_name: 'docling-serve'
       static_configs:
         - targets: ['localhost:5001']
   ```

3. Access metrics at `http://localhost:5001/metrics`

### Option 2: Full OTEL Stack with Docker Compose

1. Use the provided compose file:
   ```bash
   cd examples
   mkdit tempo-data
   docker-compose -f docker-compose-otel.yaml up
   ```

2. This starts:
   - **docling-serve**: API server with UI
   - **docling-worker**: RQ worker for distributed processing (scales independently)
   - **redis**: Message queue for RQ
   - **otel-collector**: Receives and routes telemetry
   - **prometheus**: Metrics storage
   - **tempo**: Trace storage
   - **grafana**: Visualization UI

3. Access:
   - Docling Serve UI: `http://localhost:5001/ui`
   - Metrics endpoint: `http://localhost:5001/metrics`
   - Grafana: `http://localhost:3000` (pre-configured with Prometheus & Tempo)
   - Prometheus: `http://localhost:9090`
   - Tempo: `http://localhost:3200`

4. Scale workers (optional):
   ```bash
   docker-compose -f docker-compose-otel.yaml up --scale docling-worker=3
   ```

## Available Metrics

### HTTP Metrics (from OpenTelemetry FastAPI instrumentation)
- `http_server_request_duration` - Request duration histogram
- `http_server_active_requests` - Active requests gauge
- `http_server_request_size` - Request size histogram
- `http_server_response_size` - Response size histogram

### RQ Metrics (when using RQ engine)
- `rq_workers` - Number of workers by state
- `rq_workers_success` - Successful job count per worker
- `rq_workers_failed` - Failed job count per worker
- `rq_workers_working_time` - Total working time per worker
- `rq_jobs` - Job counts by queue and status
- `rq_request_processing_seconds` - RQ metrics collection time

## Traces

Traces are automatically generated for:
- All HTTP requests to FastAPI endpoints
- Document conversion operations
- **RQ job execution (distributed tracing)**: When using RQ engine, traces propagate from API requests to worker jobs, providing end-to-end visibility across the distributed system

View traces in Grafana Tempo or any OTLP-compatible backend.

### Distributed Tracing in RQ Mode

When running with the RQ engine (`DOCLING_SERVE_ENG_KIND=rq`), traces automatically propagate from the API to RQ workers:

1. **API Request**: FastAPI creates a trace when a document conversion request arrives
2. **Job Enqueue**: The trace context is injected into the RQ job metadata
3. **Worker Execution**: The RQ worker extracts the trace context and continues the trace
4. **End-to-End View**: You can see the complete request flow from API to worker in Grafana

This allows you to:
- Track document processing latency across API and workers
- Identify bottlenecks in the conversion pipeline
- Debug distributed processing issues
- Monitor queue wait times and processing times separately

## Example Files

See the `examples/` directory:
- `prometheus-scrape.yaml` - Prometheus scrape configuration examples
- `docker-compose-otel.yaml` - Full observability stack
- `otel-collector-config.yaml` - OTEL collector configuration
- `prometheus.yaml` - Prometheus configuration
- `tempo.yaml` - Tempo trace storage configuration
- `grafana-datasources.yaml` - Grafana data source provisioning

## Production Considerations

1. **Security**: Add authentication to the `/metrics` endpoint if needed
2. **Performance**: Metrics collection has minimal overhead (<1ms per scrape)
3. **Storage**: Configure retention policies in Prometheus/Tempo
4. **Sampling**: Configure trace sampling for high-volume services
5. **Labels**: Keep cardinality low to avoid metric explosion

## Disabling OTEL

To disable all OTEL features:

```bash
DOCLING_SERVE_OTEL_ENABLE_METRICS=false
DOCLING_SERVE_OTEL_ENABLE_TRACES=false
```