# OpenTelemetry Integration for Docling Serve
Docling Serve includes built-in OpenTelemetry instrumentation for metrics and distributed tracing.
## Features
- **Metrics**: Prometheus-compatible metrics endpoint at `/metrics`
- **Traces**: OTLP trace export to OpenTelemetry collectors
- **FastAPI Auto-instrumentation**: HTTP request metrics and traces
- **RQ Metrics**: Worker and job queue metrics (when using RQ engine)
## Configuration
All settings are controlled via environment variables:
```bash
# Enable/disable features
DOCLING_SERVE_OTEL_ENABLE_METRICS=true # Enable metrics collection
DOCLING_SERVE_OTEL_ENABLE_TRACES=true # Enable trace collection
DOCLING_SERVE_OTEL_ENABLE_PROMETHEUS=true # Enable Prometheus /metrics endpoint
DOCLING_SERVE_OTEL_ENABLE_OTLP_METRICS=false # Enable OTLP metrics export
# Service identification
DOCLING_SERVE_OTEL_SERVICE_NAME=docling-serve
# OTLP endpoint (for traces and optional metrics)
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
```
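For example, a one-off launch that exports traces to a locally running collector while keeping the Prometheus endpoint might look like this (the endpoint and values are illustrative, not defaults you must use):

```shell
# Illustrative launch: export traces to a local OTLP collector,
# keep the Prometheus /metrics endpoint, skip OTLP metrics export.
export DOCLING_SERVE_OTEL_ENABLE_TRACES=true
export DOCLING_SERVE_OTEL_ENABLE_PROMETHEUS=true
export DOCLING_SERVE_OTEL_ENABLE_OTLP_METRICS=false
export DOCLING_SERVE_OTEL_SERVICE_NAME=docling-serve
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

uv run docling-serve
```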
## Quick Start
### Option 1: Direct Prometheus Scraping
1. Start docling-serve with default settings:
```bash
uv run docling-serve
```
2. Add to your `prometheus.yml`:
```yaml
scrape_configs:
  - job_name: 'docling-serve'
    static_configs:
      - targets: ['localhost:5001']
```
3. Access metrics at `http://localhost:5001/metrics`
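The endpoint serves the standard Prometheus text exposition format: `# HELP`/`# TYPE` comment lines followed by `name{labels} value` samples. As a self-contained sketch (the sample lines and label below are illustrative, not captured output), you can pull a single value out with `awk`:

```shell
# Illustrative sample of what GET /metrics returns (not actual output)
metrics='# HELP http_server_active_requests Active requests
# TYPE http_server_active_requests gauge
http_server_active_requests{http_method="POST"} 2'

# Skip comment lines and print the value of one metric
printf '%s\n' "$metrics" | awk '!/^#/ && /^http_server_active_requests/ {print $2}'
# → 2
```

Against a running server, the same filter works on `curl -s http://localhost:5001/metrics`.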
### Option 2: Full OTEL Stack with Docker Compose
1. Use the provided compose file:
```bash
cd examples
mkdir -p tempo-data
docker-compose -f docker-compose-otel.yaml up
```
2. This starts:
- **docling-serve**: API server with UI
- **docling-worker**: RQ worker for distributed processing (scales independently)
- **redis**: Message queue for RQ
- **otel-collector**: Receives and routes telemetry
- **prometheus**: Metrics storage
- **tempo**: Trace storage
- **grafana**: Visualization UI
3. Access:
- Docling Serve UI: `http://localhost:5001/ui`
- Metrics endpoint: `http://localhost:5001/metrics`
- Grafana: `http://localhost:3000` (pre-configured with Prometheus & Tempo)
- Prometheus: `http://localhost:9090`
- Tempo: `http://localhost:3200`
4. Scale workers (optional):
```bash
docker-compose -f docker-compose-otel.yaml up --scale docling-worker=3
```
## Available Metrics
### HTTP Metrics (from OpenTelemetry FastAPI instrumentation)
- `http_server_request_duration` - Request duration histogram
- `http_server_active_requests` - Active requests gauge
- `http_server_request_size` - Request size histogram
- `http_server_response_size` - Response size histogram
### RQ Metrics (when using RQ engine)
- `rq_workers` - Number of workers by state
- `rq_workers_success` - Successful job count per worker
- `rq_workers_failed` - Failed job count per worker
- `rq_workers_working_time` - Total working time per worker
- `rq_jobs` - Job counts by queue and status
- `rq_request_processing_seconds` - RQ metrics collection time
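In practice you would aggregate these counters with PromQL in Prometheus or Grafana; as an offline illustration of how the success and failure counters relate (the sample values below are made up), a per-worker success ratio can be derived like this:

```shell
# Made-up sample of RQ worker counters in Prometheus text format
rq='rq_workers_success{worker="w1"} 90
rq_workers_failed{worker="w1"} 10'

# Combine the two counters into a success ratio
printf '%s\n' "$rq" | awk '
  /^rq_workers_success/ {s = $2}
  /^rq_workers_failed/  {f = $2}
  END {printf "success_ratio %.2f\n", s / (s + f)}'
# → success_ratio 0.90
```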
## Traces
Traces are automatically generated for:
- All HTTP requests to FastAPI endpoints
- Document conversion operations
- **RQ job execution (distributed tracing)**: When using RQ engine, traces propagate from API requests to worker jobs, providing end-to-end visibility across the distributed system
View traces in Grafana Tempo or any OTLP-compatible backend.
### Distributed Tracing in RQ Mode
When running with the RQ engine (`DOCLING_SERVE_ENG_KIND=rq`), traces automatically propagate from the API to RQ workers:
1. **API Request**: FastAPI creates a trace when a document conversion request arrives
2. **Job Enqueue**: The trace context is injected into the RQ job metadata
3. **Worker Execution**: The RQ worker extracts the trace context and continues the trace
4. **End-to-End View**: You can see the complete request flow from API to worker in Grafana
This allows you to:
- Track document processing latency across API and workers
- Identify bottlenecks in the conversion pipeline
- Debug distributed processing issues
- Monitor queue wait times and processing times separately
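The exact propagation mechanism lives in docling-serve's source, but OpenTelemetry propagators serialize context as a W3C Trace Context `traceparent` value (`version-traceid-spanid-flags`), so the data carried in the job metadata looks roughly like this (the IDs below are made-up examples):

```shell
# A W3C 'traceparent' value: version-traceid-spanid-flags (example IDs)
traceparent='00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'

# Split it into its four fields
IFS='-' read -r version trace_id span_id flags <<EOF
$traceparent
EOF

echo "trace_id=$trace_id span_id=$span_id flags=$flags"
```

The worker-side extraction parses the same value back out of the job metadata and resumes the trace under that `trace_id`.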
## Example Files
See the `examples/` directory:
- `prometheus-scrape.yaml` - Prometheus scrape configuration examples
- `docker-compose-otel.yaml` - Full observability stack
- `otel-collector-config.yaml` - OTEL collector configuration
- `prometheus.yaml` - Prometheus configuration
- `tempo.yaml` - Tempo trace storage configuration
- `grafana-datasources.yaml` - Grafana data source provisioning
## Production Considerations
1. **Security**: Add authentication to the `/metrics` endpoint if needed
2. **Performance**: Metrics collection has minimal overhead (<1ms per scrape)
3. **Storage**: Configure retention policies in Prometheus/Tempo
4. **Sampling**: Configure trace sampling for high-volume services
5. **Labels**: Keep cardinality low to avoid metric explosion
## Disabling OTEL
To disable all OTEL features:
```bash
DOCLING_SERVE_OTEL_ENABLE_METRICS=false
DOCLING_SERVE_OTEL_ENABLE_TRACES=false
```