# OpenTelemetry Integration for Docling Serve

Docling Serve includes built-in OpenTelemetry instrumentation for metrics and distributed tracing.

## Features

- **Metrics**: Prometheus-compatible metrics endpoint at `/metrics`
- **Traces**: OTLP trace export to OpenTelemetry collectors
- **FastAPI Auto-instrumentation**: HTTP request metrics and traces
- **RQ Metrics**: Worker and job queue metrics (when using RQ engine)

## Configuration

All settings are controlled via environment variables:

```bash
# Enable/disable features
DOCLING_SERVE_OTEL_ENABLE_METRICS=true         # Enable metrics collection
DOCLING_SERVE_OTEL_ENABLE_TRACES=true          # Enable trace collection
DOCLING_SERVE_OTEL_ENABLE_PROMETHEUS=true      # Enable Prometheus /metrics endpoint
DOCLING_SERVE_OTEL_ENABLE_OTLP_METRICS=false   # Enable OTLP metrics export

# Service identification
DOCLING_SERVE_OTEL_SERVICE_NAME=docling-serve

# OTLP endpoint (for traces and optional metrics)
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
```
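
Before launching, it can help to check which flags a given shell environment would actually turn on. The following is a minimal sketch of such a pre-flight check. `env_flag`, `effective_flags`, and `ASSUMED_DEFAULTS` are hypothetical names introduced here, the defaults are assumed to match the values in the listing above, and the parsing rules are an illustration rather than Docling Serve's own settings code.

```python
import os

# Hypothetical helper names; this is not Docling Serve's settings code.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}

# Assumption: the values shown in the listing above are the defaults.
ASSUMED_DEFAULTS = {
    "DOCLING_SERVE_OTEL_ENABLE_METRICS": True,
    "DOCLING_SERVE_OTEL_ENABLE_TRACES": True,
    "DOCLING_SERVE_OTEL_ENABLE_PROMETHEUS": True,
    "DOCLING_SERVE_OTEL_ENABLE_OTLP_METRICS": False,
}


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean environment variable, falling back to `default`."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")


def effective_flags() -> dict:
    """Resolve every OTEL feature flag against the current environment."""
    return {name: env_flag(name, default) for name, default in ASSUMED_DEFAULTS.items()}
```

Printing the result of `effective_flags()` in the same shell that will start the server shows which features that environment would enable under these assumptions.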

## Quick Start

### Option 1: Direct Prometheus Scraping

1. Start docling-serve with default settings:
   ```bash
   uv run docling-serve
   ```

2. Add to your `prometheus.yml`:
   ```yaml
   scrape_configs:
     - job_name: 'docling-serve'
       static_configs:
         - targets: ['localhost:5001']
   ```

3. Access metrics at `http://localhost:5001/metrics`
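
To confirm the endpoint is serving data before pointing Prometheus at it, the text output can be fetched and summarized with a few lines of standard-library Python. This is a minimal sketch: `count_samples` and `fetch_metrics` are hypothetical helper names, and the parser only handles the basic Prometheus text exposition format (comment lines starting with `#`, then one sample per line).

```python
import urllib.request
from collections import Counter

METRICS_URL = "http://localhost:5001/metrics"  # endpoint from step 3 above


def count_samples(exposition: str) -> Counter:
    """Count samples per metric name in Prometheus text-format output."""
    counts = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # The metric name ends at the label block or at the first space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw exposition text from a running server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `count_samples(fetch_metrics())` returns a mapping from metric name to the number of exposed series, which is a quick way to see which metric families are present.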

### Option 2: Full OTEL Stack with Docker Compose

1. Use the provided compose file:
   ```bash
   cd examples
   mkdir tempo-data
   docker-compose -f docker-compose-otel.yaml up
   ```

2. This starts:
   - **docling-serve**: API server with UI
   - **docling-worker**: RQ worker for distributed processing (scales independently)
   - **redis**: Message queue for RQ
   - **otel-collector**: Receives and routes telemetry
   - **prometheus**: Metrics storage
   - **tempo**: Trace storage
   - **grafana**: Visualization UI

3. Access:
   - Docling Serve UI: `http://localhost:5001/ui`
   - Metrics endpoint: `http://localhost:5001/metrics`
   - Grafana: `http://localhost:3000` (pre-configured with Prometheus & Tempo)
   - Prometheus: `http://localhost:9090`
   - Tempo: `http://localhost:3200`

4. Scale workers (optional):
   ```bash
   docker-compose -f docker-compose-otel.yaml up --scale docling-worker=3
   ```

## Available Metrics

### HTTP Metrics (from OpenTelemetry FastAPI instrumentation)
- `http_server_request_duration` - Request duration histogram
- `http_server_active_requests` - Active requests gauge
- `http_server_request_size` - Request size histogram
- `http_server_response_size` - Response size histogram
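
Histogram metrics such as the request duration are exposed as cumulative buckets, so latency percentiles have to be estimated rather than read off directly. The sketch below shows the usual linear-interpolation estimate, similar in spirit to PromQL's `histogram_quantile()`. It is a simplified illustration that assumes positive bucket bounds; the function name and the bucket values in the usage note are made up for the example, and the exact series names (for instance a `_bucket` suffix) depend on the exporter.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    The bucket with upper bound +Inf must be present; its count is the
    total number of observations.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    ordered = sorted(buckets)
    if not ordered or not math.isinf(ordered[-1][0]):
        raise ValueError("buckets must include the +Inf bucket")
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # assumes observations are >= 0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies beyond the last finite bound, so that
                # bound is the best available estimate.
                return ordered[-2][0] if len(ordered) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan
```

For example, with illustrative buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]`, the 95th percentile falls in the third bucket and is interpolated between 0.5 and 1.0 seconds. The estimate is only as precise as the bucket boundaries allow.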

### RQ Metrics (when using RQ engine)
- `rq_workers` - Number of workers by state
- `rq_workers_success` - Successful job count per worker
- `rq_workers_failed` - Failed job count per worker
- `rq_workers_working_time` - Total working time per worker
- `rq_jobs` - Job counts by queue and status
- `rq_request_processing_seconds` - RQ metrics collection time

## Traces

Traces are automatically generated for:
- All HTTP requests to FastAPI endpoints
- Document conversion operations
- **RQ job execution (distributed tracing)**: When using the RQ engine, traces propagate from API requests to worker jobs, providing end-to-end visibility across the distributed system

View traces in Grafana Tempo or any OTLP-compatible backend.

### Distributed Tracing in RQ Mode

When running with the RQ engine (`DOCLING_SERVE_ENG_KIND=rq`), traces automatically propagate from the API to RQ workers:

1. **API Request**: FastAPI creates a trace when a document conversion request arrives
2. **Job Enqueue**: The trace context is injected into the RQ job metadata
3. **Worker Execution**: The RQ worker extracts the trace context and continues the trace
4. **End-to-End View**: You can see the complete request flow from API to worker in Grafana

This allows you to:
- Track document processing latency across API and workers
- Identify bottlenecks in the conversion pipeline
- Debug distributed processing issues
- Monitor queue wait times and processing times separately
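
The inject and extract steps above can be illustrated without any tracing library. The sketch below passes a W3C `traceparent` value (the format used by OpenTelemetry's default propagator) through a plain dictionary that stands in for the RQ job metadata. All function names, the dictionary-based span representation, and the `traceparent` metadata key are assumptions made for illustration; this is not Docling Serve's implementation, which is not shown here.

```python
import re
import secrets
from typing import Optional

# W3C Trace Context header: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_span(parent: Optional[dict] = None) -> dict:
    """Start a span: reuse the parent's trace ID, or begin a new trace."""
    return {
        "trace_id": parent["trace_id"] if parent else secrets.token_hex(16),
        "span_id": secrets.token_hex(8),
        "parent_span_id": parent["span_id"] if parent else None,
    }


def inject(span: dict, carrier: dict) -> None:
    """API side: write the span's context into the job metadata."""
    carrier["traceparent"] = f"00-{span['trace_id']}-{span['span_id']}-01"


def extract(carrier: dict) -> Optional[dict]:
    """Worker side: recover the remote span context, if one was sent."""
    match = TRACEPARENT_RE.match(carrier.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, _flags = match.groups()
    return {"trace_id": trace_id, "span_id": span_id}


# Step 1: the API creates a span for the incoming request.
request_span = new_span()
# Step 2: the context travels with the job (a dict stands in for RQ metadata).
job_meta: dict = {}
inject(request_span, job_meta)
# Step 3: the worker continues the same trace as a child of the request span.
worker_span = new_span(parent=extract(job_meta))
```

Because the worker span reuses the request's trace ID and records the request span as its parent, a backend such as Tempo can stitch both into one end-to-end trace. If no context arrives, `extract` returns `None` and the worker simply starts a new trace.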

## Example Files

See the `examples/` directory:
- `prometheus-scrape.yaml` - Prometheus scrape configuration examples
- `docker-compose-otel.yaml` - Full observability stack
- `otel-collector-config.yaml` - OTEL collector configuration
- `prometheus.yaml` - Prometheus configuration
- `tempo.yaml` - Tempo trace storage configuration
- `grafana-datasources.yaml` - Grafana data source provisioning

## Production Considerations

1. **Security**: Add authentication to the `/metrics` endpoint if needed
2. **Performance**: Metrics collection has minimal overhead (<1ms per scrape)
3. **Storage**: Configure retention policies in Prometheus/Tempo
4. **Sampling**: Configure trace sampling for high-volume services
5. **Labels**: Keep cardinality low to avoid metric explosion
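
For the sampling item above, a common approach is ratio-based sampling keyed on the trace ID, so that every service seeing the same trace makes the same keep-or-drop decision. The following is a minimal sketch of that idea, similar in spirit to OpenTelemetry's trace-ID-ratio sampler. `should_sample` is a hypothetical function, and how sampling is actually configured in Docling Serve is not covered here.

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep roughly `ratio` of all traces, deciding from the trace ID alone."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be within [0, 1]")
    # Treat the low 64 bits of the 128-bit trace ID as a uniform random
    # number and keep the trace when it falls below the threshold.
    low_bits = int(trace_id_hex, 16) & 0xFFFFFFFFFFFFFFFF
    threshold = round(ratio * (1 << 64))
    return low_bits < threshold
```

Because the decision depends only on the trace ID, the API and the RQ workers agree on it without coordination, which avoids traces where only some of the spans were recorded.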

## Disabling OTEL

To disable all OTEL features:

```bash
DOCLING_SERVE_OTEL_ENABLE_METRICS=false
DOCLING_SERVE_OTEL_ENABLE_TRACES=false
```