Chandra

Chandra is a highly accurate OCR model that converts images and PDFs into structured HTML/Markdown/JSON while preserving layout information.

Features

Convert documents to markdown, html, or json with detailed layout information
Math equation support (LaTeX)
Reconstructs forms, including checkboxes
Precise table reconstruction
Support for 40+ languages
Two inference modes: local (HuggingFace) and remote (vLLM server)

Installation

uv sync
source .venv/bin/activate

Usage

Streamlit Web App

Launch the interactive demo:

streamlit run chandra_app.py --server.fileWatcherType none --server.headless true

Inference Modes:

hf: Loads model locally using HuggingFace Transformers (requires GPU)
vllm: Connects to a running vLLM server for optimized batch inference

vLLM Server (Optional)

For production deployments or batch processing, use the vLLM server:

python scripts/start_vllm.py

This launches a Docker container with optimized inference settings. Configure via environment variables:

VLLM_API_BASE: Server URL (default: http://localhost:8000/v1)
VLLM_MODEL_NAME: Model name for the server (default: chandra)
VLLM_GPUS: GPU device IDs (default: 0)
HF_TOKEN: HuggingFace token for model access

Configuration

Settings can be configured via environment variables or a local.env file:

# Model settings
MODEL_CHECKPOINT=datalab-to/chandra-0.2.8
MAX_OUTPUT_TOKENS=8192

# vLLM settings
VLLM_API_BASE=http://localhost:8000/v1
VLLM_MODEL_NAME=chandra
VLLM_GPUS=0

Output Formats

Chandra provides three output formats:

HTML: Structured HTML with layout blocks and bounding boxes
Markdown: Clean, readable Markdown conversion
Layout Image: Visual representation of detected layout blocks