
Chandra

An OCR model for complex documents — handwriting, tables, math equations, and messy forms.

Benchmarks

Overall scores on the olmOCR benchmark (benchmark chart omitted).

Hosted API

A hosted API with additional accuracy improvements is available at datalab.to. Try the free playground without installing.

Community

Join Discord to discuss development and get help.

Quick Start

pip install chandra-ocr

# Start vLLM server, then run OCR
chandra_vllm
chandra input.pdf ./output

# Or use HuggingFace locally
chandra input.pdf ./output --method hf

# Interactive web app
chandra_app

Python:

from chandra.model import InferenceManager
from chandra.input import load_pdf_images

manager = InferenceManager(method="hf")
images = load_pdf_images("document.pdf")
results = manager.generate(images)
print(results[0].markdown)

How It Works

  • Two inference modes: Run locally via HuggingFace Transformers, or deploy a vLLM server for production throughput
  • Layout-aware output: Every text block, table, and image comes with bounding box coordinates
  • Structured formats: Output as Markdown, HTML, or JSON with full layout metadata
  • Multilingual: 40+ languages supported

What It Handles

Handwriting — Doctor notes, filled forms, homework. Chandra reads cursive and messy print that trips up traditional OCR.

Tables — Preserves structure including merged cells (colspan/rowspan). Works on financial filings, invoices, and data tables.

Math — Inline and block equations rendered as LaTeX. Handles textbooks, worksheets, and research papers.

Forms — Reconstructs checkboxes, radio buttons, and form fields with their values.

Complex Layouts — Multi-column documents, newspapers, textbooks with figures and captions.

Examples


Sample renders (images not reproduced here): handwriting, tables, math, newspapers.

More examples:
Type         Name                 Link
Tables       10K Filing           View
Forms        Lease Agreement      View
Handwriting  Math Homework        View
Books        Geography Textbook   View
Books        Exercise Problems    View
Math         Attention Diagram    View
Math         Worksheet            View
Newspapers   LA Times             View
Other        Transcript           View
Other        Flowchart            View

Installation

pip install chandra-ocr

For HuggingFace inference, we recommend installing Flash Attention (the flash-attn package) for better performance.

From source:

git clone https://github.com/datalab-to/chandra.git
cd chandra
uv sync
source .venv/bin/activate

Usage

CLI

# Single file with vLLM server
chandra input.pdf ./output --method vllm

# Directory with local model
chandra ./documents ./output --method hf

Options:

  • --method [hf|vllm]: Inference method (default: vllm)
  • --page-range TEXT: Page range for PDFs (e.g., "1-5,7,9-12")
  • --max-output-tokens INTEGER: Max tokens per page
  • --max-workers INTEGER: Parallel workers for vLLM
  • --include-images/--no-images: Extract and save images (default: include)
  • --include-headers-footers/--no-headers-footers: Include page headers/footers (default: exclude)
  • --batch-size INTEGER: Pages per batch (default: 1)
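The --page-range option accepts a common comma/dash syntax. As an illustration only (this helper is not part of the chandra CLI), a spec like "1-5,7,9-12" can be expanded in a few lines of Python:

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a spec like "1-5,7,9-12" into a sorted, de-duplicated page list."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A dash denotes an inclusive range, e.g. "1-5" -> 1..5
            start, end = part.split("-")
            pages.update(range(int(start), int(end) + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

For example, parse_page_range("1-5,7,9-12") yields [1, 2, 3, 4, 5, 7, 9, 10, 11, 12].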

Output structure:

output/
└── filename/
    ├── filename.md           # Markdown
    ├── filename.html         # HTML with bounding boxes
    ├── filename_metadata.json
    └── images/               # Extracted images
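Given the per-document layout above, downstream code can gather all Markdown results with a short walk over the output directory. This is a sketch under the documented structure (one subdirectory per input file, containing filename.md), not a chandra API:

```python
from pathlib import Path

def collect_results(output_root: str) -> dict[str, str]:
    """Map each processed document name to its Markdown text.

    Assumes the documented layout: output/<name>/<name>.md per input file.
    """
    results = {}
    for doc_dir in Path(output_root).iterdir():
        md = doc_dir / f"{doc_dir.name}.md"
        if md.is_file():
            results[doc_dir.name] = md.read_text()
    return results
```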

vLLM Server

For production or batch processing:

chandra_vllm

This launches a Docker container with an optimized inference configuration. Configure it via environment variables:

  • VLLM_API_BASE: Server URL (default: http://localhost:8000/v1)
  • VLLM_MODEL_NAME: Model name (default: chandra)
  • VLLM_GPUS: GPU device IDs (default: 0)
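Client code that talks to the server can resolve these settings with the documented defaults. A minimal sketch (the variable names and defaults come from the list above; the helper itself is illustrative):

```python
import os

# Documented defaults for the chandra_vllm server.
DEFAULTS = {
    "VLLM_API_BASE": "http://localhost:8000/v1",
    "VLLM_MODEL_NAME": "chandra",
    "VLLM_GPUS": "0",
}

def vllm_config(env=os.environ) -> dict:
    """Resolve vLLM server settings, falling back to the documented defaults."""
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}
```

For example, with VLLM_GPUS=0,1 exported, vllm_config() keeps the other two defaults and picks up the GPU override.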

Configuration

Settings can be supplied via environment variables or a local.env file:

MODEL_CHECKPOINT=datalab-to/chandra
MAX_OUTPUT_TOKENS=8192
VLLM_API_BASE=http://localhost:8000/v1
VLLM_GPUS=0
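A local.env file in this KEY=VALUE format can be read with a tiny stdlib parser. This is a sketch of the file format shown above, assuming simple one-per-line assignments with optional # comments (it is not chandra's own loader):

```python
from pathlib import Path

def load_local_env(path: str = "local.env") -> dict[str, str]:
    """Parse KEY=VALUE lines from an env file, skipping blanks and # comments."""
    settings = {}
    env_file = Path(path)
    if not env_file.is_file():
        return settings
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' so values may themselves contain '='.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings
```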

Commercial Usage

Code is Apache 2.0. Model weights use a modified OpenRAIL-M license: free for research, personal use, and startups under $2M funding/revenue. Cannot be used competitively with our API. For broader commercial licensing, see pricing.
