# Chandra

An OCR model for complex documents: handwriting, tables, math equations, and messy forms.
## Benchmarks

Overall scores on olmOCR-Bench (results image omitted).
## Hosted API

A hosted API with additional accuracy improvements is available at datalab.to. Try the free playground without installing anything.
## Community

Join the Discord to discuss development and get help.
## Quick Start

```bash
pip install chandra-ocr

# Start the vLLM server, then run OCR
chandra_vllm
chandra input.pdf ./output

# Or run inference locally via HuggingFace
chandra input.pdf ./output --method hf

# Launch the interactive web app
chandra_app
```
Python:

```python
from chandra.model import InferenceManager
from chandra.input import load_pdf_images

# Load the model locally via HuggingFace and OCR each page of a PDF
manager = InferenceManager(method="hf")
images = load_pdf_images("document.pdf")
results = manager.generate(images)

print(results[0].markdown)  # Markdown for the first page
```
## How It Works

- Two inference modes: run locally via HuggingFace Transformers, or deploy a vLLM server for production throughput.
- Layout-aware output: every text block, table, and image comes with bounding box coordinates.
- Structured formats: output as Markdown, HTML, or JSON with full layout metadata (see the sketch after this list).
- Multilingual: supports 40+ languages.
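A minimal sketch of consuming that output from Python, reusing the quick-start API above. Only the `markdown` attribute appears in this README; writing one Markdown file per page is an illustrative pattern, not a documented workflow.

```python
from chandra.model import InferenceManager
from chandra.input import load_pdf_images

# OCR every page, then write one Markdown file per page.
manager = InferenceManager(method="hf")
results = manager.generate(load_pdf_images("document.pdf"))

for page_num, page in enumerate(results, start=1):
    with open(f"page_{page_num:03d}.md", "w", encoding="utf-8") as f:
        f.write(page.markdown)  # `markdown` as shown in the quick start
```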
## What It Handles

- **Handwriting**: doctor notes, filled forms, homework. Chandra reads cursive and messy print that trips up traditional OCR.
- **Tables**: preserves structure, including merged cells (colspan/rowspan). Works on financial filings, invoices, and data tables.
- **Math**: inline and block equations rendered as LaTeX. Handles textbooks, worksheets, and research papers.
- **Forms**: reconstructs checkboxes, radio buttons, and form fields with their values.
- **Complex layouts**: multi-column documents, newspapers, textbooks with figures and captions.
## Examples

Sample renderings (images omitted): handwriting, tables, math, newspapers.

### More examples
| Type | Name | Link |
|---|---|---|
| Tables | 10K Filing | View |
| Forms | Lease Agreement | View |
| Handwriting | Math Homework | View |
| Books | Geography Textbook | View |
| Books | Exercise Problems | View |
| Math | Attention Diagram | View |
| Math | Worksheet | View |
| Newspapers | LA Times | View |
| Other | Transcript | View |
| Other | Flowchart | View |
## Installation

```bash
pip install chandra-ocr
```

For HuggingFace inference, we recommend installing Flash Attention for better performance.

From source:

```bash
git clone https://github.com/datalab-to/chandra.git
cd chandra
uv sync
source .venv/bin/activate
```
## Usage

### CLI

```bash
# Single file against a vLLM server
chandra input.pdf ./output --method vllm

# Whole directory with the local model
chandra ./documents ./output --method hf
```

Options:

- `--method [hf|vllm]`: inference method (default: `vllm`)
- `--page-range TEXT`: page range for PDFs (e.g., `"1-5,7,9-12"`)
- `--max-output-tokens INTEGER`: max tokens per page
- `--max-workers INTEGER`: parallel workers for vLLM
- `--include-images/--no-images`: extract and save images (default: include)
- `--include-headers-footers/--no-headers-footers`: include page headers/footers (default: exclude)
- `--batch-size INTEGER`: pages per batch (default: 1)
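For example, to OCR only the first five pages of a document locally and skip image extraction (all flags as documented above; `scan.pdf` is a placeholder):

```bash
chandra scan.pdf ./output --method hf --page-range "1-5" --no-images
```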
Output structure:

```text
output/
└── filename/
    ├── filename.md              # Markdown
    ├── filename.html            # HTML with bounding boxes
    ├── filename_metadata.json   # Layout metadata
    └── images/                  # Extracted images
```
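The metadata file's exact schema isn't documented in this README, so the sketch below (with a hypothetical output path) only inspects whatever top-level keys are present rather than assuming field names:

```python
import json
from pathlib import Path

# Hypothetical path following the output tree above
meta_path = Path("output/filename/filename_metadata.json")
metadata = json.loads(meta_path.read_text(encoding="utf-8"))
print(sorted(metadata.keys()))  # schema undocumented here; inspect before relying on fields
```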
### vLLM Server

For production or batch processing:

```bash
chandra_vllm
```

Launches a Docker container with optimized inference. Configure via environment variables:

- `VLLM_API_BASE`: server URL (default: `http://localhost:8000/v1`)
- `VLLM_MODEL_NAME`: model name (default: `chandra`)
- `VLLM_GPUS`: GPU device IDs (default: `0`)
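For example, to serve on two GPUs (a sketch; the comma-separated list is an assumption based on the plural "GPU device IDs"):

```bash
export VLLM_GPUS=0,1                        # assumed comma-separated device list
chandra_vllm                                # starts the Docker-based server
chandra input.pdf ./output --method vllm    # client reads VLLM_API_BASE (default: localhost)
```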
## Configuration

Settings can be provided via environment variables or a local `.env` file:

```bash
MODEL_CHECKPOINT=datalab-to/chandra
MAX_OUTPUT_TOKENS=8192
VLLM_API_BASE=http://localhost:8000/v1
VLLM_GPUS=0
```
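Since settings are read from the environment, a one-off override can be passed inline in the shell, for example to raise the token budget for a dense page (standard shell behavior, not a Chandra-specific feature):

```bash
MAX_OUTPUT_TOKENS=16384 chandra dense_scan.pdf ./output
```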
## Commercial Usage

The code is Apache 2.0. The model weights use a modified OpenRAIL-M license: free for research, personal use, and startups under $2M in funding/revenue. The weights may not be used to compete with our API. For broader commercial licensing, see pricing.