
Chandra

An OCR model for complex documents — handwriting, tables, math equations, and messy forms.

Benchmarks

Overall scores on the olmOCR benchmark (benchmark chart omitted).

Hosted API

A hosted API with additional accuracy improvements is available at datalab.to. Try the free playground without installing.

Community

Join Discord to discuss development and get help.

Quick Start

pip install chandra-ocr

# Start vLLM server, then run OCR
chandra_vllm
chandra input.pdf ./output

# Or use HuggingFace locally
chandra input.pdf ./output --method hf

# Interactive web app
chandra_app

Python:

from chandra.model import InferenceManager
from chandra.input import load_pdf_images

manager = InferenceManager(method="hf")
images = load_pdf_images("document.pdf")
results = manager.generate(images)
print(results[0].markdown)

How It Works

  • Two inference modes: Run locally via HuggingFace Transformers, or deploy a vLLM server for production throughput
  • Layout-aware output: Every text block, table, and image comes with bounding box coordinates
  • Structured formats: Output as Markdown, HTML, or JSON with full layout metadata
  • Multilingual: 40+ languages supported

What It Handles

Handwriting — Doctor notes, filled forms, homework. Chandra reads cursive and messy print that trips up traditional OCR.

Tables — Preserves structure including merged cells (colspan/rowspan). Works on financial filings, invoices, and data tables.

Math — Inline and block equations rendered as LaTeX. Handles textbooks, worksheets, and research papers.

Forms — Reconstructs checkboxes, radio buttons, and form fields with their values.

Complex Layouts — Multi-column documents, newspapers, textbooks with figures and captions.

Examples


Sample renders (images not reproduced here): handwriting, tables, math, newspapers.

More examples:
Type         Name                 Link
Tables       10K Filing           View
Forms        Lease Agreement      View
Handwriting  Math Homework        View
Books        Geography Textbook   View
Books        Exercise Problems    View
Math         Attention Diagram    View
Math         Worksheet            View
Newspapers   LA Times             View
Other        Transcript           View
Other        Flowchart            View

Installation

pip install chandra-ocr

For HuggingFace inference, we recommend installing Flash Attention (the flash-attn package) for better performance.

From source:

git clone https://github.com/datalab-to/chandra.git
cd chandra
uv sync
source .venv/bin/activate

Usage

CLI

# Single file with vLLM server
chandra input.pdf ./output --method vllm

# Directory with local model
chandra ./documents ./output --method hf

Options:

  • --method [hf|vllm]: Inference method (default: vllm)
  • --page-range TEXT: Page range for PDFs (e.g., "1-5,7,9-12")
  • --max-output-tokens INTEGER: Max tokens per page
  • --max-workers INTEGER: Parallel workers for vLLM
  • --include-images/--no-images: Extract and save images (default: include)
  • --include-headers-footers/--no-headers-footers: Include page headers/footers (default: exclude)
  • --batch-size INTEGER: Pages per batch (default: 1)
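The --page-range option accepts a common comma/dash syntax. As an illustration only (this helper is not part of the chandra CLI), a spec like "1-5,7,9-12" can be expanded in a few lines of Python:

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a spec like "1-5,7,9-12" into a sorted, de-duplicated page list."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A dash denotes an inclusive range, e.g. "1-5" -> 1..5
            start, end = part.split("-")
            pages.update(range(int(start), int(end) + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

For example, parse_page_range("1-5,7,9-12") yields [1, 2, 3, 4, 5, 7, 9, 10, 11, 12].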

Output structure:

output/
└── filename/
    ├── filename.md           # Markdown
    ├── filename.html         # HTML with bounding boxes
    ├── filename_metadata.json
    └── images/               # Extracted images
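Given the per-document layout above, downstream code can gather all Markdown results with a short walk over the output directory. This is a sketch under the documented structure (one subdirectory per input file, containing filename.md), not a chandra API:

```python
from pathlib import Path

def collect_results(output_root: str) -> dict[str, str]:
    """Map each processed document name to its Markdown text.

    Assumes the documented layout: output/<name>/<name>.md per input file.
    """
    results = {}
    for doc_dir in Path(output_root).iterdir():
        md = doc_dir / f"{doc_dir.name}.md"
        if md.is_file():
            results[doc_dir.name] = md.read_text()
    return results
```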

vLLM Server

For production or batch processing:

chandra_vllm

This launches a Docker container with an optimized inference configuration. Configure it via environment variables:

  • VLLM_API_BASE: Server URL (default: http://localhost:8000/v1)
  • VLLM_MODEL_NAME: Model name (default: chandra)
  • VLLM_GPUS: GPU device IDs (default: 0)
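Client code that talks to the server can resolve these settings with the documented defaults. A minimal sketch (the variable names and defaults come from the list above; the helper itself is illustrative):

```python
import os

# Documented defaults for the chandra_vllm server.
DEFAULTS = {
    "VLLM_API_BASE": "http://localhost:8000/v1",
    "VLLM_MODEL_NAME": "chandra",
    "VLLM_GPUS": "0",
}

def vllm_config(env=os.environ) -> dict:
    """Resolve vLLM server settings, falling back to the documented defaults."""
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}
```

For example, with VLLM_GPUS=0,1 exported, vllm_config() keeps the other two defaults and picks up the GPU override.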

Configuration

Settings can be supplied via environment variables or a local.env file:

MODEL_CHECKPOINT=datalab-to/chandra
MAX_OUTPUT_TOKENS=8192
VLLM_API_BASE=http://localhost:8000/v1
VLLM_GPUS=0
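A local.env file in this KEY=VALUE format can be read with a tiny stdlib parser. This is a sketch of the file format shown above, assuming simple one-per-line assignments with optional # comments (it is not chandra's own loader):

```python
from pathlib import Path

def load_local_env(path: str = "local.env") -> dict[str, str]:
    """Parse KEY=VALUE lines from an env file, skipping blanks and # comments."""
    settings = {}
    env_file = Path(path)
    if not env_file.is_file():
        return settings
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' so values may themselves contain '='.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings
```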

Commercial Usage

Code is Apache 2.0. Model weights use a modified OpenRAIL-M license: free for research, personal use, and startups under $2M funding/revenue. Cannot be used competitively with our API. For broader commercial licensing, see pricing.
