Vik Paruchuri 1c8d63fa5a Add examples
2025-10-20 20:34:18 -04:00

Chandra

Chandra is an OCR model that converts images and PDFs into structured HTML/Markdown/JSON while preserving layout information.

Features

  • Convert documents to markdown, html, or json with detailed layout information
  • Good handwriting support
  • Reconstructs forms accurately, including checkboxes
  • Good support for tables, math, and complex layouts
  • Captions images and extracts data from diagrams
  • Support for 40+ languages
  • Two inference modes: local (HuggingFace) and remote (vLLM server)

Hosted API

  • A hosted API for Chandra is available; it also includes additional accuracy improvements and document workflows.
  • A free playground is available if you want to try Chandra without installing anything.

Installation

pip install chandra-ocr

From Source

git clone https://github.com/yourusername/chandra.git
cd chandra
uv sync
source .venv/bin/activate

Usage

CLI

Process single files or entire directories:

# Single file, with vllm server (see below for how to launch)
chandra input.pdf ./output --method vllm

# Process all files in a directory with local model
chandra ./documents ./output --method hf
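The same invocations can be scripted from Python. The following is a minimal sketch, not part of Chandra itself: `build_chandra_command` and `run_chandra` are hypothetical helper names, and only the positional arguments and the `--method` flag shown above are used.

```python
import shutil
import subprocess


def build_chandra_command(input_path, output_dir, method="vllm"):
    """Return the argv list for one `chandra` CLI invocation."""
    if method not in ("hf", "vllm"):
        raise ValueError(f"method must be 'hf' or 'vllm', got {method!r}")
    return ["chandra", str(input_path), str(output_dir), "--method", method]


def run_chandra(input_path, output_dir, method="vllm"):
    """Run the CLI and raise if it is missing or exits with an error."""
    if shutil.which("chandra") is None:
        raise FileNotFoundError("chandra CLI not found; try `pip install chandra-ocr`")
    command = build_chandra_command(input_path, output_dir, method)
    subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.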

CLI Options:

  • --method [hf|vllm]: Inference method (default: vllm)
  • --page-range TEXT: Page range for PDFs (e.g., "1-5,7,9-12")
  • --max-output-tokens INTEGER: Max tokens per page
  • --max-workers INTEGER: Parallel workers for vLLM
  • --include-images/--no-images: Extract and save images (default: include)
  • --include-headers-footers/--no-headers-footers: Include page headers/footers (default: exclude)
  • --batch-size INTEGER: Pages per batch (default: 1)
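To illustrate the `--page-range` syntax, here is one way a spec such as "1-5,7,9-12" could be expanded into individual page numbers. This is a sketch under assumptions, not Chandra's actual parser: `parse_page_range` is a hypothetical name, ranges are assumed to be inclusive, and the README does not say whether page numbers start at 0 or 1.

```python
def parse_page_range(spec):
    """Expand a spec such as "1-5,7,9-12" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range {part!r}: start exceeds end")
            pages.update(range(start, end + 1))  # assumed inclusive
        else:
            pages.add(int(part))
    return sorted(pages)
```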

Output Structure:

Each processed file creates a subdirectory with:

  • <filename>.md - Markdown output
  • <filename>.html - HTML output
  • <filename>_metadata.json - Metadata (page info, token count, etc.)
  • images/ - Extracted images from the document
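The files listed above can be read back with the standard library. In this minimal sketch, `load_document_output` is a hypothetical helper; it takes the per-document subdirectory and the `<filename>` stem explicitly, because the README does not state how the subdirectory itself is named. The metadata is returned as parsed JSON without assuming any particular keys.

```python
import json
from pathlib import Path


def load_document_output(doc_dir, stem):
    """Load the Markdown, HTML, metadata, and image names for one document."""
    doc_dir = Path(doc_dir)
    images_dir = doc_dir / "images"
    metadata_text = (doc_dir / f"{stem}_metadata.json").read_text(encoding="utf-8")
    return {
        "markdown": (doc_dir / f"{stem}.md").read_text(encoding="utf-8"),
        "html": (doc_dir / f"{stem}.html").read_text(encoding="utf-8"),
        "metadata": json.loads(metadata_text),
        # images/ may be absent, for example when --no-images was used
        "images": sorted(p.name for p in images_dir.iterdir()) if images_dir.is_dir() else [],
    }
```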

Streamlit Web App

Launch the interactive demo for single-page processing:

chandra_app

vLLM Server (Optional)

For production deployments or batch processing, use the vLLM server:

chandra_vllm

This launches a Docker container with optimized inference settings. Configure via environment variables:

  • VLLM_API_BASE: Server URL (default: http://localhost:8000/v1)
  • VLLM_MODEL_NAME: Model name for the server (default: chandra)
  • VLLM_GPUS: GPU device IDs (default: 0)
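A client script can read the same variables and check that the server is reachable before submitting work. This sketch uses hypothetical helper names and assumes the container exposes vLLM's OpenAI-compatible API, whose model listing lives at `/models` under the base URL; the defaults are the ones documented above.

```python
import os
import urllib.error
import urllib.request


def vllm_settings(env=None):
    """Read the vLLM variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "api_base": env.get("VLLM_API_BASE", "http://localhost:8000/v1"),
        "model_name": env.get("VLLM_MODEL_NAME", "chandra"),
        "gpus": env.get("VLLM_GPUS", "0"),
    }


def models_url(api_base):
    """Build the model-listing URL (assumes an OpenAI-compatible server)."""
    return api_base.rstrip("/") + "/models"


def server_is_up(api_base, timeout=2.0):
    """Return True if the model-listing endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(models_url(api_base), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

For example, `server_is_up(vllm_settings()["api_base"])` can gate a batch job so it fails fast when the container has not been started.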

Configuration

Settings can be configured via environment variables or a local.env file:

# Model settings
MODEL_CHECKPOINT=datalab-to/chandra-0.2.8
MAX_OUTPUT_TOKENS=8192

# vLLM settings
VLLM_API_BASE=http://localhost:8000/v1
VLLM_MODEL_NAME=chandra
VLLM_GPUS=0
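The file format above is plain `KEY=VALUE` lines with `#` comments. The sketch below shows how such a file could be parsed; it is illustrative only and is not how Chandra loads its settings, which the README does not describe. `parse_env_text` is a hypothetical name, and all values are kept as strings.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # ignore lines without an equals sign
        settings[key.strip()] = value.strip()
    return settings
```

Splitting on the first `=` only keeps values such as URLs with query strings intact.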

Benchmarks

| Model | ArXiv | Old Scans Math | Tables | Old Scans | Headers and Footers | Multi column | Long tiny text | Base | Overall |
|---|---|---|---|---|---|---|---|---|---|
| Datalab Chandra v0.1.0 | 81.4 | 80.3 | 89.4 | 50.0 | 88.3 | 81.0 | 91.6 | 99.9 | 82.7 ± 0.9 |
| Datalab Marker v1.10.0 | 83.8 | 69.7 | 74.8 | 32.3 | 86.6 | 79.4 | 85.7 | 99.6 | 76.5 ± 1.0 |
| Mistral OCR API | 77.2 | 67.5 | 60.6 | 29.3 | 93.6 | 71.3 | 77.1 | 99.4 | 72.0 ± 1.1 |
| Deepseek OCR | 75.2 | 67.9 | 79.1 | 32.9 | 96.1 | 66.3 | 78.5 | 97.7 | 74.2 ± 1.0 |
| GPT-4o (Anchored) | 53.5 | 74.5 | 70.0 | 40.7 | 93.8 | 69.3 | 60.6 | 96.8 | 69.9 ± 1.1 |
| Gemini Flash 2 (Anchored) | 54.5 | 56.1 | 72.1 | 34.2 | 64.7 | 61.5 | 71.5 | 95.6 | 63.8 ± 1.2 |
| Qwen 3 VL | 70.2 | 75.1 | 45.6 | 37.5 | 89.1 | 62.1 | 43.0 | 94.3 | 64.6 ± 1.1 |
| olmOCR v0.3.0 | 78.6 | 79.9 | 72.9 | 43.9 | 95.1 | 77.3 | 81.2 | 98.9 | 78.5 ± 1.1 |
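The Overall figures in the benchmark table are consistent with an unweighted mean of the eight category scores. The README does not state how Overall is computed, so treat this as an observation about the published numbers rather than a description of the benchmark's method; the ± intervals cannot be derived from the table. The sketch below recomputes the means from the values in the table.

```python
# Category scores in table order (ArXiv ... Base) and the reported Overall.
SCORES = {
    "Datalab Chandra v0.1.0": ([81.4, 80.3, 89.4, 50.0, 88.3, 81.0, 91.6, 99.9], 82.7),
    "Datalab Marker v1.10.0": ([83.8, 69.7, 74.8, 32.3, 86.6, 79.4, 85.7, 99.6], 76.5),
    "Mistral OCR API": ([77.2, 67.5, 60.6, 29.3, 93.6, 71.3, 77.1, 99.4], 72.0),
    "Deepseek OCR": ([75.2, 67.9, 79.1, 32.9, 96.1, 66.3, 78.5, 97.7], 74.2),
    "GPT-4o (Anchored)": ([53.5, 74.5, 70.0, 40.7, 93.8, 69.3, 60.6, 96.8], 69.9),
    "Gemini Flash 2 (Anchored)": ([54.5, 56.1, 72.1, 34.2, 64.7, 61.5, 71.5, 95.6], 63.8),
    "Qwen 3 VL": ([70.2, 75.1, 45.6, 37.5, 89.1, 62.1, 43.0, 94.3], 64.6),
    "olmOCR v0.3.0": ([78.6, 79.9, 72.9, 43.9, 95.1, 77.3, 81.2, 98.9], 78.5),
}


def mean_score(values):
    """Unweighted arithmetic mean of the category scores."""
    return sum(values) / len(values)


for model, (categories, reported) in SCORES.items():
    print(f"{model}: recomputed {mean_score(categories):.1f}, reported {reported}")
```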

Examples

| Type | Name | Link |
|---|---|---|
| Tables | Water Damage Form | View |
| Tables | 10K Filing | View |
| Forms | Handwritten Form | View |
| Forms | Lease Agreement | View |
| Handwriting | Doctor Note | View |
| Handwriting | Math Homework | View |
| Books | Geography Textbook | View |
| Books | Exercise Problems | View |
| Math | Attention Diagram | View |
| Math | Worksheet | View |
| Math | EGA Page | View |
| Newspapers | New York Times | View |
| Newspapers | LA Times | View |
| Other | Transcript | View |
| Other | Flowchart | View |

Commercial usage

This code is licensed under GPL-3, and the model weights use a modified OpenRAIL-M license (free for research, personal use, and startups under $2M in funding or revenue). To remove the GPL license requirements, or for broader commercial licensing, see our pricing page.
