# Chandra

[![Discord](https://img.shields.io/badge/Discord-Join%20us-5865F2?logo=discord&logoColor=white)](https://discord.gg/KuZwXNGnfH)

An OCR model for complex documents — handwriting, tables, math equations, and messy forms.

## Benchmarks

Overall scores on the [olmocr bench](https://github.com/allenai/olmocr) are reported in the project repository.

## Hosted API

A hosted API with additional accuracy improvements is available at [datalab.to](https://www.datalab.to/). Try the [free playground](https://www.datalab.to/playground) without installing.

## Community

Join [Discord](https://discord.gg/KuZwXNGnfH) to discuss development and get help.

## Quick Start

```shell
pip install chandra-ocr

# Start vLLM server, then run OCR
chandra_vllm
chandra input.pdf ./output

# Or use HuggingFace locally
chandra input.pdf ./output --method hf

# Interactive web app
chandra_app
```

**Python:**

```python
from chandra.model import InferenceManager
from chandra.input import load_pdf_images

manager = InferenceManager(method="hf")
images = load_pdf_images("document.pdf")
results = manager.generate(images)
print(results[0].markdown)
```

## How it Works

- **Two inference modes**: Run locally via HuggingFace Transformers, or deploy a vLLM server for production throughput
- **Layout-aware output**: Every text block, table, and image comes with bounding box coordinates
- **Structured formats**: Output as Markdown, HTML, or JSON with full layout metadata
- **40+ languages** supported

## What It Handles

**Handwriting** — Doctor notes, filled forms, homework. Chandra reads cursive and messy print that trips up traditional OCR.

**Tables** — Preserves structure including merged cells (colspan/rowspan). Works on financial filings, invoices, and data tables.

**Math** — Inline and block equations rendered as LaTeX. Handles textbooks, worksheets, and research papers.

**Forms** — Reconstructs checkboxes, radio buttons, and form fields with their values.

**Complex Layouts** — Multi-column documents, newspapers, textbooks with figures and captions.
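The Python Quick Start above can be extended into a small end-to-end conversion script. The snippet below is a minimal sketch, not the packaged CLI: `InferenceManager`, `load_pdf_images`, `generate()`, and the `markdown` attribute come from the Quick Start example, while the output directory handling and per-page file naming are our own additions for illustration.

```python
from pathlib import Path

from chandra.input import load_pdf_images
from chandra.model import InferenceManager

# Local HuggingFace inference; use method="vllm" if a vLLM server is running.
manager = InferenceManager(method="hf")

# One image per PDF page.
images = load_pdf_images("document.pdf")
results = manager.generate(images)

# Write one Markdown file per page (assumes each result exposes `.markdown`,
# as in the Quick Start example above).
out_dir = Path("output")
out_dir.mkdir(exist_ok=True)
for page_num, result in enumerate(results, start=1):
    (out_dir / f"page_{page_num:03d}.md").write_text(result.markdown)
```

For production workloads, the CLI and vLLM server described under Usage below handle batching and output layout for you.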
## Examples

Example images for **Handwriting**, **Tables**, **Math**, and **Newspapers** are included in the repository's example assets; more are linked below.
### More examples

| Type | Name | Link |
|------|------|------|
| Tables | 10K Filing | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/tables/10k.png) |
| Forms | Lease Agreement | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/forms/lease.png) |
| Handwriting | Math Homework | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/handwriting/math_hw.png) |
| Books | Geography Textbook | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/books/geo_textbook_page.png) |
| Books | Exercise Problems | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/books/exercises.png) |
| Math | Attention Diagram | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/math/attn_all.png) |
| Math | Worksheet | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/math/worksheet.png) |
| Newspapers | LA Times | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/newspapers/la_times.png) |
| Other | Transcript | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/other/transcript.png) |
| Other | Flowchart | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/other/flowchart.png) |
## Installation

```bash
pip install chandra-ocr
```

For HuggingFace inference, we recommend installing [flash attention](https://github.com/Dao-AILab/flash-attention) for better performance.

**From source:**

```bash
git clone https://github.com/datalab-to/chandra.git
cd chandra
uv sync
source .venv/bin/activate
```

## Usage

### CLI

```bash
# Single file with vLLM server
chandra input.pdf ./output --method vllm

# Directory with local model
chandra ./documents ./output --method hf
```

**Options:**

- `--method [hf|vllm]`: Inference method (default: vllm)
- `--page-range TEXT`: Page range for PDFs (e.g., "1-5,7,9-12")
- `--max-output-tokens INTEGER`: Max tokens per page
- `--max-workers INTEGER`: Parallel workers for vLLM
- `--include-images/--no-images`: Extract and save images (default: include)
- `--include-headers-footers/--no-headers-footers`: Include page headers/footers (default: exclude)
- `--batch-size INTEGER`: Pages per batch (default: 1)

**Output structure:**

```
output/
└── filename/
    ├── filename.md              # Markdown
    ├── filename.html            # HTML with bounding boxes
    ├── filename_metadata.json
    └── images/                  # Extracted images
```

### vLLM Server

For production or batch processing:

```bash
chandra_vllm
```

Launches a Docker container with optimized inference. Configure via environment:

- `VLLM_API_BASE`: Server URL (default: `http://localhost:8000/v1`)
- `VLLM_MODEL_NAME`: Model name (default: `chandra`)
- `VLLM_GPUS`: GPU device IDs (default: `0`)

### Configuration

Settings via environment variables or `local.env`:

```bash
MODEL_CHECKPOINT=datalab-to/chandra
MAX_OUTPUT_TOKENS=8192
VLLM_API_BASE=http://localhost:8000/v1
VLLM_GPUS=0
```

## Commercial Usage

Code is Apache 2.0. Model weights use a modified OpenRAIL-M license: free for research, personal use, and startups under $2M in funding or revenue. The weights cannot be used to compete with our API. For broader commercial licensing, see [pricing](https://www.datalab.to/pricing?utm_source=gh-chandra).

## Credits

- [Huggingface Transformers](https://github.com/huggingface/transformers)
- [vLLM](https://github.com/vllm-project/vllm)
- [olmocr](https://github.com/allenai/olmocr)
- [Qwen3 VL](https://github.com/QwenLM/Qwen3)