Vik Paruchuri aa59df2996 Inference fixes
2025-10-16 13:22:32 -04:00
2025-10-16 13:22:32 -04:00
2025-10-15 16:06:57 -04:00
2025-10-08 17:34:01 -04:00
2025-10-08 17:34:01 -04:00
2025-10-16 13:22:32 -04:00
2025-10-16 13:22:32 -04:00
2025-10-16 13:22:32 -04:00
2025-10-16 13:22:32 -04:00

Chandra

Chandra is a highly accurate OCR model that converts images and PDFs into structured HTML/Markdown/JSON while preserving layout information.

Features

  • Convert documents to markdown, html, or json with detailed layout information
  • Math equation support (LaTeX)
  • Reconstructs forms, including checkboxes
  • Precise table reconstruction
  • Support for 40+ languages
  • Two inference modes: local (HuggingFace) and remote (vLLM server)

Installation

uv sync
source .venv/bin/activate

Usage

Streamlit Web App

Launch the interactive demo:

streamlit run chandra_app.py --server.fileWatcherType none --server.headless true

Inference Modes:

  • hf: Loads model locally using HuggingFace Transformers (requires GPU)
  • vllm: Connects to a running vLLM server for optimized batch inference

vLLM Server (Optional)

For production deployments or batch processing, use the vLLM server:

python scripts/start_vllm.py

This launches a Docker container with optimized inference settings. Configure via environment variables:

  • VLLM_API_BASE: Server URL (default: http://localhost:8000/v1)
  • VLLM_MODEL_NAME: Model name for the server (default: chandra)
  • VLLM_GPUS: GPU device IDs (default: 0)
  • HF_TOKEN: HuggingFace token for model access

Configuration

Settings can be configured via environment variables or a local.env file:

# Model settings
MODEL_CHECKPOINT=datalab-to/chandra-0.2.8
MAX_OUTPUT_TOKENS=8192

# vLLM settings
VLLM_API_BASE=http://localhost:8000/v1
VLLM_MODEL_NAME=chandra
VLLM_GPUS=0

Output Formats

Chandra provides three output formats:

  1. HTML: Structured HTML with layout blocks and bounding boxes
  2. Markdown: Clean, readable Markdown conversion
  3. Layout Image: Visual representation of detected layout blocks
Description
No description provided
Readme Apache-2.0 14 MiB
Languages
Python 75.2%
HTML 24.8%