30 Commits
v0.1.2 ... dev

Author SHA1 Message Date
Vik Paruchuri  c049e7524f  Shift import  2025-11-19 12:18:57 -05:00
Vik Paruchuri  b96eb84094  Hide imports  2025-11-19 12:06:07 -05:00
Vik Paruchuri  0f5f3d485c  Enable piping through params  2025-11-12 18:06:05 -05:00
Vik Paruchuri  1bab4bf73a  fix issue with pop  2025-11-12 17:33:50 -05:00
Vik Paruchuri  34f825351c  Enable passing bbox scale  2025-11-12 17:17:15 -05:00
Vik Paruchuri  068db0311e  Add a small sleep  2025-11-12 16:06:02 -05:00
Vik Paruchuri  aafbb70ce8  Merge pull request #36 from datalab-to/vik/bbox (Fix retry settings)  2025-11-12 16:03:10 -05:00
Vik Paruchuri  22639087e7  Fix retry settings  2025-11-12 16:01:43 -05:00
Vik Paruchuri  910bcf100f  Merge pull request #31 from datalab-to/vik/bbox (Vik/bbox)  2025-11-10 11:36:50 -05:00
Vik Paruchuri  3958707a80  Support multiple formats  2025-11-10 11:12:00 -05:00
Vik Paruchuri  fe28f26fc2  Adjust bbox format  2025-11-07 13:18:38 -05:00
Vik Paruchuri  4470243560  Merge remote-tracking branch 'origin/dev' into dev  2025-11-05 13:46:20 -05:00
Vik Paruchuri  a3889b12fb  bbox scale  2025-11-05 13:45:59 -05:00
Vik Paruchuri  d69d18d6e8  Merge pull request #24 from datalab-to/tokens (fix: respect max output tokens)  2025-11-04 13:19:06 -05:00
Zach Nussbaum  d1cde9b608  fix: respect max output tokens  2025-11-04 13:16:57 -05:00
Vik Paruchuri  aabfed2ed3  Fix max repeats  2025-11-03 17:11:51 -05:00
Vik Paruchuri  4b01146865  Support different bbox format  2025-10-30 20:06:45 -04:00
Vik Paruchuri  7cf96f3911  Enable passing custom headers  2025-10-30 10:21:11 -04:00
Vik Paruchuri  607205211a  Improve robustness  2025-10-29 18:16:40 -04:00
Vik Paruchuri  358358134e  Fix lanczos  2025-10-26 10:38:04 -04:00
Vik Paruchuri  2d2d7ab331  Change image rendering  2025-10-26 10:27:49 -04:00
Vik Paruchuri  528b58c16f  Track errors properly  2025-10-23 16:55:16 -04:00
Vik Paruchuri  5acfd8dc6a  Patch image behavior  2025-10-23 12:19:41 -04:00
Vik Paruchuri  17d49eec2e  Flatten in annotation  2025-10-22 09:16:12 -04:00
Vik Paruchuri  0fde883a52  Add model license  2025-10-21 13:35:24 -04:00
Vik Paruchuri  47bd444f20  Code cleanup  2025-10-21 12:11:37 -04:00
Vik Paruchuri  2151833414  Fix file output dir  2025-10-21 11:54:05 -04:00
Vik Paruchuri  8c1bfe277f  Set proper batch sizes  2025-10-21 11:43:09 -04:00
Vik Paruchuri  ad6508fbc3  Fix vllm token  2025-10-21 11:33:56 -04:00
Vik Paruchuri  2e455aeb2c  Fix attn impl  2025-10-21 11:15:29 -04:00
17 changed files with 401 additions and 189 deletions

.gitignore vendored

@@ -1,6 +1,7 @@
local.env
experiments
.claude
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/

MODEL_LICENSE Normal file

@@ -0,0 +1,59 @@
AI PUBS OPEN RAIL-M LICENSE (MODIFIED)
Version 0.1, March 2, 2023 (Modified)
http://licenses.ai/
PLEASE READ THESE TERMS CAREFULLY BEFORE USING THE MODEL OR A DERIVATIVE WORK OF THE MODEL MADE AVAILABLE IN CONNECTION WITH THESE TERMS. BY DOWNLOADING, REPRODUCING, DISTRIBUTING OR USING THE MODEL OR A DERIVATIVE WORK OF THE MODEL IN ANY MANNER, YOU (“YOU”) AGREE TO BE BOUND BY THESE TERMS (THE “AGREEMENT”) TO THE EXCLUSION OF ALL OTHER TERMS. YOU REPRESENT AND WARRANT THAT YOU HAVE THE AUTHORITY TO ENTER INTO THIS AGREEMENT; IF YOU ARE ENTERING INTO THIS AGREEMENT ON BEHALF OF AN ORGANIZATION OR ENTITY, REFERENCES TO “YOU” IN THIS AGREEMENT REFER TO THAT ORGANIZATION OR ENTITY. IF YOU DO NOT AGREE TO ALL OF THE FOLLOWING, YOU MAY NOT DOWNLOAD, REPRODUCE, DISTRIBUTE OR USE THE MODEL OR A DERIVATIVE WORK OF THE MODEL IN ANY MANNER.
Section I: PREAMBLE
This OpenRAIL-M License, as modified, is generally applicable to any machine-learning Model.
The “Open” nomenclature indicates that the licensed Model is freely accessible to downstream and other users. The “RAIL” nomenclature indicates that there are use restrictions prohibiting certain uses of the Model. These restrictions are intended to avoid potential misuse. This License specifies that the use restrictions in the original License must apply to such derivatives.
NOW THEREFORE, You and Licensor agree as follows:
1. Definitions
(a) “Complementary Material” means the applicable source code and scripts used to define, run, load, benchmark or evaluate the Model, and used to prepare data for training or evaluation, if any. This includes any accompanying documentation, tutorials, examples, and any related information, if any. Complementary Material is not licensed under this License.
(b) "Contribution" means any work, including the original version of the Model and any modifications or additions to that Model or Derivatives of the Model thereof, that is intentionally submitted to Licensor for inclusion in the Model by the rights owner or by an individual or legal entity authorized to submit on behalf of the rights owner. For the purposes of this definition, “submitted” means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Model, but excluding communication that is conspicuously marked or otherwise designated in writing by the rights owner as "Not a Contribution."
(c) "Contributor" means Licensor and any individual or legal entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Model.
(d) “Data” means a collection of information and/or content extracted from the dataset used with the Model, including to train, pretrain, or otherwise evaluate the Model. The Data is not licensed under this License.
(e) “Derivatives of the Model” means all modifications to the Model, works based on the Model, or any other model which is created or initialized by transfer of patterns of the weights, parameters, activations or output of the Model, to the other model, in order to cause the other model to perform similarly to the Model, including - but not limited to - distillation methods entailing the use of intermediate data representations or methods based on the generation of synthetic data by the Model for training the other model.
(f) “Distribution” means any transmission, reproduction, publication, distribution, or other sharing of the Model or Derivatives of the Model to a third party, including providing the Model as a hosted service made available by electronic or other remote means, including but not limited to API-based or web access.
(g) “Harm” includes but is not limited to physical, mental, psychological, financial and reputational damage, pain, or loss.
(h) "License" means the terms and conditions for use, reproduction, and Distribution as defined in this document.
(i) “Licensor” means the rights owner or entity authorized by the rights owner that is granting the License, including the persons or entities that may have rights in the Model and/or distributing the Model.
(j) “Model” means any accompanying machine-learning based assemblies (including checkpoints), consisting of learnt weights, parameters (including optimizer states), corresponding to the model architecture as embodied in the Complementary Material, that have been trained or tuned, in whole or in part on the Data, using the Complementary Material.
(k) “Output” means the results of operating a Model as embodied in informational content resulting therefrom.
(l) “Third Parties” means individuals or legal entities that are not under common control with Licensor or You.
(m) "You" (or "Your") means an individual or legal entity exercising permissions granted by this License and/or making use of the Model for whichever purpose and in any field of use, including usage of the Model in an end-use application, including but not limited to a chatbot, translator, or image generator.
Section II: INTELLECTUAL PROPERTY RIGHTS
Both copyright and patent grants may apply to the Model and Derivatives of the Model. The Model and Derivatives of the Model are subject to additional terms as described in Section III, which shall govern the use of the Model and Derivatives of the Model even in the event Section II is held unenforceable.
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the Model and Derivatives of the Model.
3. Grant of Patent License. Subject to the terms and conditions of this License and where and as applicable, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this paragraph) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Model and/or Derivatives of the Model where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Model or Derivatives of the Model to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Model or Derivative of the Model and/or a Contribution incorporated within the Model or Derivative of the Model constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for the Model and/or Derivative of the Model shall terminate as of the date such litigation is asserted or filed.
Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION
4. Distribution and Redistribution. You may host the Model or Derivatives of the Model for remote access by Third Parties, including but not limited to software-as-a-service, reproduce, or Distribute copies of the Model or Derivatives of the Model thereof in any medium, with or without modifications, provided that You meet the conditions in this Section III:
(a) Use-based restrictions in paragraph 5 MUST be included as an enforceable provision by You in any type of legal agreement (for example, a license) governing the use and/or distribution of the Model or Derivatives of the Model, and You shall give notice to subsequent users You Distribute to, that the Model and Derivatives of the Model are subject to paragraph 5;
(b) You must give any Third Party recipients of the Model or Derivatives of the Model a copy of this License;
(c) You must cause any modified files to carry prominent notices stating that You changed the files; and
(d) You must retain all copyright, patent, trademark, and attribution notices excluding those notices that do not pertain to any part of the Model or Derivatives of the Model.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions, consistent with paragraph 4.a., for use, reproduction, or Distribution of Your modifications, or for any such Derivatives of the Model as a whole, provided Your use, reproduction, and Distribution of the Model otherwise complies with the conditions stated in this License.
5. Use-based restrictions. The restrictions set forth in Attachment A are considered Use-based restrictions. Accordingly, You cannot use the Model or the Derivatives of the Model in violation of such restrictions. You may use the Model subject to this License, including only for lawful purposes and in accordance with the License. Use may include creating any content with, fine-tuning, updating, running, training, evaluating and/or re-parametrizing the Model. You shall require all of Your users who use the Model or a Derivative of the Model to comply with the terms of this paragraph 5.
6. The Output You Generate. Except as set forth herein, Licensor claims no rights in the Output You generate using the Model. You are solely responsible for the Output you generate and its subsequent uses. No use of the Output can contravene any provision as stated in the License.
7. Attribution. In connection with any Output, or use or Distribution of any Model or Derivatives of the Model, You agree to give appropriate credit and attribution to Licensor, provide a link to the original Model or Derivatives of the Model, provide a copy of this License, and identify any changes You have made to the Model or Derivatives of the Model (collectively, the “Attribution”). The Attribution must not suggest endorsement by any Licensor.
8. Share-a-Like. As a condition to the license and authorizations herein, You agree to apply this License (to the exclusion of all others) to any and all copies of the Model, Derivatives of the Model, any changes or improvements to the Model or Derivatives of the Model, and to the Output and any derivatives, changes or improvements to or of the Output.
Section IV: OTHER PROVISIONS
9. Updates and Runtime Restrictions. To the maximum extent permitted by law, Licensor reserves the right to restrict (remotely or otherwise) usage of the Model in violation of this License, update the Model through electronic means, or cause modification to the Output resulting from updates to the Model.
10. Trademarks and related. Nothing in this License permits You to make use of Licensor's trademarks, trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between the parties; and any rights not expressly granted herein are reserved by the Licensors.
11. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Model (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Model and Derivatives of the Model, and assume any risks associated with Your exercise of permissions under this License.
12. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Model (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
13. Accepting Warranty or Additional Liability. While Distributing the Model or Derivatives of the Model, You may choose to charge a fee in exchange for support, warranty, indemnity, or other obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor or Licensor, and only if You agree to indemnify, defend, and hold each Contributor and the Licensor harmless for any liability incurred by, or claims asserted against, such Contributor or Licensor by reason of your accepting any such warranty or additional liability.
14. If any provision of this License is held to be invalid, illegal or unenforceable, the remaining provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.
END OF TERMS AND CONDITIONS
Attachment A
USE RESTRICTIONS
As conditions to the Licenses set forth in this Agreement, You agree not to use, reproduce, modify, create or Distribute the Model, Derivatives of the Model, or Output (collectively, “Use”) in any of the following ways:
1. Legal:
(a) In any way that violates any applicable national, federal, state, local or international law or regulation; or
(b) to directly or indirectly infringe or misappropriate any third party intellectual property rights (including those of Licensor or any Contributor)
2. Commercial:
(a) for any purpose if You (your employer, or the entity you are affiliated with) generated more than two million US Dollars ($2,000,000) in gross revenue in the prior year, except where Your Use is limited to personal use or research purposes;
(b) for any purpose if You (your employer, or the entity you are affiliated with) have raised more than two million US dollars ($2,000,000) in total equity or debt funding from any source, except where Your Use is limited to personal use or research purposes; or
(c) for any purpose if You (your employer, or the entity you are affiliated with) provide or otherwise make available any product or service that competes with any product or service offered by or made available by Licensor or any of its affiliates.
Commercial and broader use licenses may be available from Licensor at the following URL: https://www.datalab.to/


@@ -1,6 +1,6 @@
# Chandra
Chandra is an OCR model that converts images and PDFs into structured HTML/Markdown/JSON while preserving layout information.
Chandra is a highly accurate OCR model that converts images and PDFs into structured HTML/Markdown/JSON while preserving layout information.
## Features
@@ -65,6 +65,10 @@ See full scores [below](#benchmark-table).
| Other | Transcript | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/other/transcript.png) |
| Other | Flowchart | [View](https://github.com/datalab-to/chandra/blob/master/assets/examples/other/flowchart.png) |
## Community
[Discord](https://discord.gg//KuZwXNGnfH) is where we discuss future development.
## Installation
### Package
@@ -73,6 +77,8 @@ See full scores [below](#benchmark-table).
pip install chandra-ocr
```
If you're going to use the huggingface method, we also recommend installing [flash attention](https://github.com/Dao-AILab/flash-attention).
### From Source
```bash
@@ -152,24 +158,25 @@ VLLM_MODEL_NAME=chandra
VLLM_GPUS=0
```
## Benchmark table
| **Model** | ArXiv | Old Scans Math | Tables | Old Scans | Headers and Footers | Multi column | Long tiny text | Base | Overall | Source |
|:----------|:--------:|:--------------:|:--------:|:---------:|:-------------------:|:------------:|:--------------:|:----:|:--------------:|:------:|
| Datalab Chandra v0.1.0 | 82.2 | **80.3** | **88.0** | **50.4** | 90.8 | 81.2 | **92.3** | **99.9** | **83.1 ± 0.9** | Own benchmarks |
| Datalab Marker v1.10.0 | **83.8** | 69.7 | 74.8 | 32.3 | 86.6 | 79.4 | 85.7 | 99.6 | 76.5 ± 1.0 | Own benchmarks |
| Mistral OCR API | 77.2 | 67.5 | 60.6 | 29.3 | 93.6 | 71.3 | 77.1 | 99.4 | 72.0 ± 1.1 | olmocr repo |
| Deepseek OCR | 75.2 | 72.3 | 79.7 | 33.3 | 96.1 | 66.7 | 80.1 | 99.7 | 75.4 ± 1.0 | Own benchmarks |
| GPT-4o (Anchored) | 53.5 | 74.5 | 70.0 | 40.7 | 93.8 | 69.3 | 60.6 | 96.8 | 69.9 ± 1.1 | olmocr repo |
| Gemini Flash 2 (Anchored) | 54.5 | 56.1 | 72.1 | 34.2 | 64.7 | 61.5 | 71.5 | 95.6 | 63.8 ± 1.2 | olmocr repo |
| Qwen 3 VL | 70.2 | 75.1 | 45.6 | 37.5 | 89.1 | 62.1 | 43.0 | 94.3 | 64.6 ± 1.1 | Own benchmarks |
| olmOCR v0.3.0 | 78.6 | 79.9 | 72.9 | 43.9 | **95.1** | 77.3 | 81.2 | 98.9 | 78.5 ± 1.1 | olmocr repo |
| dots.ocr | 82.1 | 64.2 | 88.3 | 40.9 | 94.1 | **82.4** | 81.2 | 99.5 | 79.1 ± 1.0 | dots.ocr repo |
# Commercial usage
This code is Apache 2.0, and our model weights use a modified OpenRAIL-M license (free for research, personal use, and startups under $2M funding/revenue, cannot be used competitively with our API). To remove the OpenRAIL license requirements, or for broader commercial licensing, visit our pricing page [here](https://www.datalab.to/pricing?utm_source=gh-chandra).
# Benchmark table
| **Model** | ArXiv | Old Scans Math | Tables | Old Scans | Headers and Footers | Multi column | Long tiny text | Base | Overall | Source |
|:--------------------------|:--------:|:--------------:|:--------:|:---------:|:-------------------:|:------------:|:--------------:|:----:|:--------------:|:------:|
| Datalab Chandra v0.1.0 | 82.2 | **80.3** | **88.0** | **50.4** | 90.8 | 81.2 | **92.3** | **99.9** | **83.1 ± 0.9** | Own benchmarks |
| Datalab Marker v1.10.0 | **83.8** | 69.7 | 74.8 | 32.3 | 86.6 | 79.4 | 85.7 | 99.6 | 76.5 ± 1.0 | Own benchmarks |
| Mistral OCR API | 77.2 | 67.5 | 60.6 | 29.3 | 93.6 | 71.3 | 77.1 | 99.4 | 72.0 ± 1.1 | olmocr repo |
| Deepseek OCR | 75.2 | 72.3 | 79.7 | 33.3 | 96.1 | 66.7 | 80.1 | 99.7 | 75.4 ± 1.0 | Own benchmarks |
| GPT-4o (Anchored) | 53.5 | 74.5 | 70.0 | 40.7 | 93.8 | 69.3 | 60.6 | 96.8 | 69.9 ± 1.1 | olmocr repo |
| Gemini Flash 2 (Anchored) | 54.5 | 56.1 | 72.1 | 34.2 | 64.7 | 61.5 | 71.5 | 95.6 | 63.8 ± 1.2 | olmocr repo |
| Qwen 3 VL 8B | 70.2 | 75.1 | 45.6 | 37.5 | 89.1 | 62.1 | 43.0 | 94.3 | 64.6 ± 1.1 | Own benchmarks |
| olmOCR v0.3.0 | 78.6 | 79.9 | 72.9 | 43.9 | **95.1** | 77.3 | 81.2 | 98.9 | 78.5 ± 1.1 | olmocr repo |
| dots.ocr | 82.1 | 64.2 | 88.3 | 40.9 | 94.1 | **82.4** | 81.2 | 99.5 | 79.1 ± 1.0 | dots.ocr repo |
# Credits
Thank you to the following open source projects:


@@ -2,20 +2,48 @@ from typing import List
import filetype
from PIL import Image
import pypdfium2 as pdfium
import pypdfium2.raw as pdfium_c
from chandra.settings import settings
def load_pdf_images(filepath: str, page_range: List[int]):
def flatten(page, flag=pdfium_c.FLAT_NORMALDISPLAY):
rc = pdfium_c.FPDFPage_Flatten(page, flag)
if rc == pdfium_c.FLATTEN_FAIL:
print(f"Failed to flatten annotations / form fields on page {page}.")
def load_image(
filepath: str, min_image_dim: int = settings.MIN_IMAGE_DIM
) -> Image.Image:
image = Image.open(filepath).convert("RGB")
if image.width < min_image_dim or image.height < min_image_dim:
scale = min_image_dim / min(image.width, image.height)
new_size = (int(image.width * scale), int(image.height * scale))
image = image.resize(new_size, Image.Resampling.LANCZOS)
return image
def load_pdf_images(
filepath: str,
page_range: List[int],
image_dpi: int = settings.IMAGE_DPI,
min_pdf_image_dim: int = settings.MIN_PDF_IMAGE_DIM,
) -> List[Image.Image]:
doc = pdfium.PdfDocument(filepath)
doc.init_forms()
images = []
for page in range(len(doc)):
if not page_range or page in page_range:
page_obj = doc[page]
min_page_dim = min(page_obj.get_width(), page_obj.get_height())
scale_dpi = (settings.MIN_IMAGE_DIM / min_page_dim) * 72
scale_dpi = max(scale_dpi, settings.IMAGE_DPI)
pil_image = doc[page].render(scale=scale_dpi / 72).to_pil().convert("RGB")
scale_dpi = (min_pdf_image_dim / min_page_dim) * 72
scale_dpi = max(scale_dpi, image_dpi)
page_obj = doc[page]
flatten(page_obj)
page_obj = doc[page]
pil_image = page_obj.render(scale=scale_dpi / 72).to_pil().convert("RGB")
images.append(pil_image)
doc.close()
@@ -44,5 +72,5 @@ def load_file(filepath: str, config: dict):
if input_type and input_type.extension == "pdf":
images = load_pdf_images(filepath, page_range)
else:
images = [Image.open(filepath).convert("RGB")]
return images
images = [load_image(filepath)]
return images
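The DPI logic in the diff above picks a render scale so a PDF page's shorter side comes out at least `min_pdf_image_dim` pixels, while never dropping below the configured DPI. A standalone sketch of that calculation; the default values here are illustrative assumptions, not the project's actual settings:

```python
def compute_render_scale(page_width_pts: float, page_height_pts: float,
                         min_pdf_image_dim: int = 1024,
                         image_dpi: int = 192) -> float:
    """Pick a render scale so the page's shorter side is at least
    min_pdf_image_dim pixels.

    PDF page sizes are in points (72 per inch); rendering at scale s
    produces s * 72 DPI, so the shorter side becomes min_page_dim * s pixels.
    """
    min_page_dim = min(page_width_pts, page_height_pts)
    scale_dpi = (min_pdf_image_dim / min_page_dim) * 72  # DPI that exactly hits the floor
    scale_dpi = max(scale_dpi, image_dpi)                # never render below the base DPI
    return scale_dpi / 72  # pypdfium2's render() takes a scale, not a DPI
```

For a US Letter page (612x792 points) the floor DPI is about 120, so the base DPI wins; for a very small page the floor DPI dominates instead.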


@@ -4,6 +4,7 @@ from chandra.model.hf import load_model, generate_hf
from chandra.model.schema import BatchInputItem, BatchOutputItem
from chandra.model.vllm import generate_vllm
from chandra.output import parse_markdown, parse_html, parse_chunks, extract_images
from chandra.settings import settings
class InferenceManager:
@@ -26,19 +27,29 @@ class InferenceManager:
output_kwargs["include_headers_footers"] = kwargs.pop(
"include_headers_footers"
)
bbox_scale = kwargs.pop("bbox_scale", settings.BBOX_SCALE)
vllm_api_base = kwargs.pop("vllm_api_base", settings.VLLM_API_BASE)
if self.method == "vllm":
results = generate_vllm(
batch, max_output_tokens=max_output_tokens, **kwargs
batch,
max_output_tokens=max_output_tokens,
bbox_scale=bbox_scale,
vllm_api_base=vllm_api_base,
**kwargs,
)
else:
results = generate_hf(
batch, self.model, max_output_tokens=max_output_tokens, **kwargs
batch,
self.model,
max_output_tokens=max_output_tokens,
bbox_scale=bbox_scale,
**kwargs,
)
output = []
for result, input_item in zip(results, batch):
chunks = parse_chunks(result.raw, input_item.image)
chunks = parse_chunks(result.raw, input_item.image, bbox_scale=bbox_scale)
output.append(
BatchOutputItem(
markdown=parse_markdown(result.raw, **output_kwargs),
@@ -48,6 +59,7 @@ class InferenceManager:
page_box=[0, 0, input_item.image.width, input_item.image.height],
token_count=result.token_count,
images=extract_images(result.raw, chunks, input_item.image),
error=result.error,
)
)
return output


@@ -1,8 +1,5 @@
from typing import List
from qwen_vl_utils import process_vision_info
from transformers import Qwen3VLForConditionalGeneration, Qwen3VLProcessor
from chandra.model.schema import BatchInputItem, GenerationResult
from chandra.model.util import scale_to_fit
from chandra.prompts import PROMPT_MAPPING
@@ -10,12 +7,20 @@ from chandra.settings import settings
def generate_hf(
batch: List[BatchInputItem], model, max_output_tokens=None, **kwargs
batch: List[BatchInputItem],
model,
max_output_tokens=None,
bbox_scale: int = settings.BBOX_SCALE,
**kwargs,
) -> List[GenerationResult]:
from qwen_vl_utils import process_vision_info
if max_output_tokens is None:
max_output_tokens = settings.MAX_OUTPUT_TOKENS
messages = [process_batch_element(item, model.processor) for item in batch]
messages = [
process_batch_element(item, model.processor, bbox_scale) for item in batch
]
text = model.processor.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True
)
@@ -48,12 +53,12 @@ def generate_hf(
return results
def process_batch_element(item: BatchInputItem, processor):
def process_batch_element(item: BatchInputItem, processor, bbox_scale: int):
prompt = item.prompt
prompt_type = item.prompt_type
if not prompt:
prompt = PROMPT_MAPPING[prompt_type]
prompt = PROMPT_MAPPING[prompt_type].replace("{bbox_scale}", str(bbox_scale))
content = []
image = scale_to_fit(item.image) # Guarantee max size
@@ -65,14 +70,22 @@ def process_batch_element(item: BatchInputItem, processor):
def load_model():
import torch
from transformers import Qwen3VLForConditionalGeneration, Qwen3VLProcessor
device_map = "auto"
if settings.TORCH_DEVICE:
device_map = {"": settings.TORCH_DEVICE}
kwargs = {
"dtype": torch.bfloat16,
"device_map": device_map,
}
if settings.TORCH_ATTN:
kwargs["attn_implementation"] = settings.TORCH_ATTN
model = Qwen3VLForConditionalGeneration.from_pretrained(
settings.MODEL_CHECKPOINT,
dtype=settings.TORCH_DTYPE,
device_map=device_map,
attn_implementation=settings.TORCH_ATTN_IMPLEMENTATION,
settings.MODEL_CHECKPOINT, **kwargs
)
model = model.eval()
processor = Qwen3VLProcessor.from_pretrained(settings.MODEL_CHECKPOINT)


@@ -27,3 +27,4 @@ class BatchOutputItem:
page_box: List[int]
token_count: int
images: dict
error: bool


@@ -43,7 +43,11 @@ def scale_to_fit(
def detect_repeat_token(
predicted_tokens: str, max_repeats: int = 4, window_size: int = 500, cut_from_end: int = 0
predicted_tokens: str,
base_max_repeats: int = 4,
window_size: int = 500,
cut_from_end: int = 0,
scaling_factor: float = 3.0,
):
try:
predicted_tokens = parse_markdown(predicted_tokens)
@@ -54,11 +58,13 @@ def detect_repeat_token(
if cut_from_end > 0:
predicted_tokens = predicted_tokens[:-cut_from_end]
# Try different sequence lengths (1 to window_size//2)
for seq_len in range(1, window_size // 2 + 1):
# Extract the potential repeating sequence from the end
candidate_seq = predicted_tokens[-seq_len:]
# Inverse scaling: shorter sequences need more repeats
max_repeats = int(base_max_repeats * (1 + scaling_factor / seq_len))
# Count how many times this sequence appears consecutively at the end
repeat_count = 0
pos = len(predicted_tokens) - seq_len
@@ -72,12 +78,7 @@ def detect_repeat_token(
else:
break
# If we found more than max_repeats consecutive occurrences
if repeat_count > max_repeats:
return True
return False
def layout_failed(predicted_tokens: str, image: Image.Image):
pass
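The repeat detection in the diff above scans candidate trailing sequences and scales the allowed repeat count inversely with sequence length, so a single repeated character needs many more occurrences to trip the check than a long repeated phrase. A self-contained sketch of the same idea (function name hypothetical, without the markdown-parsing preamble):

```python
def ends_with_repeats(text: str, base_max_repeats: int = 4,
                      window_size: int = 500,
                      scaling_factor: float = 3.0) -> bool:
    """Return True if text ends with a sequence repeated suspiciously often."""
    for seq_len in range(1, window_size // 2 + 1):
        if seq_len > len(text):
            break
        candidate = text[-seq_len:]
        # Inverse scaling: the threshold grows as the candidate shrinks
        max_repeats = int(base_max_repeats * (1 + scaling_factor / seq_len))
        # Count consecutive occurrences of the candidate at the end
        count, pos = 0, len(text) - seq_len
        while pos >= 0 and text[pos:pos + seq_len] == candidate:
            count += 1
            pos -= seq_len
        if count > max_repeats:
            return True
    return False
```

A degenerate tail like `"la"` repeated fifty times trips the check, while ordinary prose does not.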


@@ -1,5 +1,6 @@
import base64
import io
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat
from typing import List
@@ -25,10 +26,15 @@ def generate_vllm(
max_output_tokens: int = None,
max_retries: int = None,
max_workers: int | None = None,
custom_headers: dict | None = None,
max_failure_retries: int | None = None,
bbox_scale: int = settings.BBOX_SCALE,
vllm_api_base: str = settings.VLLM_API_BASE,
) -> List[GenerationResult]:
client = OpenAI(
api_key=settings.VLLM_API_KEY,
base_url=settings.VLLM_API_BASE,
base_url=vllm_api_base,
default_headers=custom_headers,
)
model_name = settings.VLLM_MODEL_NAME
@@ -50,7 +56,9 @@ def generate_vllm(
) -> GenerationResult:
prompt = item.prompt
if not prompt:
prompt = PROMPT_MAPPING[item.prompt_type]
prompt = PROMPT_MAPPING[item.prompt_type].replace(
"{bbox_scale}", str(bbox_scale)
)
content = []
image = scale_to_fit(item.image)
@@ -68,41 +76,68 @@ def generate_vllm(
completion = client.chat.completions.create(
model=model_name,
messages=[{"role": "user", "content": content}],
max_tokens=settings.MAX_OUTPUT_TOKENS,
max_tokens=max_output_tokens,
temperature=temperature,
top_p=top_p,
)
raw = completion.choices[0].message.content
result = GenerationResult(
raw=raw,
token_count=completion.usage.completion_tokens,
error=False,
)
except Exception as e:
print(f"Error during VLLM generation: {e}")
return GenerationResult(raw="", token_count=0, error=True)
return GenerationResult(
raw=completion.choices[0].message.content,
token_count=completion.usage.completion_tokens,
error=False,
)
return result
def process_item(item, max_retries):
def process_item(item, max_retries, max_failure_retries=None):
result = _generate(item)
retries = 0
while retries < max_retries and (
detect_repeat_token(result.raw)
or (
len(result.raw) > 50
and detect_repeat_token(result.raw, cut_from_end=50)
)
or result.error
):
print(
f"Detected repeat token or error, retrying generation (attempt {retries + 1})..."
)
while _should_retry(result, retries, max_retries, max_failure_retries):
result = _generate(item, temperature=0.3, top_p=0.95)
retries += 1
return result
def _should_retry(result, retries, max_retries, max_failure_retries):
has_repeat = detect_repeat_token(result.raw) or (
len(result.raw) > 50 and detect_repeat_token(result.raw, cut_from_end=50)
)
if retries < max_retries and has_repeat:
print(
f"Detected repeat token, retrying generation (attempt {retries + 1})..."
)
return True
if retries < max_retries and result.error:
print(
f"Detected vllm error, retrying generation (attempt {retries + 1})..."
)
time.sleep(2 * (retries + 1)) # Sleeping can help under load
return True
if (
result.error
and max_failure_retries is not None
and retries < max_failure_retries
):
print(
f"Detected vllm error, retrying generation (attempt {retries + 1})..."
)
time.sleep(2 * (retries + 1)) # Sleeping can help under load
return True
return False
with ThreadPoolExecutor(max_workers=max_workers) as executor:
results = list(executor.map(process_item, batch, repeat(max_retries)))
results = list(
executor.map(
process_item, batch, repeat(max_retries), repeat(max_failure_retries)
)
)
return results
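The sleep-between-retries pattern in `_should_retry` above (2s, then 4s, then 6s, since a brief pause can help an overloaded server) generalizes to a small helper. This is a sketch, and `call_with_backoff` is a hypothetical name, not part of the codebase:

```python
import time

def call_with_backoff(fn, max_retries: int = 3, base_delay: float = 2.0):
    """Retry a flaky call with linearly increasing sleeps between attempts.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            # Sleeping can help when the server is under load
            time.sleep(base_delay * (attempt + 1))
```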


@@ -6,9 +6,11 @@ from functools import lru_cache
import six
from PIL import Image
from bs4 import BeautifulSoup, NavigableString
from bs4 import BeautifulSoup
from markdownify import MarkdownConverter, re_whitespace
from chandra.settings import settings
@lru_cache
def _hash_html(html: str):
@@ -30,7 +32,11 @@ def extract_images(html: str, chunks: dict, image: Image.Image):
if not img:
continue
bbox = chunk["bbox"]
block_image = image.crop(bbox)
try:
block_image = image.crop(bbox)
except ValueError:
# Happens when bbox coordinates are invalid
continue
img_name = get_image_name(html, div_idx)
images[img_name] = block_image
return images
@@ -67,44 +73,22 @@ def parse_html(
else:
img = BeautifulSoup(f"<img src='{img_src}'/>", "html.parser")
div.append(img)
# Wrap text content in <p> tags if no inner HTML tags exist
if label in ["Text"] and not re.search(
"<.+>", str(div.decode_contents()).strip()
):
# Add inner p tags if missing for text blocks
text_content = str(div.decode_contents()).strip()
text_content = f"<p>{text_content}</p>"
div.clear()
div.append(BeautifulSoup(text_content, "html.parser"))
content = str(div.decode_contents())
out_html += content
return out_html
def escape_dollars(text):
return text.replace("$", r"\$")
def get_formatted_table_text(element):
text = []
for content in element.contents:
if content is None:
continue
if isinstance(content, NavigableString):
stripped = content.strip()
if stripped:
text.append(escape_dollars(stripped))
elif content.name == "br":
text.append("<br>")
elif content.name == "math":
text.append("$" + content.text + "$")
else:
content_str = escape_dollars(str(content))
text.append(content_str)
full_text = ""
for i, t in enumerate(text):
if t == "<br>":
full_text += t
elif i > 0 and text[i - 1] != "<br>":
full_text += " " + t
else:
full_text += t
return full_text
class Markdownify(MarkdownConverter):
def __init__(
self,
@@ -204,19 +188,25 @@ class LayoutBlock:
content: str
def parse_layout(html: str, image: Image.Image):
def parse_layout(html: str, image: Image.Image, bbox_scale=settings.BBOX_SCALE):
soup = BeautifulSoup(html, "html.parser")
top_level_divs = soup.find_all("div", recursive=False)
width, height = image.size
width_scaler = width / 1024
height_scaler = height / 1024
width_scaler = width / bbox_scale
height_scaler = height / bbox_scale
layout_blocks = []
for div in top_level_divs:
bbox = div.get("data-bbox")
try:
bbox = json.loads(bbox)
assert len(bbox) == 4, "Invalid bbox length"
except Exception:
bbox = [0, 0, 1, 1] # Fallback to a default bbox if parsing fails
try:
bbox = bbox.split(" ")
assert len(bbox) == 4, "Invalid bbox length"
except Exception:
bbox = [0, 0, 1, 1]
bbox = list(map(int, bbox))
# Normalize bbox
@@ -232,7 +222,7 @@ def parse_layout(html: str, image: Image.Image):
return layout_blocks
def parse_chunks(html: str, image: Image.Image):
layout = parse_layout(html, image)
def parse_chunks(html: str, image: Image.Image, bbox_scale=settings.BBOX_SCALE):
layout = parse_layout(html, image, bbox_scale=bbox_scale)
chunks = [asdict(block) for block in layout]
return chunks
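The two bbox formats accepted above (a JSON list like `[x0, y0, x1, y1]`, or a space-separated fallback) and the `bbox_scale`-based rescaling can be sketched as standalone helpers. These are simplified assumptions modeled on the diff, not the repo's exact functions:

```python
import json

def parse_bbox(raw):
    # First try JSON ("[0, 0, 512, 1024]"), then a space-separated
    # string ("0 0 512 1024"), then fall back to a default box.
    try:
        bbox = json.loads(raw)
        assert len(bbox) == 4, "Invalid bbox length"
    except Exception:
        try:
            bbox = raw.split(" ")
            assert len(bbox) == 4, "Invalid bbox length"
        except Exception:
            bbox = [0, 0, 1, 1]  # fallback when parsing fails
    return list(map(int, bbox))

def scale_bbox(bbox, width, height, bbox_scale=1024):
    # Model bboxes are normalized to 0..bbox_scale; rescale to pixels.
    wx = width / bbox_scale
    hy = height / bbox_scale
    x0, y0, x1, y1 = bbox
    return [int(x0 * wx), int(y0 * hy), int(x1 * wx), int(y1 * hy)]

print(scale_bbox([0, 0, 512, 1024], 2048, 1024))  # → [0, 0, 1024, 1024]
```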

View File

@@ -65,7 +65,7 @@ Guidelines:
""".strip()
OCR_LAYOUT_PROMPT = f"""
OCR this image to HTML, arranged as layout blocks. Each layout block should be a div with the data-bbox attribute representing the bounding box of the block in [x0, y0, x1, y1] format. Bboxes are normalized 0-1024. The data-label attribute is the label for the block.
OCR this image to HTML, arranged as layout blocks. Each layout block should be a div with the data-bbox attribute representing the bounding box of the block in [x0, y0, x1, y1] format. Bboxes are normalized 0-{{bbox_scale}}. The data-label attribute is the label for the block.
Use the following labels:
- Caption
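The doubled braces in the new prompt line survive the outer f-string, leaving a literal `{bbox_scale}` placeholder that can be filled in later with `str.format`. A minimal sketch of that two-stage substitution:

```python
# Stage 1: the f-string renders {{bbox_scale}} down to {bbox_scale}.
template = f"Bboxes are normalized 0-{{bbox_scale}}."

# Stage 2: the placeholder is filled when the prompt is actually used.
print(template.format(bbox_scale=1024))  # → Bboxes are normalized 0-1024.
```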

View File

@@ -87,7 +87,7 @@ def save_merged_output(
# Save extracted images if requested
if save_images and result.images:
images_dir = file_output_dir / "images"
images_dir = file_output_dir
images_dir.mkdir(exist_ok=True)
for img_name, pil_image in result.images.items():
@@ -172,7 +172,7 @@ def save_merged_output(
@click.option(
"--batch-size",
type=int,
default=1,
default=None,
help="Number of pages to process in a batch.",
)
@click.option(
@@ -194,6 +194,16 @@ def main(
batch_size: int,
paginate_output: bool,
):
if method == "hf":
click.echo(
"When using '--method hf', ensure that the batch size is set correctly. We will default to batch size of 1."
)
if batch_size is None:
batch_size = 1
elif method == "vllm":
if batch_size is None:
batch_size = 28
click.echo("Chandra CLI - Starting OCR processing")
click.echo(f"Input: {input_path}")
click.echo(f"Output: {output_path}")

View File

@@ -143,6 +143,7 @@ def process():
"image_height": img_height,
"blocks": blocks_data,
"html": html_with_images,
"markdown": result.markdown,
}
)

View File

@@ -64,6 +64,20 @@
cursor: not-allowed;
}
.controls label {
display: flex;
align-items: center;
gap: 8px;
color: white;
font-size: 14px;
cursor: pointer;
user-select: none;
}
.controls input[type="checkbox"] {
cursor: pointer;
}
.loading {
display: none;
color: #f39c12;
@@ -75,6 +89,11 @@
font-weight: bold;
}
.success {
color: #27ae60;
font-weight: bold;
}
.screenshot-container {
display: none;
margin-top: 60px;
@@ -88,8 +107,18 @@
display: flex;
}
.left-panel, .right-panel {
flex: 1;
.left-panel {
flex: 0 0 40%;
display: flex;
flex-direction: column;
background: white;
border-radius: 8px;
overflow: hidden;
box-shadow: 0 4px 12px rgba(0,0,0,0.3);
}
.right-panel {
flex: 0 0 60%;
display: flex;
flex-direction: column;
background: white;
@@ -137,6 +166,7 @@
padding: 30px;
line-height: 1.6;
color: #333;
font-size: 24px;
}
.markdown-content h1, .markdown-content h2, .markdown-content h3 {
@@ -215,8 +245,14 @@
<input type="text" id="filePath" placeholder="Enter file path (e.g., /path/to/document.pdf)">
<input type="number" id="pageNumber" placeholder="Page" value="0" min="0">
<button id="processBtn" onclick="processFile()">Process</button>
<label>
<input type="checkbox" id="showLayoutBoxes" checked onchange="toggleLayoutBoxes()">
Show Layout Boxes
</label>
<button id="copyMarkdownBtn" onclick="copyMarkdown()" style="display: none;">Copy Markdown</button>
<span class="loading" id="loading">Processing...</span>
<span class="error" id="error"></span>
<span class="success" id="success"></span>
</div>
<div class="screenshot-container" id="container">
@@ -242,6 +278,11 @@
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/github-markdown-css/5.8.1/github-markdown.min.css" integrity="sha512-BrOPA520KmDMqieeM7XFe6a3u3Sb3F1JBaQnrIAmWg3EYrciJ+Qqe6ZcKCdfPv26rGcgTrJnZ/IdQEct8h3Zhw==" crossorigin="anonymous" referrerpolicy="no-referrer" />
<script>
// Global state to store markdown and canvas data
let currentMarkdown = null;
let currentData = null;
let currentImageSrc = null;
async function processFile() {
const filePath = document.getElementById('filePath').value;
const pageNumber = parseInt(document.getElementById('pageNumber').value) || 0;
@@ -285,6 +326,10 @@
}
function renderResults(data) {
// Store data for toggle functionality
currentData = data;
currentImageSrc = data.image_base64;
const canvas = document.getElementById('layoutCanvas');
const ctx = canvas.getContext('2d');
const markdownContent = document.getElementById('markdownContent');
@@ -292,51 +337,14 @@
// Draw image with layout overlays
const img = new Image();
img.onload = function() {
canvas.width = data.image_width;
canvas.height = data.image_height;
// Draw image
ctx.drawImage(img, 0, 0, data.image_width, data.image_height);
// Draw layout blocks
ctx.lineWidth = 3;
ctx.font = 'bold 14px -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif';
const labelCounts = {};
data.blocks.forEach((block) => {
const [x1, y1, x2, y2] = block.bbox;
const width = x2 - x1;
const height = y2 - y1;
// Draw rectangle with semi-transparent fill
ctx.strokeStyle = block.color;
ctx.fillStyle = block.color + '33';
ctx.fillRect(x1, y1, width, height);
ctx.strokeRect(x1, y1, width, height);
// Count labels for unique identification
labelCounts[block.label] = (labelCounts[block.label] || 0) + 1;
const labelWithCount = `${block.label} #${labelCounts[block.label]}`;
// Draw label with background
const textMetrics = ctx.measureText(labelWithCount);
const textWidth = textMetrics.width;
const textHeight = 16;
const padding = 6;
const labelX = x1;
const labelY = Math.max(y1 - textHeight - padding, textHeight);
ctx.fillStyle = block.color;
ctx.fillRect(labelX, labelY - textHeight, textWidth + padding * 2, textHeight + padding);
ctx.fillStyle = 'white';
ctx.textBaseline = 'top';
ctx.fillText(labelWithCount, labelX + padding, labelY - textHeight + padding/2);
});
drawCanvas(img, data, ctx);
};
img.src = data.image_base64;
// Store markdown and show copy button
currentMarkdown = data.markdown;
document.getElementById('copyMarkdownBtn').style.display = 'inline-block';
// Render HTML directly (with images embedded)
markdownContent.innerHTML = data.html;
@@ -362,6 +370,85 @@
});
}
function drawCanvas(img, data, ctx) {
const canvas = document.getElementById('layoutCanvas');
canvas.width = data.image_width;
canvas.height = data.image_height;
// Draw image
ctx.drawImage(img, 0, 0, data.image_width, data.image_height);
// Check if layout boxes should be shown
const showBoxes = document.getElementById('showLayoutBoxes').checked;
if (!showBoxes) return;
// Draw layout blocks
ctx.lineWidth = 3;
ctx.font = 'bold 14px -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif';
const labelCounts = {};
data.blocks.forEach((block) => {
const [x1, y1, x2, y2] = block.bbox;
const width = x2 - x1;
const height = y2 - y1;
// Draw rectangle with semi-transparent fill
ctx.strokeStyle = block.color;
ctx.fillStyle = block.color + '33';
ctx.fillRect(x1, y1, width, height);
ctx.strokeRect(x1, y1, width, height);
// Count labels for unique identification
labelCounts[block.label] = (labelCounts[block.label] || 0) + 1;
const labelWithCount = `${block.label} #${labelCounts[block.label]}`;
// Draw label with background
const textMetrics = ctx.measureText(labelWithCount);
const textWidth = textMetrics.width;
const textHeight = 16;
const padding = 6;
const labelX = x1;
const labelY = Math.max(y1 - textHeight - padding, textHeight);
ctx.fillStyle = block.color;
ctx.fillRect(labelX, labelY - textHeight, textWidth + padding * 2, textHeight + padding);
ctx.fillStyle = 'white';
ctx.textBaseline = 'top';
ctx.fillText(labelWithCount, labelX + padding, labelY - textHeight + padding/2);
});
}
function toggleLayoutBoxes() {
if (!currentData || !currentImageSrc) return;
const canvas = document.getElementById('layoutCanvas');
const ctx = canvas.getContext('2d');
const img = new Image();
img.onload = function() {
drawCanvas(img, currentData, ctx);
};
img.src = currentImageSrc;
}
function copyMarkdown() {
if (!currentMarkdown) {
document.getElementById('error').textContent = 'No markdown to copy';
return;
}
navigator.clipboard.writeText(currentMarkdown).then(() => {
const success = document.getElementById('success');
success.textContent = 'Markdown copied!';
setTimeout(() => {
success.textContent = '';
}, 2000);
}).catch((err) => {
document.getElementById('error').textContent = 'Failed to copy: ' + err.message;
});
}
// Allow Enter key to trigger processing
document.getElementById('filePath').addEventListener('keypress', function(e) {
if (e.key === 'Enter') processFile();

View File

@@ -17,8 +17,6 @@ def main():
"-v",
f"{os.path.expanduser('~')}/.cache/huggingface:/root/.cache/huggingface",
"--env",
f"HUGGING_FACE_HUB_TOKEN={os.getenv('HF_TOKEN')}",
"--env",
"VLLM_ATTENTION_BACKEND=TORCH_SDPA",
"-p",
"8000:8000",

View File

@@ -1,7 +1,5 @@
from dotenv import find_dotenv
from pydantic import computed_field
from pydantic_settings import BaseSettings
import torch
import os
@@ -9,11 +7,13 @@ class Settings(BaseSettings):
# Paths
BASE_DIR: str = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
IMAGE_DPI: int = 192
MIN_IMAGE_DIM: int = 1024
MIN_PDF_IMAGE_DIM: int = 1024
MIN_IMAGE_DIM: int = 1536
MODEL_CHECKPOINT: str = "datalab-to/chandra"
TORCH_DEVICE: str | None = None
MAX_OUTPUT_TOKENS: int = 8192
MAX_OUTPUT_TOKENS: int = 12384
TORCH_ATTN: str | None = None
BBOX_SCALE: int = 1024
# vLLM server settings
VLLM_API_KEY: str = "EMPTY"
@@ -22,37 +22,6 @@ class Settings(BaseSettings):
VLLM_GPUS: str = "0"
MAX_VLLM_RETRIES: int = 6
# Transformers settings
@computed_field
@property
def TORCH_DEVICE_MODEL(self) -> str:
if self.TORCH_DEVICE is not None:
return self.TORCH_DEVICE
if torch.cuda.is_available():
return "cuda"
if torch.backends.mps.is_available():
return "mps"
return "cpu"
@computed_field
@property
def TORCH_DTYPE(self) -> torch.dtype:
return torch.bfloat16
@computed_field
@property
def TORCH_ATTN_IMPLEMENTATION(self) -> str:
if self.TORCH_ATTN is not None:
return self.TORCH_ATTN
if self.TORCH_DEVICE_MODEL == "cuda":
return "flash_attention_2"
else:
return "sdpa"
class Config:
env_file = find_dotenv("local.env")
extra = "ignore"

View File

@@ -1,6 +1,6 @@
[project]
name = "chandra-ocr"
version = "0.1.1"
version = "0.1.9"
description = "OCR model that converts documents to markdown, HTML, or JSON."
readme = "README.md"
requires-python = ">=3.10"