ci: add spellchecker with custom vocabulary and fix typos (#268)

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Michele Dolfi
2025-07-15 14:17:35 +02:00
committed by GitHub
parent b922824e5b
commit 8222cf8955
11 changed files with 246 additions and 185 deletions


@@ -0,0 +1,35 @@
[Dd]ocling
precommit
asgi
async
(?i)urls
uvicorn
[Ww]ebserver
keyfile
[Ww]ebsocket(s?)
[Kk]ubernetes
UI
(?i)vllm
APIs
[Ss]ubprocesses
(?i)api
Kubeflow
(?i)Jobkit
(?i)cpu
(?i)PyTorch
(?i)CUDA
(?i)NVIDIA
(?i)env
Gradio
bool
Ollama
inbody
LGTMs
Dolfi
Lysak
Nikos
Nassar
Panos
Vagenas
Staar
Livathinos
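
The entries above mix literal words with regular-expression forms such as `[Ww]ebsocket(s?)` and `(?i)vllm`. As a rough illustration of what such patterns accept, the sketch below checks a few words against a subset of the list with Python's `re` module. It assumes each entry has to match a whole word and that Python's regex syntax is close enough to the spellchecker's for these simple patterns; the helper name `is_accepted` is hypothetical and not part of the commit.

```python
import re

# A subset of the vocabulary entries from the list above.
VOCAB = [
    "[Dd]ocling",
    "(?i)urls",
    "[Ww]ebsocket(s?)",
    "(?i)vllm",
    "Kubeflow",
]


def is_accepted(word: str) -> bool:
    """Return True if any vocabulary pattern matches the whole word."""
    return any(re.fullmatch(pattern, word) for pattern in VOCAB)


if __name__ == "__main__":
    for candidate in ["docling", "URLs", "Websockets", "vLLM", "kubeflow", "websocker"]:
        print(candidate, is_accepted(candidate))
```

Entries without `(?i)` or a character class, such as `Kubeflow`, stay case-sensitive in this sketch, which is why the lowercase variant is rejected.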

11
.github/vale.ini vendored Normal file

@@ -0,0 +1,11 @@
StylesPath = styles
MinAlertLevel = suggestion
; Packages = write-good, proselint
Vocab = Docling
[*.md]
BasedOnStyles = Vale
[CHANGELOG.md]
BasedOnStyles =


@@ -21,6 +21,17 @@ repos:
pass_filenames: false
language: system
files: '\.py$'
- repo: https://github.com/errata-ai/vale
rev: v3.12.0 # Use latest stable version
hooks:
- id: vale
name: vale sync
pass_filenames: false
args: [sync, "--config=.github/vale.ini"]
- id: vale
name: Spell and Style Check with Vale
args: ["--config=.github/vale.ini"]
files: \.md$
- repo: https://github.com/astral-sh/uv-pre-commit
# uv version.
rev: 0.7.13
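
The two `vale` hooks added in the hunk above run in sequence: the first syncs styles and packages, the second checks the Markdown files. A minimal sketch of the command lines they roughly correspond to is shown below. It assumes the hook's entry point is the `vale` executable and that pre-commit appends the matching file names after the configured `args` for the second hook; the function name `vale_commands` is hypothetical.

```python
import shlex

CONFIG = ".github/vale.ini"  # path taken from the hook args above


def vale_commands(markdown_files: list[str]) -> list[list[str]]:
    """Build two argv lists: first `vale sync`, then the check on the given files."""
    sync = ["vale", "sync", f"--config={CONFIG}"]
    check = ["vale", f"--config={CONFIG}", *markdown_files]
    return [sync, check]


if __name__ == "__main__":
    # The file names here are examples only.
    for argv in vale_commands(["README.md", "docs/usage.md"]):
        print(shlex.join(argv))
```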


@@ -1,11 +1,11 @@
# MAINTAINERS
- Christoph Auer - [@cau-git](https://github.com/cau-git)
- Michele Dolfi - [@dolfim-ibm](https://github.com/dolfim-ibm)
- Maxim Lysak - [@maxmnemonic](https://github.com/maxmnemonic)
- Nikos Livathinos - [@nikos-livathinos](https://github.com/nikos-livathinos)
- Ahmed Nassar - [@nassarofficial](https://github.com/nassarofficial)
- Panos Vagenas - [@vagenas](https://github.com/vagenas)
- Peter Staar - [@PeterStaar-IBM](https://github.com/PeterStaar-IBM)
- Christoph Auer - [`@cau-git`](https://github.com/cau-git)
- Michele Dolfi - [`@dolfim-ibm`](https://github.com/dolfim-ibm)
- Maxim Lysak - [`@maxmnemonic`](https://github.com/maxmnemonic)
- Nikos Livathinos - [`@nikos-livathinos`](https://github.com/nikos-livathinos)
- Ahmed Nassar - [`@nassarofficial`](https://github.com/nassarofficial)
- Panos Vagenas - [`@vagenas`](https://github.com/vagenas)
- Peter Staar - [`@PeterStaar-IBM`](https://github.com/PeterStaar-IBM)
Maintainers can be contacted at [deepsearch-core@zurich.ibm.com](mailto:deepsearch-core@zurich.ibm.com).


@@ -12,7 +12,7 @@ Running [Docling](https://github.com/docling-project/docling) as an API service.
- Learning how to [configure the webserver](./docs/configuration.md)
- Get to know all [runtime options](./docs/usage.md) of the API
- Explore usefule [deployment examples](./docs/deployment.md)
- Explore useful [deployment examples](./docs/deployment.md)
- And more
> [!NOTE] Migration to the `v1` API
@@ -62,15 +62,15 @@ Available container images:
| [`ghcr.io/docling-project/docling-serve-cu126`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve-cu126) <br /> [`quay.io/docling-project/docling-serve-cu126`](https://quay.io/repository/docling-project/docling-serve-cu126) | Cuda 12.6 image which installs `torch` from the pytorch cu126 index. | `linux/amd64` | 8.7 GB |
| [`ghcr.io/docling-project/docling-serve-cu128`](https://github.com/docling-project/docling-serve/pkgs/container/docling-serve-cu128) <br /> [`quay.io/docling-project/docling-serve-cu128`](https://quay.io/repository/docling-project/docling-serve-cu128) | Cuda 12.8 image which installs `torch` from the pytorch cu128 index. | `linux/amd64` | 8.7 GB |
Coming soon: `docling-serve-slim` images will reduce the size by skipping the model weights download.
Coming son: `docling-serve-slim` images will reduce the size by skipping the model weights download.
### Demonstration UI
An easy to use UI is available at the `/ui` endpoint.
![ui-input.png](img/ui-input.png)
![Input controllers in the UI](img/ui-input.png)
![ui-output.png](img/ui-output.png)
![Output visualization in the UI](img/ui-output.png)
## Get help and support


@@ -62,7 +62,7 @@ from docling_serve.orchestrator_factory import get_async_orchestrator
from docling_serve.response_preparation import prepare_response
from docling_serve.settings import docling_serve_settings
from docling_serve.storage import get_scratch
from docling_serve.websocker_notifier import WebsocketNotifier
from docling_serve.websocket_notifier import WebsocketNotifier
# Set up custom logging as we'll be intermixes with FastAPI/Uvicorn's logging


@@ -7,7 +7,7 @@ server and the actual app-specific configurations.
> [!WARNING]
> When the server is running with `reload` or with multiple `workers`, uvicorn
> will spawn multiple subprocessed. This invalidates all the values configured
> will spawn multiple subprocesses. This invalidates all the values configured
> via the CLI command line options. Please use environment variables in this
> type of deployments.
@@ -36,7 +36,7 @@ THe following table describes the options to configure the Docling Serve app.
| CLI option | ENV | Default | Description |
| -----------|-----|---------|-------------|
| `--artifacts-path` | `DOCLING_SERVE_ARTIFACTS_PATH` | unset | If set to a valid directory, the model weights will be loaded from this path |
| | `DOCLING_SERVE_STATIC_PATH` | unset | If set to a valid directory, the static assets for the docs and ui will be loaded from this path |
| | `DOCLING_SERVE_STATIC_PATH` | unset | If set to a valid directory, the static assets for the docs and UI will be loaded from this path |
| | `DOCLING_SERVE_SCRATCH_PATH` | | If set, this directory will be used as scratch workspace, e.g. storing the results before they get requested. If unset, a temporary created is created for this purpose. |
| `--enable-ui` | `DOCLING_SERVE_ENABLE_UI` | `false` | Enable the demonstrator UI. |
| | `DOCLING_SERVE_ENABLE_REMOTE_SERVICES` | `false` | Allow pipeline components making remote connections. For example, this is needed when using a vision-language model via APIs. |
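
Since the warning above recommends environment variables when `reload` or multiple `workers` are in use, one way to prepare them from Python is sketched below. The variable names come from the table; the helper `serve_environment`, the `"true"`/`"false"` string encoding, and the `/models` path are illustrative assumptions.

```python
import os


def serve_environment(artifacts_path: str, enable_ui: bool = False) -> dict[str, str]:
    """Return a copy of the current environment with Docling Serve settings added."""
    env = dict(os.environ)
    env["DOCLING_SERVE_ARTIFACTS_PATH"] = artifacts_path
    # Assumption: the boolean setting is passed as the strings "true" / "false".
    env["DOCLING_SERVE_ENABLE_UI"] = "true" if enable_ui else "false"
    return env


if __name__ == "__main__":
    env = serve_environment("/models", enable_ui=True)  # "/models" is a placeholder path
    print(env["DOCLING_SERVE_ARTIFACTS_PATH"], env["DOCLING_SERVE_ENABLE_UI"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting the server, so that every spawned subprocess sees the same values.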


@@ -74,7 +74,7 @@ This document provides examples for pre-loading docling models to a persistent v
Manifest example: [docling-model-cache-job.yaml](./deploy-examples/docling-model-cache-job.yaml)
3. Now we can mount volume in the docling-serve deployment and set env `DOCLING_SERVE_ARTIFACTS_PATH` to point to it.
Following additions to deploymeny should be made:
Following additions to deployment should be made:
```yaml
spec:
@@ -98,6 +98,6 @@ This document provides examples for pre-loading docling models to a persistent v
Make sure that value of `DOCLING_SERVE_ARTIFACTS_PATH` is the same as where models were downloaded and where volume is mounted.
Now when docling-serve is executing tasks, the underlying docling installation will load model weights from mouted volume.
Now when docling-serve is executing tasks, the underlying docling installation will load model weights from mounted volume.
Manifest example: [docling-model-cache-deployment.yaml](./deploy-examples/docling-model-cache-deployment.yaml)


@@ -9,7 +9,7 @@ On top of the source of file (see below), both endpoints support the same parame
- `from_formats` (List[str]): Input format(s) to convert from. Allowed values: `docx`, `pptx`, `html`, `image`, `pdf`, `asciidoc`, `md`. Defaults to all formats.
- `to_formats` (List[str]): Output format(s) to convert to. Allowed values: `md`, `json`, `html`, `text`, `doctags`. Defaults to `md`.
- `pipeline` (str). The choice of which pipeline to use. Allowed values are `standard` and `vlm`. Defaults to `standard`.
- `page_range` (tuple). If speficied, only convert a range of pages. The page number starts at 1.
- `page_range` (tuple). If specified, only convert a range of pages. The page number starts at 1.
- `do_ocr` (bool): If enabled, the bitmap content will be processed using OCR. Defaults to `True`.
- `image_export_mode`: Image export mode for the document (only in case of JSON, Markdown or HTML). Allowed values: embedded, placeholder, referenced. Optional, defaults to `embedded`.
- `force_ocr` (bool): If enabled, replace any existing text with OCR-generated text over the full content. Defaults to `False`.
@@ -25,8 +25,8 @@ On top of the source of file (see below), both endpoints support the same parame
- `do_picture_classification` (bool): If enabled, classify pictures in documents. Defaults to false.
- `do_picture_description` (bool): If enabled, describe pictures in documents. Defaults to false.
- `picture_description_area_threshold` (float): Minimum percentage of the area for a picture to be processed with the models. Defaults to 0.05.
- `picture_description_local` (dict): Options for running a local vision-language model in the picture description. The parameters refer to a model hosted on Hugging Face. This parameter is mutually exclusive with picture_description_api.
- `picture_description_api` (dict): API details for using a vision-language model in the picture description. This parameter is mutually exclusive with picture_description_local.
- `picture_description_local` (dict): Options for running a local vision-language model in the picture description. The parameters refer to a model hosted on Hugging Face. This parameter is mutually exclusive with `picture_description_api`.
- `picture_description_api` (dict): API details for using a vision-language model in the picture description. This parameter is mutually exclusive with `picture_description_local`.
- `include_images` (bool): If enabled, images will be extracted from the document. Defaults to false.
- `images_scale` (float): Scale factor for images. Defaults to 2.0.
@@ -307,7 +307,7 @@ Example URLs are:
}
```
- `http://localhost:11434/v1/chat/completions` for the local ollama api, with example `picture_description_api`:
- `http://localhost:11434/v1/chat/completions` for the local Ollama api, with example `picture_description_api`:
- the `granite3.2-vision:2b` model
```json
@@ -355,7 +355,7 @@ The response can be a JSON Document or a File.
Both `/v1/convert/source` and `/v1/convert/file` endpoints are available as asynchronous variants.
The advantage of the asynchronous endpoints is the possible to interrupt the connection, check for the progress update and fetch the result.
This approach is more resilient against network stabilities and allows the client application logic to easily interleave conversion with other tasks.
This approach is more resilient against network instabilities and allows the client application logic to easily interleave conversion with other tasks.
Launch an asynchronous conversion with:
@@ -402,7 +402,7 @@ while task["task_status"] not in ("success", "failure"):
### Subscribe with websockets
Using websocket you can get the client application being notified about updates of the conversion task.
To start the websocker connection, use the endpoint:
To start the websocket connection, use the endpoint:
- `/v1/status/ws/{task_id}`
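
As a small illustration of how a client might derive the websocket address for the endpoint above, the sketch below converts an HTTP base URL into the matching `ws://` or `wss://` URL. The helper name `status_ws_url`, the base URL and the task id are hypothetical; only the `/v1/status/ws/{task_id}` path comes from the text.

```python
from urllib.parse import urlsplit, urlunsplit


def status_ws_url(base_url: str, task_id: str) -> str:
    """Build the websocket status URL for a task from the server's HTTP base URL."""
    parts = urlsplit(base_url)
    # Use the secure websocket scheme when the server is reached over HTTPS.
    scheme = "wss" if parts.scheme == "https" else "ws"
    path = f"{parts.path.rstrip('/')}/v1/status/ws/{task_id}"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    # Placeholder host, port and task id.
    print(status_ws_url("http://localhost:5001", "abc123"))
```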
@@ -417,7 +417,7 @@ Websocket messages are JSON object with the following structure:
```
<details>
<summary>Example websocker usage:</summary>
<summary>Example websocket usage:</summary>
```python
from websockets.sync.client import connect

328
uv.lock generated

File diff suppressed because one or more lines are too long