48 Commits
0.1.1 ... 0.1.6

Author SHA1 Message Date
Quentin Fuxa
fa29a24abe Bump version to 0.1.6 2025-05-07 11:45:33 +02:00
Quentin Fuxa
fea3c3553c logging in ASR proc. includes internal buffer duration and transcription lag 2025-05-07 11:45:00 +02:00
Quentin Fuxa
d6d65a663b errors handling when end of transcription 2025-05-07 10:56:04 +02:00
Quentin Fuxa
083d5b2f44 uses sentinel object when end of transcription, to properly terminate tasks 2025-05-07 10:55:44 +02:00
Quentin Fuxa
8e4674b093 End of transcription : Properly sends signal back to the endpoint 2025-05-07 10:55:12 +02:00
Quentin Fuxa
bc7c32100f Mention third-party components 2025-04-14 00:21:43 +02:00
Quentin Fuxa
c4150894af Merge branch 'main' of https://github.com/QuentinFuxa/whisper_streaming_web 2025-04-13 12:11:01 +02:00
Quentin Fuxa
25bf242ce1 bump version to 0.1.5 2025-04-13 12:10:53 +02:00
Quentin Fuxa
14cc601a5c Update README.md 2025-04-13 11:07:53 +02:00
Quentin Fuxa
34d5d513fa fix typo 2025-04-12 18:22:14 +02:00
Quentin Fuxa
2ab3dac948 remove whisper_fastapi_online_server.py 2025-04-12 18:21:04 +02:00
Quentin Fuxa
b56fcffde1 Solves stdin flushes blocking IO
https://github.com/QuentinFuxa/WhisperLiveKit/issues/110
https://github.com/QuentinFuxa/WhisperLiveKit/issues/106
https://github.com/QuentinFuxa/WhisperLiveKit/issues/90
https://github.com/QuentinFuxa/WhisperLiveKit/issues/87
https://github.com/QuentinFuxa/WhisperLiveKit/issues/81
https://github.com/QuentinFuxa/WhisperLiveKit/issues/2
2025-04-12 15:25:46 +02:00
Quentin Fuxa
2def194893 add ssl certificate and key file arguments to parser 2025-04-11 12:20:22 +02:00
Quentin Fuxa
29978da301 adds ssl possibility in basic server 2025-04-11 12:20:08 +02:00
Quentin Fuxa
b708890788 protocol default to ws 2025-04-11 12:14:14 +02:00
Quentin Fuxa
3ac4c514cf remove temp_kit method to get args. uvicorn reload to False for better perfs 2025-04-11 12:02:52 +02:00
Chris Margach
3c58bfcfa2 update readme for package launch with SSL 2025-04-10 13:47:09 +09:00
Chris Margach
d53b7a323a update sample html to use wss in case of https 2025-04-10 13:46:52 +09:00
Chris Margach
02de5993e6 allow passing of cert and key locations to uvicorn via package 2025-04-10 13:42:30 +09:00
Quentin Fuxa
d94560ef37 Merge branch 'main' of https://github.com/QuentinFuxa/whisper_streaming_web 2025-04-09 11:38:57 +02:00
Quentin Fuxa
f62baa80b7 to 0.1.4 2025-04-09 11:35:22 +02:00
Quentin Fuxa
0b43035701 enhance chunking to handle audio buffer time limits 2025-04-09 11:34:59 +02:00
Quentin Fuxa
704170ccf3 Logs for https://github.com/QuentinFuxa/WhisperLiveKit/issues/110 https://github.com/QuentinFuxa/WhisperLiveKit/issues/106
https://github.com/QuentinFuxa/WhisperLiveKit/issues/90
https://github.com/QuentinFuxa/WhisperLiveKit/issues/87
https://github.com/QuentinFuxa/WhisperLiveKit/issues/81
https://github.com/QuentinFuxa/WhisperLiveKit/issues/2
2025-04-09 11:34:27 +02:00
Quentin Fuxa
09279c572a Merge pull request #116 from needabetterusername/solve-115
Solve #115: VAC Broken import
2025-04-09 10:16:49 +02:00
Quentin Fuxa
23e41f993f typos 2025-04-09 10:14:23 +02:00
Quentin Fuxa
c791b1e125 Merge pull request #114 from needabetterusername/implement-69-clean
(Re-) Implement #69 (Dockerfile)
2025-04-09 10:10:08 +02:00
Quentin Fuxa
3de2990ec4 Update README to clarify Docker usage for non-GPU systems 2025-04-09 10:08:48 +02:00
Chris Margach
51e6a6f6f9 update import after moving target file 2025-04-09 13:33:39 +09:00
Chris Margach
f6e53b2fab return silero_vad_iterator.py to whisper_streaming(_custom) package. 2025-04-09 11:59:21 +09:00
Chris Margach
5d6f08ff7a Update readme for Dockerfile 2025-04-08 19:06:42 +09:00
Chris Margach
583a26da88 Add Dockerfile w/ GPU support. 2025-04-08 19:06:11 +09:00
Chris Margach
5b3d8969e8 Merge branch 'main' of https://github.com/QuentinFuxa/WhisperLiveKit 2025-04-08 09:44:20 +09:00
Quentin Fuxa
40cca184c1 Merge pull request #113 from needabetterusername/implement-107
Allow CTranslate2 backend to choose device and compute types.
2025-04-07 14:42:57 +02:00
Chris Margach
47ed345f9e Merge branch 'implement-107' 2025-04-07 17:40:08 +09:00
Chris Margach
9c9c179684 Allow CTranslate2 backend to choose device and compute types. 2025-04-07 14:47:29 +09:00
Quentin Fuxa
b870c12f62 Merge pull request #109 from QuentinFuxa/needabetterusername/implement-69
Needabetterusername/implement 69
2025-04-04 11:10:08 +02:00
Quentin Fuxa
cfd5905fd4 Improve WebSocket fallback logic
Use window.location.hostname and port if available,
otherwise fallback to localhost:8000.

Co-authored-by: Chris Margach <hcagramc@gmail.com>
2025-04-04 11:08:05 +02:00
Chris Margach
2399487e45 Implement #107 2025-04-04 10:54:15 +09:00
Quentin Fuxa
afd88310fd Merge branch 'main' of https://github.com/QuentinFuxa/whisper_streaming_web 2025-04-02 11:56:25 +02:00
Quentin Fuxa
080f446b0d start implementing frontend part of https://github.com/QuentinFuxa/WhisperLiveKit/pull/80 2025-04-02 11:56:02 +02:00
Quentin Fuxa
8bd2b36488 Add files via upload 2025-04-01 11:03:22 +02:00
Quentin Fuxa
25fd924bf9 Merge pull request #103 from QuentinFuxa/readme
Update README.md
2025-03-28 14:30:35 +01:00
Quentin Fuxa
ff8fd0ec72 Update README.md 2025-03-28 14:30:14 +01:00
Quentin Fuxa
e99f53e649 Corrects 'TranscriptionSegment' object is not subscriptable 2025-03-24 21:16:08 +01:00
Quentin Fuxa
e9022894b2 solve #100 2025-03-24 20:38:47 +01:00
Quentin Fuxa
ccf99cecdf Solve #95 and #96 2025-03-24 17:55:52 +01:00
Quentin Fuxa
40e2814cd7 0.1.2 2025-03-20 11:08:40 +01:00
Quentin Fuxa
cd29eace3d Update README.md 2025-03-20 10:23:14 +01:00
17 changed files with 894 additions and 321 deletions

Dockerfile (new file, 82 lines)

@@ -0,0 +1,82 @@
FROM nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
WORKDIR /app
ARG EXTRAS
ARG HF_PRECACHE_DIR
ARG HF_TKN_FILE
# Install system dependencies
#RUN apt-get update && \
# apt-get install -y ffmpeg git && \
# apt-get clean && \
# rm -rf /var/lib/apt/lists/*
# 2) Install system dependencies + Python + pip
RUN apt-get update && \
apt-get install -y --no-install-recommends \
python3 \
python3-pip \
ffmpeg \
git && \
rm -rf /var/lib/apt/lists/*
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
COPY . .
# Install WhisperLiveKit directly, allowing for optional dependencies
# Note: For gated models, you need to add your HF token. See README.md
# for more details.
RUN if [ -n "$EXTRAS" ]; then \
echo "Installing with extras: [$EXTRAS]"; \
pip install --no-cache-dir .[$EXTRAS]; \
else \
echo "Installing base package only"; \
pip install --no-cache-dir .; \
fi
# Enable in-container caching for Hugging Face models by one of the following:
# Note: If running multiple containers, it is better to map a shared
# bucket.
#
# A) Make the cache directory persistent via an anonymous volume.
# Note: This only persists for a single, named container. This is
# only for convenience at dev/test stage.
# For prod, it is better to use a named volume via host mount/k8s.
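# Example (illustrative only; the image and volume names below are placeholders):
#   docker run --gpus all -v hf-cache:/root/.cache/huggingface/hub -p 8000:8000 whisperlivekit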
VOLUME ["/root/.cache/huggingface/hub"]
# or
# B) Conditionally copy a local pre-cache from the build context to the
# container's cache via the HF_PRECACHE_DIR build-arg.
# WARNING: This will copy ALL files in the pre-cache location.
# Conditionally copy a cache directory if provided
RUN if [ -n "$HF_PRECACHE_DIR" ]; then \
echo "Copying Hugging Face cache from $HF_PRECACHE_DIR"; \
mkdir -p /root/.cache/huggingface/hub && \
cp -r $HF_PRECACHE_DIR/* /root/.cache/huggingface/hub; \
else \
echo "No local Hugging Face cache specified, skipping copy"; \
fi
# Conditionally copy a Hugging Face token if provided
RUN if [ -n "$HF_TKN_FILE" ]; then \
echo "Copying Hugging Face token from $HF_TKN_FILE"; \
mkdir -p /root/.cache/huggingface && \
cp $HF_TKN_FILE /root/.cache/huggingface/token; \
else \
echo "No Hugging Face token file specified, skipping token setup"; \
fi
# Expose port for the transcription server
EXPOSE 8000
ENTRYPOINT ["whisperlivekit-server", "--host", "0.0.0.0"]
# Default args
CMD ["--model", "tiny.en"]

LICENSE (38 lines changed)

@@ -1,21 +1,33 @@
 MIT License
-Copyright (c) 2023 ÚFAL
+Copyright (c) 2025 Quentin Fuxa.
+Based on:
+- The original work by ÚFAL. License: https://github.com/ufal/whisper_streaming/blob/main/LICENSE
+- The work by Snakers4 (silero-vad). License: https://github.com/snakers4/silero-vad/blob/f6b1294cb27590fb2452899df98fb234dfef1134/LICENSE
+- The work in Diart by juanmc2005. License: https://github.com/juanmc2005/diart/blob/main/LICENSE
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
+---
+Third-party components included in this software:
+- **whisper_streaming** by ÚFAL MIT License https://github.com/ufal/whisper_streaming
+- **silero-vad** by Snakers4 MIT License https://github.com/snakers4/silero-vad
+- **Diart** by juanmc2005 MIT License https://github.com/juanmc2005/diart

README.md (334 lines changed)

@@ -1,105 +1,166 @@
 <h1 align="center">WhisperLiveKit</h1>
-<p align="center"><b>Real-time, Fully Local Whisper's Speech-to-Text and Speaker Diarization</b></p>
-This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. Simply launch the local server and grant microphone access. Everything runs locally on your machine ✨
 <p align="center">
-<img src="https://raw.githubusercontent.com/QuentinFuxa/WhisperLiveKit/refs/heads/main/demo.png" alt="Demo Screenshot" width="730">
+<img src="https://raw.githubusercontent.com/QuentinFuxa/WhisperLiveKit/refs/heads/main/demo.png" alt="WhisperLiveKit Demo" width="730">
 </p>
-### Differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)
-#### ⚙️ **Core Improvements**
+<p align="center"><b>Real-time, Fully Local Speech-to-Text with Speaker Diarization</b></p>
+<p align="center">
+<a href="https://pypi.org/project/whisperlivekit/"><img alt="PyPI Version" src="https://img.shields.io/pypi/v/whisperlivekit?color=g"></a>
+<a href="https://pepy.tech/project/whisperlivekit"><img alt="PyPI Downloads" src="https://static.pepy.tech/personalized-badge/whisperlivekit?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=downloads"></a>
+<a href="https://pypi.org/project/whisperlivekit/"><img alt="Python Versions" src="https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-dark_green"></a>
+<a href="https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/github/license/QuentinFuxa/WhisperLiveKit?color=blue"></a>
+</p>
+## 🚀 Overview
+This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. WhisperLiveKit provides a complete backend solution for real-time speech transcription with a functional and simple frontend that you can customize for your own needs. Everything runs locally on your machine ✨
+### 🔄 Architecture
+WhisperLiveKit consists of three main components:
+- **Frontend**: A basic HTML & JavaScript interface that captures microphone audio and streams it to the backend via WebSockets. You can use and adapt the provided template at [whisperlivekit/web/live_transcription.html](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html) for your specific use case.
+- **Backend (Web Server)**: A FastAPI-based WebSocket server that receives streamed audio data, processes it in real time, and returns transcriptions to the frontend. This is where the WebSocket logic and routing live.
+- **Core Backend (Library Logic)**: A server-agnostic core that handles audio processing, ASR, and diarization. It exposes reusable components that take in audio bytes and return transcriptions. This makes it easy to plug into any WebSocket or audio stream pipeline.
+### ✨ Key Features
+- **🎙️ Real-time Transcription** - Convert speech to text instantly as you speak
+- **👥 Speaker Diarization** - Identify different speakers in real-time using [Diart](https://github.com/juanmc2005/diart)
+- **🔒 Fully Local** - All processing happens on your machine - no data sent to external servers
+- **📱 Multi-User Support** - Handle multiple users simultaneously with a single backend/server
+### ⚙️ Core differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)
+- **Automatic Silence Chunking** Automatically chunks when no audio is detected to limit buffer size
+- **Multi-User Support** Handles multiple users simultaneously by decoupling backend and online ASR
+- **Confidence Validation** Immediately validate high-confidence tokens for faster inference
+- **MLX Whisper Backend** Optimized for Apple Silicon for faster local processing
 - **Buffering Preview** Displays unvalidated transcription segments
-- **Multi-User Support** Handles multiple users simultaneously by decoupling backend and online asr
-- **MLX Whisper Backend** Optimized for Apple Silicon for faster local processing.
-- **Confidence validation** Immediately validate high-confidence tokens for faster inference
-#### 🎙️ **Speaker Identification**
-- **Real-Time Diarization** Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart)
-#### 🌐 **Web & API**
-- **Built-in Web UI** Simple raw html browser interface with no frontend setup required
-- **FastAPI WebSocket Server** Real-time speech-to-text processing with async FFmpeg streaming.
-- **JavaScript Client** Ready-to-use MediaRecorder implementation for seamless client-side integration.
-## Installation
-### Via pip (recommended)
+## 📖 Quick Start
+```bash
+# Install the package
+pip install whisperlivekit
+# Start the transcription server
+whisperlivekit-server --model tiny.en
+# Open your browser at http://localhost:8000
+```
+### Quick Start with SSL
+```bash
+# You must provide a certificate and key
+whisperlivekit-server --ssl-certfile public.crt --ssl-keyfile private.key
+# Open your browser at https://localhost:8000
+```
+That's it! Start speaking and watch your words appear on screen.
+## 🛠️ Installation Options
+### Install from PyPI (Recommended)
 ```bash
 pip install whisperlivekit
 ```
-### From source
-1. **Clone the Repository**:
-```bash
-git clone https://github.com/QuentinFuxa/WhisperLiveKit
-cd WhisperLiveKit
-pip install -e .
-```
+### Install from Source
+```bash
+git clone https://github.com/QuentinFuxa/WhisperLiveKit
+cd WhisperLiveKit
+pip install -e .
+```
 ### System Dependencies
-You need to install FFmpeg on your system:
+FFmpeg is required:
 ```bash
-# For Ubuntu/Debian:
+# Ubuntu/Debian
 sudo apt install ffmpeg
-# For macOS:
+# macOS
 brew install ffmpeg
-# For Windows:
+# Windows
 # Download from https://ffmpeg.org/download.html and add to PATH
 ```
 ### Optional Dependencies
 ```bash
-# If you want to use VAC (Voice Activity Controller). Useful for preventing hallucinations
+# Voice Activity Controller (prevents hallucinations)
 pip install torch
-# If you choose sentences as buffer trimming strategy
+# Sentence-based buffer trimming
 pip install mosestokenizer wtpsplit
 pip install tokenize_uk # If you work with Ukrainian text
-# If you want to use diarization
+# Speaker diarization
 pip install diart
+# Alternative Whisper backends (default is faster-whisper)
+pip install whisperlivekit[whisper] # Original Whisper
+pip install whisperlivekit[whisper-timestamped] # Improved timestamps
+pip install whisperlivekit[mlx-whisper] # Apple Silicon optimization
+pip install whisperlivekit[openai] # OpenAI API
 ```
-Diart uses [pyannote.audio](https://github.com/pyannote/pyannote-audio) models from the _huggingface hub_. To use them, please follow the steps described [here](https://github.com/juanmc2005/diart?tab=readme-ov-file#get-access-to--pyannote-models).
-## Usage
-### Using the command-line tool
-After installation, you can start the server using the provided command-line tool:
+### 🎹 Pyannote Models Setup
+For diarization, you need access to pyannote.audio models:
+1. [Accept user conditions](https://huggingface.co/pyannote/segmentation) for the `pyannote/segmentation` model
+2. [Accept user conditions](https://huggingface.co/pyannote/segmentation-3.0) for the `pyannote/segmentation-3.0` model
+3. [Accept user conditions](https://huggingface.co/pyannote/embedding) for the `pyannote/embedding` model
+4. Login with HuggingFace:
+```bash
+pip install huggingface_hub
+huggingface-cli login
+```
+## 💻 Usage Examples
+### Command-line Interface
+Start the transcription server with various options:
 ```bash
-whisperlivekit-server --host 0.0.0.0 --port 8000 --model tiny.en
+# Basic server with English model
+whisperlivekit-server --model tiny.en
+# Advanced configuration with diarization
+whisperlivekit-server --host 0.0.0.0 --port 8000 --model medium --diarization --language auto
 ```
-Then open your browser at `http://localhost:8000` (or your specified host and port).
-### Using the library in your code
+### Python API Integration (Backend)
 ```python
 from whisperlivekit import WhisperLiveKit
 from whisperlivekit.audio_processor import AudioProcessor
 from fastapi import FastAPI, WebSocket
+import asyncio
 from fastapi.responses import HTMLResponse
+# Initialize components
+app = FastAPI()
 kit = WhisperLiveKit(model="medium", diarization=True)
-app = FastAPI() # Create a FastAPI application
+# Serve the web interface
 @app.get("/")
 async def get():
     return HTMLResponse(kit.web_interface()) # Use the built-in web interface
-async def handle_websocket_results(websocket, results_generator): # Sends results to frontend
+# Process WebSocket connections
+async def handle_websocket_results(websocket, results_generator):
     async for response in results_generator:
         await websocket.send_json(response)
@@ -108,67 +169,158 @@ async def websocket_endpoint(websocket: WebSocket):
     audio_processor = AudioProcessor()
     await websocket.accept()
     results_generator = await audio_processor.create_tasks()
-    websocket_task = asyncio.create_task(handle_websocket_results(websocket, results_generator))
-    while True:
-        message = await websocket.receive_bytes()
-        await audio_processor.process_audio(message)
+    websocket_task = asyncio.create_task(
+        handle_websocket_results(websocket, results_generator)
+    )
+    try:
+        while True:
+            message = await websocket.receive_bytes()
+            await audio_processor.process_audio(message)
+    except Exception as e:
+        print(f"WebSocket error: {e}")
+        websocket_task.cancel()
 ```
-For a complete audio processing example, check [whisper_fastapi_online_server.py](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisper_fastapi_online_server.py)
+### Frontend Implementation
+The package includes a simple HTML/JavaScript implementation that you can adapt for your project. You can get it in [whisperlivekit/web/live_transcription.html](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html), or using:
+```python
+kit.web_interface()
+```
-## Configuration Options
-The following parameters are supported when initializing `WhisperLiveKit`:
-- `--host` and `--port` let you specify the server's IP/port.
-- `-min-chunk-size` sets the minimum chunk size for audio processing. Make sure this value aligns with the chunk size selected in the frontend. If not aligned, the system will work but may unnecessarily over-process audio data.
-- `--transcription`: Enable/disable transcription (default: True)
-- `--diarization`: Enable/disable speaker diarization (default: False)
-- `--confidence-validation`: Use confidence scores for faster validation. Transcription will be faster but punctuation might be less accurate (default: True)
-- `--warmup-file`: The path to a speech audio wav file to warm up Whisper so that the very first chunk processing is fast. :
-  - If not set, uses https://github.com/ggerganov/whisper.cpp/raw/master/samples/jfk.wav.
-  - If False, no warmup is performed.
-- `--min-chunk-size` Minimum audio chunk size in seconds. It waits up to this time to do processing. If the processing takes shorter time, it waits, otherwise it processes the whole segment that was received by this time.
-- `--model` {_tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, large-v3-turbo_}
-Name size of the Whisper model to use (default: tiny). The model is automatically downloaded from the model hub if not present in model cache dir.
-- `--model_cache_dir` Overriding the default model cache dir where models downloaded from the hub are saved
-- `--model_dir` Dir where Whisper model.bin and other files are saved. This option overrides --model and --model_cache_dir parameter.
-- `--lan`, --language Source language code, e.g. en,de,cs, or 'auto' for language detection.
-- `--task` {_transcribe, translate_} Transcribe or translate. If translate is set, we recommend avoiding the _large-v3-turbo_ backend, as it [performs significantly worse](https://github.com/QuentinFuxa/whisper_streaming_web/issues/40#issuecomment-2652816533) than other models for translation.
-- `--backend` {_faster-whisper, whisper_timestamped, openai-api, mlx-whisper_} Load only this backend for Whisper processing.
-- `--vac` Use VAC = voice activity controller. Requires torch.
-- `--vac-chunk-size` VAC sample size in seconds.
-- `--vad` Use VAD = voice activity detection, with the default parameters.
-- `--buffer_trimming` {_sentence, segment_} Buffer trimming strategy -- trim completed sentences marked with punctuation mark and detected by sentence segmenter, or the completed segments returned by Whisper. Sentence segmenter must be installed for "sentence" option.
-- `--buffer_trimming_sec` Buffer trimming length threshold in seconds. If buffer length is longer, trimming sentence/segment is triggered.
-5. **Open the Provided HTML**:
-- By default, the server root endpoint `/` serves a simple `live_transcription.html` page.
-- Open your browser at `http://localhost:8000` (or replace `localhost` and `8000` with whatever you specified).
-- The page uses vanilla JavaScript and the WebSocket API to capture your microphone and stream audio to the server in real time.
+## ⚙️ Configuration Reference
+WhisperLiveKit offers extensive configuration options:
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `--host` | Server host address | `localhost` |
+| `--port` | Server port | `8000` |
+| `--model` | Whisper model size | `tiny` |
+| `--language` | Source language code or `auto` | `en` |
+| `--task` | `transcribe` or `translate` | `transcribe` |
+| `--backend` | Processing backend | `faster-whisper` |
+| `--diarization` | Enable speaker identification | `False` |
+| `--confidence-validation` | Use confidence scores for faster validation | `False` |
+| `--min-chunk-size` | Minimum audio chunk size (seconds) | `1.0` |
+| `--vac` | Use Voice Activity Controller | `False` |
+| `--no-vad` | Disable Voice Activity Detection | `False` |
+| `--buffer_trimming` | Buffer trimming strategy (`sentence` or `segment`) | `segment` |
+| `--warmup-file` | Audio file path for model warmup | `jfk.wav` |
+| `--ssl-certfile` | Path to the SSL certificate file (for HTTPS support) | `None` |
+| `--ssl-keyfile` | Path to the SSL private key file (for HTTPS support) | `None` |
-## How the Live Interface Works
-- Once you **allow microphone access**, the page records small chunks of audio using the **MediaRecorder** API in **webm/opus** format.
-- These chunks are sent over a **WebSocket** to the FastAPI endpoint at `/asr`.
-- The Python server decodes `.webm` chunks on the fly using **FFmpeg** and streams them into the **whisper streaming** implementation for transcription.
-- **Partial transcription** appears as soon as enough audio is processed. The "unvalidated" text is shown in **lighter or grey color** (i.e., an 'aperçu') to indicate it's still buffered partial output. Once Whisper finalizes that segment, it's displayed in normal text.
-- You can watch the transcription update in near real time, ideal for demos, prototyping, or quick debugging.
+## 🔧 How It Works
+<p align="center">
+<img src="https://raw.githubusercontent.com/QuentinFuxa/WhisperLiveKit/refs/heads/main/demo.png" alt="WhisperLiveKit in Action" width="500">
+</p>
+1. **Audio Capture**: Browser's MediaRecorder API captures audio in webm/opus format
+2. **Streaming**: Audio chunks are sent to the server via WebSocket
+3. **Processing**: Server decodes audio with FFmpeg and streams into Whisper for transcription
+4. **Real-time Output**:
+   - Partial transcriptions appear immediately in light gray (the 'aperçu')
+   - Finalized text appears in normal color
+   - (When enabled) Different speakers are identified and highlighted
-### Deploying to a Remote Server
-If you want to **deploy** this setup:
-1. **Host the FastAPI app** behind a production-grade HTTP(S) server (like **Uvicorn + Nginx** or Docker). If you use HTTPS, use "wss" instead of "ws" in WebSocket URL.
-2. The **HTML/JS page** can be served by the same FastAPI app or a separate static host.
-3. Users open the page in **Chrome/Firefox** (any modern browser that supports MediaRecorder + WebSocket).
-No additional front-end libraries or frameworks are required. The WebSocket logic in `live_transcription.html` is minimal enough to adapt for your own custom UI or embed in other pages.
+## 🚀 Deployment Guide
+To deploy WhisperLiveKit in production:
+1. **Server Setup** (Backend):
+```bash
+# Install production ASGI server
+pip install uvicorn gunicorn
+# Launch with multiple workers
+gunicorn -k uvicorn.workers.UvicornWorker -w 4 your_app:app
+```
+2. **Frontend Integration**:
+   - Host your customized version of the example HTML/JS in your web application
+   - Ensure WebSocket connection points to your server's address
+3. **Nginx Configuration** (recommended for production):
+```nginx
+server {
+    listen 80;
+    server_name your-domain.com;
+    location / {
+        proxy_pass http://localhost:8000;
+        proxy_set_header Upgrade $http_upgrade;
+        proxy_set_header Connection "upgrade";
+        proxy_set_header Host $host;
+    }
+}
+```
+4. **HTTPS Support**: For secure deployments, use "wss://" instead of "ws://" in WebSocket URL
+### 🐋 Docker
+A basic Dockerfile is provided which allows re-use of Python package installation options. See below usage examples:
+**NOTE:** For **larger** models, ensure that your **docker runtime** has enough **memory** available.
+#### All defaults
+- Create a reusable image with only the basics and then run as a named container:
+```bash
+docker build -t whisperlivekit-defaults .
+docker create --gpus all --name whisperlivekit -p 8000:8000 whisperlivekit-defaults
+docker start -i whisperlivekit
+```
+> **Note**: If you're running on a system without NVIDIA GPU support (such as Mac with Apple Silicon or any system without CUDA capabilities), you need to **remove the `--gpus all` flag** from the `docker create` command. Without GPU acceleration, transcription will use CPU only, which may be significantly slower. Consider using small models for better performance on CPU-only systems.
+#### Customization
+- Customize the container options:
+```bash
+docker build -t whisperlivekit-defaults .
+docker create --gpus all --name whisperlivekit-base -p 8000:8000 whisperlivekit-defaults --model base
+docker start -i whisperlivekit-base
+```
+- `--build-arg` Options:
+  - `EXTRAS="whisper-timestamped"` - Add extras to the image's installation (no spaces). Remember to set necessary container options!
+  - `HF_PRECACHE_DIR="./.cache/"` - Pre-load a model cache for faster first-time start
+  - `HF_TOKEN="./token"` - Add your Hugging Face Hub access token to download gated models
+## 🔮 Use Cases
+- **Meeting Transcription**: Capture discussions in real-time
+- **Accessibility Tools**: Help hearing-impaired users follow conversations
+- **Content Creation**: Transcribe podcasts or videos automatically
+- **Customer Service**: Transcribe support calls with speaker identification
+## 🤝 Contributing
+Contributions are welcome! Here's how to get started:
+1. Fork the repository
+2. Create a feature branch: `git checkout -b feature/amazing-feature`
+3. Commit your changes: `git commit -m 'Add amazing feature'`
+4. Push to your branch: `git push origin feature/amazing-feature`
+5. Open a Pull Request
-## Acknowledgments
-This project builds upon the foundational work of the Whisper Streaming project. We extend our gratitude to the original authors for their contributions.
+## 🙏 Acknowledgments
+This project builds upon the foundational work of:
+- [Whisper Streaming](https://github.com/ufal/whisper_streaming)
+- [Diart](https://github.com/juanmc2005/diart)
+- [OpenAI Whisper](https://github.com/openai/whisper)
+We extend our gratitude to the original authors for their contributions.
+## 📄 License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+## 🔗 Links
+- [GitHub Repository](https://github.com/QuentinFuxa/WhisperLiveKit)
+- [PyPI Package](https://pypi.org/project/whisperlivekit/)
+- [Issue Tracker](https://github.com/QuentinFuxa/WhisperLiveKit/issues)
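The streaming flow described in the new README (browser → WebSocket → FFmpeg → Whisper) can also be exercised without a browser. Below is a minimal, illustrative client sketch. It assumes a local server on port 8000, the `/asr` endpoint, binary webm/opus input, an empty binary message as the stop signal, and the `ready_to_stop` JSON message introduced in this release; the file name `sample.webm` and the chunk pacing are placeholders, not part of the library.

```python
import asyncio
import json

import websockets  # third-party: pip install websockets


async def stream_file(path: str, uri: str = "ws://localhost:8000/asr") -> None:
    async with websockets.connect(uri) as ws:

        async def send_audio() -> None:
            # Stream the file in small chunks, roughly pacing like a live source.
            with open(path, "rb") as f:
                while chunk := f.read(3200):
                    await ws.send(chunk)
                    await asyncio.sleep(0.1)
            await ws.send(b"")  # empty message: asks the server to start its stop sequence

        sender = asyncio.create_task(send_audio())
        async for message in ws:
            response = json.loads(message)
            if response.get("type") == "ready_to_stop":
                break  # server has flushed all remaining transcription
            print(response)
        await sender


asyncio.run(stream_file("sample.webm"))  # placeholder file name
```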

demo.png
Binary file changed (469 KiB before, 438 KiB after).

setup.py

@@ -1,8 +1,7 @@
 from setuptools import setup, find_packages
 setup(
     name="whisperlivekit",
-    version="0.1.0",
+    version="0.1.6",
     description="Real-time, Fully Local Whisper's Speech-to-Text and Speaker Diarization",
     long_description=open("README.md", "r", encoding="utf-8").read(),
     long_description_content_type="text/markdown",
@@ -22,6 +21,10 @@ setup(
         "diarization": ["diart"],
         "vac": ["torch"],
         "sentence": ["mosestokenizer", "wtpsplit"],
+        "whisper": ["whisper"],
+        "whisper-timestamped": ["whisper-timestamped"],
+        "mlx-whisper": ["mlx-whisper"],
+        "openai": ["openai"],
     },
     package_data={
         'whisperlivekit': ['web/*.html'],

whisper_fastapi_online_server.py (deleted)

@@ -1,82 +0,0 @@
from contextlib import asynccontextmanager
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import HTMLResponse
from fastapi.middleware.cors import CORSMiddleware
from whisperlivekit import WhisperLiveKit
from whisperlivekit.audio_processor import AudioProcessor
import asyncio
import logging
import os
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logging.getLogger().setLevel(logging.WARNING)
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
kit = None
@asynccontextmanager
async def lifespan(app: FastAPI):
global kit
kit = WhisperLiveKit()
yield
app = FastAPI(lifespan=lifespan)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/")
async def get():
return HTMLResponse(kit.web_interface())
async def handle_websocket_results(websocket, results_generator):
"""Consumes results from the audio processor and sends them via WebSocket."""
try:
async for response in results_generator:
await websocket.send_json(response)
except Exception as e:
logger.warning(f"Error in WebSocket results handler: {e}")
@app.websocket("/asr")
async def websocket_endpoint(websocket: WebSocket):
audio_processor = AudioProcessor()
await websocket.accept()
logger.info("WebSocket connection opened.")
results_generator = await audio_processor.create_tasks()
websocket_task = asyncio.create_task(handle_websocket_results(websocket, results_generator))
try:
while True:
message = await websocket.receive_bytes()
await audio_processor.process_audio(message)
except WebSocketDisconnect:
logger.warning("WebSocket disconnected.")
finally:
websocket_task.cancel()
await audio_processor.cleanup()
logger.info("WebSocket endpoint cleaned up.")
if __name__ == "__main__":
import uvicorn
temp_kit = WhisperLiveKit(transcription=False, diarization=False)
uvicorn.run(
"whisper_fastapi_online_server:app",
host=temp_kit.args.host,
port=temp_kit.args.port,
reload=True,
log_level="info"
)

whisperlivekit/audio_processor.py

@@ -6,7 +6,6 @@ import math
 import logging
 import traceback
 from datetime import timedelta
-from typing import List, Dict, Any
 from whisperlivekit.timed_objects import ASRToken
 from whisperlivekit.whisper_streaming_custom.whisper_online import online_factory
 from whisperlivekit.core import WhisperLiveKit
@@ -16,6 +15,8 @@ logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(
 logger = logging.getLogger(__name__)
 logger.setLevel(logging.DEBUG)
+SENTINEL = object() # unique sentinel object for end of stream marker
 def format_time(seconds: float) -> str:
     """Format seconds as HH:MM:SS."""
     return str(timedelta(seconds=int(seconds)))
@@ -39,8 +40,12 @@
         self.bytes_per_sample = 2
         self.bytes_per_sec = self.samples_per_sec * self.bytes_per_sample
         self.max_bytes_per_sec = 32000 * 5  # 5 seconds of audio at 32 kHz
+        self.last_ffmpeg_activity = time()
+        self.ffmpeg_health_check_interval = 5
+        self.ffmpeg_max_idle_time = 10
         # State management
+        self.is_stopping = False
         self.tokens = []
         self.buffer_transcription = ""
         self.buffer_diarization = ""
@@ -60,6 +65,13 @@
         self.transcription_queue = asyncio.Queue() if self.args.transcription else None
         self.diarization_queue = asyncio.Queue() if self.args.diarization else None
         self.pcm_buffer = bytearray()
+        # Task references
+        self.transcription_task = None
+        self.diarization_task = None
+        self.ffmpeg_reader_task = None
+        self.watchdog_task = None
+        self.all_tasks_for_cleanup = []
         # Initialize transcription engine if enabled
         if self.args.transcription:
@@ -78,14 +90,50 @@
     async def restart_ffmpeg(self):
         """Restart the FFmpeg process after failure."""
+        logger.warning("Restarting FFmpeg process...")
         if self.ffmpeg_process:
             try:
-                self.ffmpeg_process.kill()
-                await asyncio.get_event_loop().run_in_executor(None, self.ffmpeg_process.wait)
+                # we check if process is still running
+                if self.ffmpeg_process.poll() is None:
+                    logger.info("Terminating existing FFmpeg process")
+                    self.ffmpeg_process.stdin.close()
+                    self.ffmpeg_process.terminate()
+                    # wait for termination with timeout
+                    try:
+                        await asyncio.wait_for(
+                            asyncio.get_event_loop().run_in_executor(None, self.ffmpeg_process.wait),
+                            timeout=5.0
+                        )
+                    except asyncio.TimeoutError:
+                        logger.warning("FFmpeg process did not terminate, killing forcefully")
+                        self.ffmpeg_process.kill()
+                        await asyncio.get_event_loop().run_in_executor(None, self.ffmpeg_process.wait)
             except Exception as e:
-                logger.warning(f"Error killing FFmpeg process: {e}")
+                logger.error(f"Error during FFmpeg process termination: {e}")
+                logger.error(traceback.format_exc())
-        self.ffmpeg_process = self.start_ffmpeg_decoder()
-        self.pcm_buffer = bytearray()
+        # we start new process
+        try:
+            logger.info("Starting new FFmpeg process")
+            self.ffmpeg_process = self.start_ffmpeg_decoder()
+            self.pcm_buffer = bytearray()
+            self.last_ffmpeg_activity = time()
+            logger.info("FFmpeg process restarted successfully")
+        except Exception as e:
+            logger.error(f"Failed to restart FFmpeg process: {e}")
+            logger.error(traceback.format_exc())
+            # try again after 5s
+            await asyncio.sleep(5)
+            try:
+                self.ffmpeg_process = self.start_ffmpeg_decoder()
+                self.pcm_buffer = bytearray()
+                self.last_ffmpeg_activity = time()
+                logger.info("FFmpeg process restarted successfully on second attempt")
+            except Exception as e2:
+                logger.critical(f"Failed to restart FFmpeg process on second attempt: {e2}")
+                logger.critical(traceback.format_exc())
     async def update_transcription(self, new_tokens, buffer, end_buffer, full_transcription, sep):
         """Thread-safe update of transcription with new data."""
@@ -154,25 +202,25 @@
         while True:
             try:
-                # Calculate buffer size based on elapsed time
-                elapsed_time = math.floor((time() - beg) * 10) / 10 # Round to 0.1 sec
+                current_time = time()
+                elapsed_time = math.floor((current_time - beg) * 10) / 10
                 buffer_size = max(int(32000 * elapsed_time), 4096)
-                beg = time()
-                # Read chunk with timeout
-                try:
-                    chunk = await asyncio.wait_for(
-                        loop.run_in_executor(None, self.ffmpeg_process.stdout.read, buffer_size),
-                        timeout=15.0
-                    )
-                except asyncio.TimeoutError:
-                    logger.warning("FFmpeg read timeout. Restarting...")
+                beg = current_time
+                # Detect idle state much more quickly
+                if current_time - self.last_ffmpeg_activity > self.ffmpeg_max_idle_time:
+                    logger.warning(f"FFmpeg process idle for {current_time - self.last_ffmpeg_activity:.2f}s. Restarting...")
                     await self.restart_ffmpeg()
                     beg = time()
+                    self.last_ffmpeg_activity = time()
                    continue
+                chunk = await loop.run_in_executor(None, self.ffmpeg_process.stdout.read, buffer_size)
+                if chunk:
+                    self.last_ffmpeg_activity = time()
                 if not chunk:
-                    logger.info("FFmpeg stdout closed.")
+                    logger.info("FFmpeg stdout closed, no more data to read.")
                     break
                 self.pcm_buffer.extend(chunk)
@@ -183,7 +231,7 @@
                         self.convert_pcm_to_float(self.pcm_buffer).copy()
                     )
-                # Process when we have enough data
+                # Process when enough data
                 if len(self.pcm_buffer) >= self.bytes_per_sec:
                     if len(self.pcm_buffer) > self.max_bytes_per_sec:
                         logger.warning(
@@ -207,6 +255,15 @@
                 logger.warning(f"Exception in ffmpeg_stdout_reader: {e}")
                 logger.warning(f"Traceback: {traceback.format_exc()}")
                 break
+        logger.info("FFmpeg stdout processing finished. Signaling downstream processors.")
+        if self.args.transcription and self.transcription_queue:
+            await self.transcription_queue.put(SENTINEL)
+            logger.debug("Sentinel put into transcription_queue.")
+        if self.args.diarization and self.diarization_queue:
+            await self.diarization_queue.put(SENTINEL)
+            logger.debug("Sentinel put into diarization_queue.")
     async def transcription_processor(self):
         """Process audio chunks for transcription."""
@@ -216,8 +273,23 @@
         while True:
             try:
                 pcm_array = await self.transcription_queue.get()
-                logger.info(f"{len(self.online.audio_buffer) / self.online.SAMPLING_RATE} seconds of audio to process.")
+                if pcm_array is SENTINEL:
+                    logger.debug("Transcription processor received sentinel. Finishing.")
+                    self.transcription_queue.task_done()
+                    break
+                if not self.online: # Should not happen if queue is used
+                    logger.warning("Transcription processor: self.online not initialized.")
+                    self.transcription_queue.task_done()
+                    continue
+                asr_internal_buffer_duration_s = len(self.online.audio_buffer) / self.online.SAMPLING_RATE
+                transcription_lag_s = max(0.0, time() - self.beg_loop - self.end_buffer)
+                logger.info(
+                    f"ASR processing: internal_buffer={asr_internal_buffer_duration_s:.2f}s, "
+                    f"lag={transcription_lag_s:.2f}s."
+                )
                 # Process transcription
                 self.online.insert_audio_chunk(pcm_array)
@@ -240,12 +312,15 @@
                 await self.update_transcription(
                     new_tokens, buffer, end_buffer, self.full_transcription, self.sep
                 )
+                self.transcription_queue.task_done()
             except Exception as e:
                 logger.warning(f"Exception in transcription_processor: {e}")
                 logger.warning(f"Traceback: {traceback.format_exc()}")
-            finally:
-                self.transcription_queue.task_done()
+                if 'pcm_array' in locals() and pcm_array is not SENTINEL: # Check if pcm_array was assigned from queue
+                    self.transcription_queue.task_done()
+        logger.info("Transcription processor task finished.")
     async def diarization_processor(self, diarization_obj):
         """Process audio chunks for speaker diarization."""
@@ -254,6 +329,10 @@
         while True:
             try:
                 pcm_array = await self.diarization_queue.get()
+                if pcm_array is SENTINEL:
+                    logger.debug("Diarization processor received sentinel. Finishing.")
+                    self.diarization_queue.task_done()
+                    break
                 # Process diarization
                 await diarization_obj.diarize(pcm_array)
@@ -265,12 +344,15 @@
                 )
                 await self.update_diarization(new_end, buffer_diarization)
+                self.diarization_queue.task_done()
             except Exception as e:
                 logger.warning(f"Exception in diarization_processor: {e}")
                 logger.warning(f"Traceback: {traceback.format_exc()}")
-            finally:
-                self.diarization_queue.task_done()
+                if 'pcm_array' in locals() and pcm_array is not SENTINEL:
+                    self.diarization_queue.task_done()
+        logger.info("Diarization processor task finished.")
     async def results_formatter(self):
         """Format processing results for output."""
@@ -360,50 +442,188 @@
                     yield response
                     self.last_response_content = response_content
+                # Check for termination condition
+                if self.is_stopping:
+                    all_processors_done = True
+                    if self.args.transcription and self.transcription_task and not self.transcription_task.done():
+                        all_processors_done = False
+                    if self.args.diarization and self.diarization_task and not self.diarization_task.done():
+                        all_processors_done = False
+                    if all_processors_done:
+                        logger.info("Results formatter: All upstream processors are done and in stopping state. Terminating.")
+                        final_state = await self.get_current_state()
+                        return
                 await asyncio.sleep(0.1) # Avoid overwhelming the client
             except Exception as e:
                 logger.warning(f"Exception in results_formatter: {e}")
                 logger.warning(f"Traceback: {traceback.format_exc()}")
                 await asyncio.sleep(0.5) # Back off on error
     async def create_tasks(self):
         """Create and start processing tasks."""
-        tasks = []
+        self.all_tasks_for_cleanup = []
+        processing_tasks_for_watchdog = []
         if self.args.transcription and self.online:
-            tasks.append(asyncio.create_task(self.transcription_processor()))
+            self.transcription_task = asyncio.create_task(self.transcription_processor())
+            self.all_tasks_for_cleanup.append(self.transcription_task)
+            processing_tasks_for_watchdog.append(self.transcription_task)
         if self.args.diarization and self.diarization:
-            tasks.append(asyncio.create_task(self.diarization_processor(self.diarization)))
+            self.diarization_task = asyncio.create_task(self.diarization_processor(self.diarization))
+            self.all_tasks_for_cleanup.append(self.diarization_task)
+            processing_tasks_for_watchdog.append(self.diarization_task)
-        tasks.append(asyncio.create_task(self.ffmpeg_stdout_reader()))
-        self.tasks = tasks
+        self.ffmpeg_reader_task = asyncio.create_task(self.ffmpeg_stdout_reader())
+        self.all_tasks_for_cleanup.append(self.ffmpeg_reader_task)
+        processing_tasks_for_watchdog.append(self.ffmpeg_reader_task)
+        # Monitor overall system health
+        self.watchdog_task = asyncio.create_task(self.watchdog(processing_tasks_for_watchdog))
+        self.all_tasks_for_cleanup.append(self.watchdog_task)
         return self.results_formatter()
+    async def watchdog(self, tasks_to_monitor):
+        """Monitors the health of critical processing tasks."""
+        while True:
+            try:
+                await asyncio.sleep(10)
+                current_time = time()
+                for i, task in enumerate(tasks_to_monitor):
+                    if task.done():
+                        exc = task.exception()
+                        task_name = task.get_name() if hasattr(task, 'get_name') else f"Monitored Task {i}"
+                        if exc:
+                            logger.error(f"{task_name} unexpectedly completed with exception: {exc}")
+                        else:
+                            logger.info(f"{task_name} completed normally.")
+                ffmpeg_idle_time = current_time - self.last_ffmpeg_activity
+                if ffmpeg_idle_time > 15:
+                    logger.warning(f"FFmpeg idle for {ffmpeg_idle_time:.2f}s - may need attention.")
+                    if ffmpeg_idle_time > 30 and not self.is_stopping:
+                        logger.error("FFmpeg idle for too long and not in stopping phase, forcing restart.")
+                        await self.restart_ffmpeg()
+            except asyncio.CancelledError:
+                logger.info("Watchdog task cancelled.")
+                break
+            except Exception as e:
+                logger.error(f"Error in watchdog task: {e}", exc_info=True)
     async def cleanup(self):
         """Clean up resources when processing is complete."""
-        for task in self.tasks:
-            task.cancel()
-        try:
-            await asyncio.gather(*self.tasks, return_exceptions=True)
-            self.ffmpeg_process.stdin.close()
-            self.ffmpeg_process.wait()
-        except Exception as e:
-            logger.warning(f"Error during cleanup: {e}")
-        if self.args.diarization and hasattr(self, 'diarization'):
-            self.diarization.close()
+        logger.info("Starting cleanup of AudioProcessor resources.")
+        for task in self.all_tasks_for_cleanup:
+            if task and not task.done():
+                task.cancel()
+        created_tasks = [t for t in self.all_tasks_for_cleanup if t]
+        if created_tasks:
+            await asyncio.gather(*created_tasks, return_exceptions=True)
+        logger.info("All processing tasks cancelled or finished.")
+        if self.ffmpeg_process:
+            if self.ffmpeg_process.stdin and not self.ffmpeg_process.stdin.closed:
+                try:
+                    self.ffmpeg_process.stdin.close()
+                except Exception as e:
+                    logger.warning(f"Error closing ffmpeg stdin during cleanup: {e}")
+            # Wait for ffmpeg process to terminate
+            if self.ffmpeg_process.poll() is None: # Check if process is still running
+                logger.info("Waiting for FFmpeg process to terminate...")
+                try:
+                    # Run wait in executor to avoid blocking async loop
+                    await asyncio.get_event_loop().run_in_executor(None, self.ffmpeg_process.wait, 5.0) # 5s timeout
+                except Exception as e: # subprocess.TimeoutExpired is not directly caught by asyncio.wait_for with run_in_executor
+                    logger.warning(f"FFmpeg did not terminate gracefully, killing. Error: {e}")
+                    self.ffmpeg_process.kill()
+                    await asyncio.get_event_loop().run_in_executor(None, self.ffmpeg_process.wait) # Wait for kill
+            logger.info("FFmpeg process terminated.")
+        if self.args.diarization and hasattr(self, 'diarization') and hasattr(self.diarization, 'close'):
+            self.diarization.close()
+        logger.info("AudioProcessor cleanup complete.")
     async def process_audio(self, message):
         """Process incoming audio data."""
-        try:
-            self.ffmpeg_process.stdin.write(message)
-            self.ffmpeg_process.stdin.flush()
-        except (BrokenPipeError, AttributeError) as e:
-            logger.warning(f"Error writing to FFmpeg: {e}. Restarting...")
-            await self.restart_ffmpeg()
-            self.ffmpeg_process.stdin.write(message)
-            self.ffmpeg_process.stdin.flush()
+        # If already stopping or stdin is closed, ignore further audio, especially residual chunks.
+        if self.is_stopping or (self.ffmpeg_process and self.ffmpeg_process.stdin and self.ffmpeg_process.stdin.closed):
+            logger.warning(f"AudioProcessor is stopping or stdin is closed. Ignoring incoming audio message (length: {len(message)}).")
+            if not message and self.ffmpeg_process and self.ffmpeg_process.stdin and not self.ffmpeg_process.stdin.closed:
+                logger.info("Received empty message while already in stopping state; ensuring stdin is closed.")
+                try:
+                    self.ffmpeg_process.stdin.close()
+                except Exception as e:
+                    logger.warning(f"Error closing ffmpeg stdin on redundant stop signal during stopping state: {e}")
+            return
+        if not message: # primary signal to start stopping
+            logger.info("Empty audio message received, initiating stop sequence.")
+            self.is_stopping = True
+            if self.ffmpeg_process and self.ffmpeg_process.stdin and not self.ffmpeg_process.stdin.closed:
+                try:
+                    self.ffmpeg_process.stdin.close()
+                    logger.info("FFmpeg stdin closed due to primary stop signal.")
+                except Exception as e:
+                    logger.warning(f"Error closing ffmpeg stdin on stop: {e}")
+            return
+        retry_count = 0
+        max_retries = 3
+        # Log periodic heartbeats showing ongoing audio proc
+        current_time = time()
+        if not hasattr(self, '_last_heartbeat') or current_time - self._last_heartbeat >= 10:
+            logger.debug(f"Processing audio chunk, last FFmpeg activity: {current_time - self.last_ffmpeg_activity:.2f}s ago")
+            self._last_heartbeat = current_time
+        while retry_count < max_retries:
+            try:
+                if not self.ffmpeg_process or not hasattr(self.ffmpeg_process, 'stdin') or self.ffmpeg_process.poll() is not None:
+                    logger.warning("FFmpeg process not available, restarting...")
+                    await self.restart_ffmpeg()
+                loop = asyncio.get_running_loop()
+                try:
+                    await asyncio.wait_for(
+                        loop.run_in_executor(None, lambda: self.ffmpeg_process.stdin.write(message)),
+                        timeout=2.0
+                    )
+                except asyncio.TimeoutError:
+                    logger.warning("FFmpeg write operation timed out, restarting...")
+                    await self.restart_ffmpeg()
+                    retry_count += 1
+                    continue
+                try:
+                    await asyncio.wait_for(
+                        loop.run_in_executor(None, self.ffmpeg_process.stdin.flush),
+                        timeout=2.0
+                    )
+                except asyncio.TimeoutError:
+                    logger.warning("FFmpeg flush operation timed out, restarting...")
+                    await self.restart_ffmpeg()
+                    retry_count += 1
+                    continue
+                self.last_ffmpeg_activity = time()
+                return
+            except (BrokenPipeError, AttributeError, OSError) as e:
+                retry_count += 1
+                logger.warning(f"Error writing to FFmpeg: {e}. Retry {retry_count}/{max_retries}...")
+                if retry_count < max_retries:
+                    await self.restart_ffmpeg()
+                    await asyncio.sleep(0.5)
+                else:
+                    logger.error("Maximum retries reached for FFmpeg process")
+                    await self.restart_ffmpeg()
                    return
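The end-of-stream handling in this file follows the classic asyncio sentinel pattern: the FFmpeg reader enqueues a unique object when the stream closes, and each consumer exits its loop when it sees that object by identity. A condensed, self-contained sketch of the pattern (illustrative names only, not the library's API):

```python
import asyncio

SENTINEL = object()  # unique end-of-stream marker, compared by identity


async def producer(queue: asyncio.Queue) -> None:
    for chunk in (b"chunk-1", b"chunk-2", b"chunk-3"):  # stands in for decoded PCM
        await queue.put(chunk)
    await queue.put(SENTINEL)  # signal: no more data will ever arrive


async def consumer(queue: asyncio.Queue) -> None:
    while True:
        item = await queue.get()
        if item is SENTINEL:  # identity check cannot collide with real payloads
            queue.task_done()
            break
        print("processing", item)
        queue.task_done()


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    consumer_task = asyncio.create_task(consumer(queue))
    await producer(queue)
    await consumer_task  # returns only after the consumer saw the sentinel


asyncio.run(main())
```

Compared with plain task cancellation, the sentinel lets consumers finish draining queued work before exiting, which is why the endpoint can reliably flush the final transcription before signaling the client.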

whisperlivekit/basic_server.py

@@ -3,12 +3,13 @@ from fastapi import FastAPI, WebSocket, WebSocketDisconnect
 from fastapi.responses import HTMLResponse
 from fastapi.middleware.cors import CORSMiddleware
-from whisperlivekit import WhisperLiveKit
+from whisperlivekit import WhisperLiveKit, parse_args
 from whisperlivekit.audio_processor import AudioProcessor
 import asyncio
 import logging
-import os
+import os, sys
+import argparse
 logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
 logging.getLogger().setLevel(logging.WARNING)

@@ -43,6 +44,11 @@ async def handle_websocket_results(websocket, results_generator):
     try:
         async for response in results_generator:
            await websocket.send_json(response)
+        # when the results_generator finishes it means all audio has been processed
+        logger.info("Results generator finished. Sending 'ready_to_stop' to client.")
+        await websocket.send_json({"type": "ready_to_stop"})
+    except WebSocketDisconnect:
+        logger.info("WebSocket disconnected while handling results (client likely closed connection).")
     except Exception as e:
         logger.warning(f"Error in WebSocket results handler: {e}")

@@ -61,26 +67,58 @@ async def websocket_endpoint(websocket: WebSocket):
        while True:
            message = await websocket.receive_bytes()
            await audio_processor.process_audio(message)
+    except KeyError as e:
+        if 'bytes' in str(e):
+            logger.warning(f"Client has closed the connection.")
+        else:
+            logger.error(f"Unexpected KeyError in websocket_endpoint: {e}", exc_info=True)
     except WebSocketDisconnect:
-        logger.warning("WebSocket disconnected.")
+        logger.info("WebSocket disconnected by client during message receiving loop.")
+    except Exception as e:
+        logger.error(f"Unexpected error in websocket_endpoint main loop: {e}", exc_info=True)
     finally:
-        websocket_task.cancel()
+        logger.info("Cleaning up WebSocket endpoint...")
+        if not websocket_task.done():
+            websocket_task.cancel()
+        try:
+            await websocket_task
+        except asyncio.CancelledError:
+            logger.info("WebSocket results handler task was cancelled.")
+        except Exception as e:
+            logger.warning(f"Exception while awaiting websocket_task completion: {e}")
         await audio_processor.cleanup()
-        logger.info("WebSocket endpoint cleaned up.")
+        logger.info("WebSocket endpoint cleaned up successfully.")

 def main():
     """Entry point for the CLI command."""
     import uvicorn
-    temp_kit = WhisperLiveKit(transcription=False, diarization=False)
-    uvicorn.run(
-        "whisperlivekit.basic_server:app",
-        host=temp_kit.args.host,
-        port=temp_kit.args.port,
-        reload=True,
-        log_level="info"
-    )
+    args = parse_args()
+    uvicorn_kwargs = {
+        "app": "whisperlivekit.basic_server:app",
+        "host": args.host,
+        "port": args.port,
+        "reload": False,
+        "log_level": "info",
+        "lifespan": "on",
+    }
+    ssl_kwargs = {}
+    if args.ssl_certfile or args.ssl_keyfile:
+        if not (args.ssl_certfile and args.ssl_keyfile):
+            raise ValueError("Both --ssl-certfile and --ssl-keyfile must be specified together.")
+        ssl_kwargs = {
+            "ssl_certfile": args.ssl_certfile,
+            "ssl_keyfile": args.ssl_keyfile
+        }
+    if ssl_kwargs:
+        uvicorn_kwargs = {**uvicorn_kwargs, **ssl_kwargs}
+    uvicorn.run(**uvicorn_kwargs)

 if __name__ == "__main__":
     main()
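
The reworked main() assembles the uvicorn keyword arguments up front so the TLS options can be merged in only when both halves of the key pair are supplied; failing fast with a ValueError beats letting uvicorn error out at socket bind time. Equivalently, as a standalone sketch (cert.pem and key.pem are placeholder paths; ssl_certfile and ssl_keyfile are real uvicorn parameters):

import uvicorn

# Serving over TLS means browser clients must connect with wss:// instead of ws://.
uvicorn.run(
    "whisperlivekit.basic_server:app",
    host="0.0.0.0",
    port=8000,
    ssl_certfile="cert.pem",  # placeholder path
    ssl_keyfile="key.pem",    # placeholder path
    log_level="info",
)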

View File

@@ -1,7 +1,7 @@
 try:
     from whisperlivekit.whisper_streaming_custom.whisper_online import backend_factory, warmup_asr
-except:
-    from whisper_streaming_custom.whisper_online import backend_factory, warmup_asr
+except ImportError:
+    from .whisper_streaming_custom.whisper_online import backend_factory, warmup_asr
 from argparse import Namespace, ArgumentParser

 def parse_args():

@@ -29,23 +29,21 @@ def parse_args():
     parser.add_argument(
         "--confidence-validation",
-        type=bool,
-        default=False,
+        action="store_true",
         help="Accelerates validation of tokens using confidence scores. Transcription will be faster but punctuation might be less accurate.",
     )
     parser.add_argument(
         "--diarization",
-        type=bool,
-        default=True,
-        help="Whether to enable speaker diarization.",
+        action="store_true",
+        default=False,
+        help="Enable speaker diarization.",
     )
     parser.add_argument(
-        "--transcription",
-        type=bool,
-        default=True,
-        help="To disable to only see live diarization results.",
+        "--no-transcription",
+        action="store_true",
+        help="Disable transcription to only see live diarization results.",
     )
     parser.add_argument(

@@ -54,15 +52,14 @@ def parse_args():
         default=0.5,
         help="Minimum audio chunk size in seconds. It waits up to this time to do processing. If the processing takes shorter time, it waits, otherwise it processes the whole segment that was received by this time.",
     )
     parser.add_argument(
         "--model",
         type=str,
         default="tiny",
-        choices="tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large-v3,large,large-v3-turbo".split(
-            ","
-        ),
-        help="Name size of the Whisper model to use (default: large-v2). The model is automatically downloaded from the model hub if not present in model cache dir.",
+        help="Name size of the Whisper model to use (default: tiny). Suggested values: tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large-v3,large,large-v3-turbo. The model is automatically downloaded from the model hub if not present in model cache dir.",
     )
     parser.add_argument(
         "--model_cache_dir",
         type=str,

@@ -105,12 +102,13 @@ def parse_args():
     parser.add_argument(
         "--vac-chunk-size", type=float, default=0.04, help="VAC sample size in seconds."
     )
     parser.add_argument(
-        "--vad",
+        "--no-vad",
         action="store_true",
-        default=True,
-        help="Use VAD = voice activity detection, with the default parameters.",
+        help="Disable VAD (voice activity detection).",
     )
     parser.add_argument(
         "--buffer_trimming",
         type=str,

@@ -132,8 +130,17 @@ def parse_args():
         help="Set the log level",
         default="DEBUG",
     )
+    parser.add_argument("--ssl-certfile", type=str, help="Path to the SSL certificate file.", default=None)
+    parser.add_argument("--ssl-keyfile", type=str, help="Path to the SSL private key file.", default=None)
     args = parser.parse_args()
+    args.transcription = not args.no_transcription
+    args.vad = not args.no_vad
+    delattr(args, 'no_transcription')
+    delattr(args, 'no_vad')
     return args

 class WhisperLiveKit:
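
The last hunk is the notable one: the CLI now exposes negative switches (--no-vad, --no-transcription) while the rest of the codebase keeps reading positive attributes (args.vad, args.transcription). A minimal standalone sketch of that inversion pattern, with the argument set trimmed down:

from argparse import ArgumentParser

parser = ArgumentParser()
# argparse maps "--no-vad" to the attribute "no_vad"
parser.add_argument("--no-vad", action="store_true", help="Disable voice activity detection.")
parser.add_argument("--no-transcription", action="store_true", help="Disable transcription.")

args = parser.parse_args([])  # [] -> defaults; both features stay enabled

# Invert once at the boundary, then drop the negative names so nothing
# downstream can accidentally read the un-inverted flag.
args.vad = not args.no_vad
args.transcription = not args.no_transcription
delattr(args, "no_vad")
delattr(args, "no_transcription")

print(args.vad, args.transcription)  # True True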

View File

@@ -38,7 +38,6 @@
     transform: scale(0.95);
   }
-  /* Shape inside the button */
   .shape-container {
     width: 25px;
     height: 25px;

@@ -56,6 +55,10 @@
     transition: all 0.3s ease;
   }
+  #recordButton:disabled .shape {
+    background-color: #6e6d6d;
+  }
   #recordButton.recording .shape {
     border-radius: 5px;
     width: 25px;

@@ -279,7 +282,7 @@
   </div>
   <div>
     <label for="websocketInput">WebSocket URL:</label>
-    <input id="websocketInput" type="text" value="ws://localhost:8000/asr" />
+    <input id="websocketInput" type="text" />
   </div>
   </div>
   </div>

@@ -304,6 +307,8 @@
   let waveCanvas = document.getElementById("waveCanvas");
   let waveCtx = waveCanvas.getContext("2d");
   let animationFrame = null;
+  let waitingForStop = false;
+  let lastReceivedData = null;
   waveCanvas.width = 60 * (window.devicePixelRatio || 1);
   waveCanvas.height = 30 * (window.devicePixelRatio || 1);
   waveCtx.scale(window.devicePixelRatio || 1, window.devicePixelRatio || 1);

@@ -315,6 +320,13 @@
   const linesTranscriptDiv = document.getElementById("linesTranscript");
   const timerElement = document.querySelector(".timer");
+  const host = window.location.hostname || "localhost";
+  const port = window.location.port || "8000";
+  const protocol = window.location.protocol === "https:" ? "wss" : "ws";
+  const defaultWebSocketUrl = `${protocol}://${host}:${port}/asr`;
+  websocketInput.value = defaultWebSocketUrl;
+  websocketUrl = defaultWebSocketUrl;
   chunkSelector.addEventListener("change", () => {
     chunkDuration = parseInt(chunkSelector.value);
   });

@@ -346,12 +358,31 @@
   websocket.onclose = () => {
     if (userClosing) {
-      statusText.textContent = "WebSocket closed by user.";
+      if (waitingForStop) {
+        statusText.textContent = "Processing finalized or connection closed.";
+        if (lastReceivedData) {
+          renderLinesWithBuffer(
+            lastReceivedData.lines || [],
+            lastReceivedData.buffer_diarization || "",
+            lastReceivedData.buffer_transcription || "",
+            0, 0, true // isFinalizing = true
+          );
+        }
+      }
+      // If ready_to_stop was received, statusText is already "Finished processing..."
+      // and waitingForStop is false.
     } else {
-      statusText.textContent =
-        "Disconnected from the WebSocket server. (Check logs if model is loading.)";
+      statusText.textContent = "Disconnected from the WebSocket server. (Check logs if model is loading.)";
+      if (isRecording) {
+        stopRecording();
+      }
     }
-    userClosing = false;
+    isRecording = false;
+    waitingForStop = false;
+    userClosing = false;
+    lastReceivedData = null;
+    websocket = null;
+    updateUI();
   };

   websocket.onerror = () => {

@@ -363,6 +394,34 @@
   websocket.onmessage = (event) => {
     const data = JSON.parse(event.data);
+
+    // Check for status messages
+    if (data.type === "ready_to_stop") {
+      console.log("Ready to stop received, finalizing display and closing WebSocket.");
+      waitingForStop = false;
+      if (lastReceivedData) {
+        renderLinesWithBuffer(
+          lastReceivedData.lines || [],
+          lastReceivedData.buffer_diarization || "",
+          lastReceivedData.buffer_transcription || "",
+          0, // No more lag
+          0, // No more lag
+          true // isFinalizing = true
+        );
+      }
+      statusText.textContent = "Finished processing audio! Ready to record again.";
+      recordButton.disabled = false;
+      if (websocket) {
+        websocket.close(); // will trigger onclose
+        // websocket = null; // onclose handles setting websocket to null
+      }
+      return;
+    }
+
+    lastReceivedData = data;
+
+    // Handle normal transcription updates
     const {
       lines = [],
       buffer_transcription = "",

@@ -376,13 +435,14 @@
       buffer_diarization,
       buffer_transcription,
       remaining_time_diarization,
-      remaining_time_transcription
+      remaining_time_transcription,
+      false // isFinalizing = false for normal updates
     );
   };
   });
   }

-  function renderLinesWithBuffer(lines, buffer_diarization, buffer_transcription, remaining_time_diarization, remaining_time_transcription) {
+  function renderLinesWithBuffer(lines, buffer_diarization, buffer_transcription, remaining_time_diarization, remaining_time_transcription, isFinalizing = false) {
     const linesHtml = lines.map((item, idx) => {
       let timeInfo = "";
       if (item.beg !== undefined && item.end !== undefined) {

@@ -392,30 +452,46 @@
       let speakerLabel = "";
       if (item.speaker === -2) {
         speakerLabel = `<span class="silence">Silence<span id='timeInfo'>${timeInfo}</span></span>`;
-      } else if (item.speaker == 0) {
+      } else if (item.speaker == 0 && !isFinalizing) {
         speakerLabel = `<span class='loading'><span class="spinner"></span><span id='timeInfo'>${remaining_time_diarization} second(s) of audio are undergoing diarization</span></span>`;
       } else if (item.speaker == -1) {
-        speakerLabel = `<span id="speaker"><span id='timeInfo'>${timeInfo}</span></span>`;
-      } else if (item.speaker !== -1) {
+        speakerLabel = `<span id="speaker">Speaker 1<span id='timeInfo'>${timeInfo}</span></span>`;
+      } else if (item.speaker !== -1 && item.speaker !== 0) {
         speakerLabel = `<span id="speaker">Speaker ${item.speaker}<span id='timeInfo'>${timeInfo}</span></span>`;
       }

-      let textContent = item.text;
-      if (idx === lines.length - 1) {
-        speakerLabel += `<span class="label_transcription"><span class="spinner"></span>Transcription lag <span id='timeInfo'>${remaining_time_transcription}s</span></span>`;
-      }
-      if (idx === lines.length - 1 && buffer_diarization) {
-        speakerLabel += `<span class="label_diarization"><span class="spinner"></span>Diarization lag<span id='timeInfo'>${remaining_time_diarization}s</span></span>`;
-        textContent += `<span class="buffer_diarization">${buffer_diarization}</span>`;
-      }
-      if (idx === lines.length - 1) {
-        textContent += `<span class="buffer_transcription">${buffer_transcription}</span>`;
+      let currentLineText = item.text || "";
+      if (idx === lines.length - 1) {
+        if (!isFinalizing) {
+          if (remaining_time_transcription > 0) {
+            speakerLabel += `<span class="label_transcription"><span class="spinner"></span>Transcription lag <span id='timeInfo'>${remaining_time_transcription}s</span></span>`;
+          }
+          if (buffer_diarization && remaining_time_diarization > 0) {
+            speakerLabel += `<span class="label_diarization"><span class="spinner"></span>Diarization lag<span id='timeInfo'>${remaining_time_diarization}s</span></span>`;
+          }
+        }
+        if (buffer_diarization) {
+          if (isFinalizing) {
+            currentLineText += (currentLineText.length > 0 && buffer_diarization.trim().length > 0 ? " " : "") + buffer_diarization.trim();
+          } else {
+            currentLineText += `<span class="buffer_diarization">${buffer_diarization}</span>`;
+          }
+        }
+        if (buffer_transcription) {
+          if (isFinalizing) {
+            currentLineText += (currentLineText.length > 0 && buffer_transcription.trim().length > 0 ? " " : "") + buffer_transcription.trim();
+          } else {
+            currentLineText += `<span class="buffer_transcription">${buffer_transcription}</span>`;
+          }
+        }
       }

-      return textContent
-        ? `<p>${speakerLabel}<br/><div class='textcontent'>${textContent}</div></p>`
-        : `<p>${speakerLabel}<br/></p>`;
+      return currentLineText.trim().length > 0 || speakerLabel.length > 0
+        ? `<p>${speakerLabel}<br/><div class='textcontent'>${currentLineText}</div></p>`
+        : `<p>${speakerLabel}<br/></p>`;
     }).join("");
     linesTranscriptDiv.innerHTML = linesHtml;

@@ -494,8 +570,17 @@
   }
   }

-  function stopRecording() {
+  async function stopRecording() {
     userClosing = true;
+    waitingForStop = true;
+
+    if (websocket && websocket.readyState === WebSocket.OPEN) {
+      // Send empty audio buffer as stop signal
+      const emptyBlob = new Blob([], { type: 'audio/webm' });
+      websocket.send(emptyBlob);
+      statusText.textContent = "Recording stopped. Processing final audio...";
+    }
+
     if (recorder) {
       recorder.stop();
       recorder = null;

@@ -531,38 +616,60 @@
     timerElement.textContent = "00:00";
     startTime = null;
     isRecording = false;
-    if (websocket) {
-      websocket.close();
-      websocket = null;
-    }
     updateUI();
   }

   async function toggleRecording() {
     if (!isRecording) {
-      linesTranscriptDiv.innerHTML = "";
+      if (waitingForStop) {
+        console.log("Waiting for stop, early return");
+        return; // Early return, UI is already updated
+      }
+      console.log("Connecting to WebSocket");
       try {
-        await setupWebSocket();
-        await startRecording();
+        // If we have an active WebSocket that's still processing, just restart audio capture
+        if (websocket && websocket.readyState === WebSocket.OPEN) {
+          await startRecording();
+        } else {
+          // If no active WebSocket or it's closed, create new one
+          await setupWebSocket();
+          await startRecording();
+        }
       } catch (err) {
        statusText.textContent = "Could not connect to WebSocket or access mic. Aborted.";
        console.error(err);
       }
     } else {
+      console.log("Stopping recording");
       stopRecording();
     }
   }

   function updateUI() {
     recordButton.classList.toggle("recording", isRecording);
-    statusText.textContent = isRecording ? "Recording..." : "Click to start transcription";
+    recordButton.disabled = waitingForStop;
+
+    if (waitingForStop) {
+      if (statusText.textContent !== "Recording stopped. Processing final audio...") {
+        statusText.textContent = "Please wait for processing to complete...";
+      }
+    } else if (isRecording) {
+      statusText.textContent = "Recording...";
+    } else {
+      if (statusText.textContent !== "Finished processing audio! Ready to record again." &&
+          statusText.textContent !== "Processing finalized or connection closed.") {
+        statusText.textContent = "Click to start transcription";
+      }
+    }
+
+    if (!waitingForStop) {
+      recordButton.disabled = false;
+    }
   }

   recordButton.addEventListener("click", toggleRecording);
   </script>
 </body>
 </html>
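
The stop handshake introduced here (an empty binary frame from the client, then {"type": "ready_to_stop"} from the server once the results generator drains) is easy to exercise without a browser. A hedged sketch of a headless test client, assuming the third-party `websockets` package; the dummy chunks are placeholders, not decodable WebM audio:

import asyncio
import json
import websockets

async def run(url: str = "ws://localhost:8000/asr"):
    async with websockets.connect(url) as ws:
        for _ in range(5):
            await ws.send(b"\x00" * 4096)  # stand-in for MediaRecorder chunks
            await asyncio.sleep(0.25)
        await ws.send(b"")                 # empty buffer acts as the stop signal
        while True:
            data = json.loads(await ws.recv())
            if data.get("type") == "ready_to_stop":
                print("server finished processing the final audio")
                break                      # safe to close; all results received

asyncio.run(run())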

View File

@@ -3,7 +3,10 @@ import logging
 import io
 import soundfile as sf
 import math
-import torch
+try:
+    import torch
+except ImportError:
+    torch = None
 from typing import List
 import numpy as np
 from whisperlivekit.timed_objects import ASRToken

@@ -102,8 +105,9 @@ class FasterWhisperASR(ASRBase):
             model_size_or_path = modelsize
         else:
             raise ValueError("Either modelsize or model_dir must be set")
-        device = "cuda" if torch.cuda.is_available() else "cpu"
-        compute_type = "float16" if device == "cuda" else "float32"
+        device = "auto"  # Allow CTranslate2 to decide available device
+        compute_type = "auto"  # Allow CTranslate2 to decide faster compute type
         model = WhisperModel(
             model_size_or_path,

@@ -249,8 +253,8 @@ class OpenaiApiASR(ASRBase):
         no_speech_segments = []
         if self.use_vad_opt:
             for segment in segments.segments:
-                if segment["no_speech_prob"] > 0.8:
-                    no_speech_segments.append((segment.get("start"), segment.get("end")))
+                if segment.no_speech_prob > 0.8:
+                    no_speech_segments.append((segment.start, segment.end))
         tokens = []
         for word in segments.words:
             start = word.start
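
With device and compute type both set to "auto", CTranslate2 probes for CUDA itself and picks a matching precision, so torch is no longer needed just for device detection. A small sketch of the resulting loader behaviour, assuming faster-whisper is installed (sample.wav is a placeholder):

from faster_whisper import WhisperModel

# "auto" lets CTranslate2 choose CUDA when present (typically float16) and
# fall back to CPU (typically int8 or float32) otherwise.
model = WhisperModel("tiny", device="auto", compute_type="auto")

segments, info = model.transcribe("sample.wav", language="en")  # placeholder file
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")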

View File

@@ -216,31 +216,54 @@ class OnlineASRProcessor:
         """
         If the committed tokens form at least two sentences, chunk the audio
         buffer at the end time of the penultimate sentence.
+        Also ensures chunking happens if audio buffer exceeds a time limit.
         """
+        buffer_duration = len(self.audio_buffer) / self.SAMPLING_RATE
         if not self.committed:
+            if buffer_duration > self.buffer_trimming_sec:
+                chunk_time = self.buffer_time_offset + (buffer_duration / 2)
+                logger.debug(f"--- No speech detected, forced chunking at {chunk_time:.2f}")
+                self.chunk_at(chunk_time)
             return
         logger.debug("COMPLETED SENTENCE: " + " ".join(token.text for token in self.committed))
         sentences = self.words_to_sentences(self.committed)
         for sentence in sentences:
             logger.debug(f"\tSentence: {sentence.text}")
-        if len(sentences) < 2:
-            return
-        # Keep the last two sentences.
-        while len(sentences) > 2:
-            sentences.pop(0)
-        chunk_time = sentences[-2].end
-        logger.debug(f"--- Sentence chunked at {chunk_time:.2f}")
-        self.chunk_at(chunk_time)
+        chunk_done = False
+        if len(sentences) >= 2:
+            while len(sentences) > 2:
+                sentences.pop(0)
+            chunk_time = sentences[-2].end
+            logger.debug(f"--- Sentence chunked at {chunk_time:.2f}")
+            self.chunk_at(chunk_time)
+            chunk_done = True
+        if not chunk_done and buffer_duration > self.buffer_trimming_sec:
+            last_committed_time = self.committed[-1].end
+            logger.debug(f"--- Not enough sentences, chunking at last committed time {last_committed_time:.2f}")
+            self.chunk_at(last_committed_time)

     def chunk_completed_segment(self, res):
         """
         Chunk the audio buffer based on segment-end timestamps reported by the ASR.
+        Also ensures chunking happens if audio buffer exceeds a time limit.
         """
+        buffer_duration = len(self.audio_buffer) / self.SAMPLING_RATE
         if not self.committed:
+            if buffer_duration > self.buffer_trimming_sec:
+                chunk_time = self.buffer_time_offset + (buffer_duration / 2)
+                logger.debug(f"--- No speech detected, forced chunking at {chunk_time:.2f}")
+                self.chunk_at(chunk_time)
             return
+        logger.debug("Processing committed tokens for segmenting")
         ends = self.asr.segments_end_ts(res)
         last_committed_time = self.committed[-1].end
+        chunk_done = False
         if len(ends) > 1:
+            logger.debug("Multiple segments available for chunking")
             e = ends[-2] + self.buffer_time_offset
             while len(ends) > 2 and e > last_committed_time:
                 ends.pop(-1)

@@ -248,11 +271,18 @@ class OnlineASRProcessor:
             if e <= last_committed_time:
                 logger.debug(f"--- Segment chunked at {e:.2f}")
                 self.chunk_at(e)
+                chunk_done = True
             else:
                 logger.debug("--- Last segment not within committed area")
         else:
             logger.debug("--- Not enough segments to chunk")
+        if not chunk_done and buffer_duration > self.buffer_trimming_sec:
+            logger.debug(f"--- Buffer too large, chunking at last committed time {last_committed_time:.2f}")
+            self.chunk_at(last_committed_time)
+        logger.debug("Segment chunking complete")

     def chunk_at(self, time: float):
         """
         Trim both the hypothesis and audio buffer at the given time.

@@ -358,7 +388,7 @@ class VACOnlineASRProcessor:
         # Load a VAD model (e.g. Silero VAD)
         import torch
         model, _ = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
-        from silero_vad_iterator import FixedVADIterator
+        from .silero_vad_iterator import FixedVADIterator
         self.vac = FixedVADIterator(model)
         self.logfile = self.online.logfile
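
Both chunk_completed_* methods now share the same safety net: if neither sentences nor segments provide a cut point before the buffer outgrows buffer_trimming_sec, trim anyway. A toy illustration of the no-committed-tokens branch (numbers are made up; attribute names mirror the diff):

import numpy as np

SAMPLING_RATE = 16000
buffer_trimming_sec = 15.0
buffer_time_offset = 0.0
audio_buffer = np.zeros(20 * SAMPLING_RATE, dtype=np.float32)  # 20 s, nothing committed yet

buffer_duration = len(audio_buffer) / SAMPLING_RATE
if buffer_duration > buffer_trimming_sec:
    # Forced chunk: cut at the midpoint of the buffer, expressed in stream time.
    chunk_time = buffer_time_offset + buffer_duration / 2
    cut = int((chunk_time - buffer_time_offset) * SAMPLING_RATE)
    audio_buffer = audio_buffer[cut:]
    buffer_time_offset = chunk_time

print(f"buffer now {len(audio_buffer) / SAMPLING_RATE:.1f}s, offset {buffer_time_offset:.1f}s")
# -> buffer now 10.0s, offset 10.0s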

View File

@@ -179,7 +179,7 @@ def warmup_asr(asr, warmup_file=None, timeout=5):
         logger.warning(f"Warmup file {warmup_file} invalid or missing.")
         return False
-    print(f"Warmping up Whisper with {warmup_file}")
+    print(f"Warming up Whisper with {warmup_file}")
     try:
         import librosa
         audio, sr = librosa.load(warmup_file, sr=16000)