Mirror of https://github.com/QuentinFuxa/WhisperLiveKit.git, synced 2026-03-07 14:23:18 +00:00

Commit: add SIMULSTREAMING_ERROR_AND_INSTALLATION_INSTRUCTIONS for instructions when simulstreaming files are not there

README.md: 135 changed lines
@@ -13,32 +13,32 @@
   <a href="https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-MIT-dark_green"></a>
 </p>

-## 🚀 Overview
+## Overview

 This project is based on [WhisperStreaming](https://github.com/ufal/whisper_streaming) and [SimulStreaming](https://github.com/ufal/SimulStreaming), allowing you to transcribe audio directly from your browser. WhisperLiveKit provides a complete backend solution for real-time speech transcription with a functional, simple and customizable frontend. Everything runs locally on your machine ✨

-### 🔄 Architecture
+### Architecture

 WhisperLiveKit consists of three main components:

-- **Frontend**: A basic html + JS interface that captures microphone audio and streams it to the backend via WebSockets. You can use and adapt the provided template at [whisperlivekit/web/live_transcription.html](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html).
+- **Frontend**: A basic HTML + JS interface that captures microphone audio and streams it to the backend via WebSockets. You can use and adapt the [provided template](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html).
 - **Backend (Web Server)**: A FastAPI-based WebSocket server that receives streamed audio data, processes it in real time, and returns transcriptions to the frontend. This is where the WebSocket logic and routing live.
 - **Core Backend (Library Logic)**: A server-agnostic core that handles audio processing, ASR, and diarization. It exposes reusable components that take in audio bytes and return transcriptions.

-### ✨ Key Features
+### Key Features

-- **🎙️ Real-time Transcription** - Locally (or on-prem) convert speech to text instantly as you speak
-- **👥 Speaker Diarization** - Identify different speakers in real-time using [Diart](https://github.com/juanmc2005/diart)
-- **🌐 Multi-User Support** - Handle multiple users simultaneously with a single backend/server
-- **🔇 Automatic Silence Chunking** – Automatically chunks when no audio is detected to limit buffer size
-- **✅ Confidence Validation** – Immediately validate high-confidence tokens for faster inference (WhisperStreaming only)
-- **👁️ Buffering Preview** – Displays unvalidated transcription segments (not compatible with SimulStreaming yet)
-- **✒️ Punctuation-Based Speaker Splitting [BETA]** - Align speaker changes with natural sentence boundaries for more readable transcripts
-- **⚡ SimulStreaming Backend** - Ultra-low latency transcription using the state-of-the-art AlignAtt policy. The code is not directly included in the repo: to use it, copy the [simul_whisper](https://github.com/ufal/SimulStreaming/tree/main/simul_whisper) content into `whisperlivekit/simul_whisper`. ⚠️ You must comply with the [Polyform license](https://github.com/ufal/SimulStreaming/blob/main/LICENCE.txt)
+- **Real-time Transcription** - Locally (or on-prem) convert speech to text instantly as you speak
+- **Speaker Diarization** - Identify different speakers in real-time using [Diart](https://github.com/juanmc2005/diart)
+- **Multi-User Support** - Handle multiple users simultaneously with a single backend/server
+- **Automatic Silence Chunking** – Automatically chunks when no audio is detected to limit buffer size
+- **Confidence Validation** – Immediately validate high-confidence tokens for faster inference (WhisperStreaming only)
+- **Buffering Preview** – Displays unvalidated transcription segments (not compatible with SimulStreaming yet)
+- **Punctuation-Based Speaker Splitting [BETA]** - Align speaker changes with natural sentence boundaries for more readable transcripts
+- **SimulStreaming Backend** - Ultra-low latency transcription using the state-of-the-art AlignAtt policy. The code is not directly included in the repo: to use it, copy the [simul_whisper](https://github.com/ufal/SimulStreaming/tree/main/simul_whisper) content into `whisperlivekit/simul_whisper`. ⚠️ You must comply with the [Polyform license](https://github.com/ufal/SimulStreaming/blob/main/LICENCE.txt)

-## 📖 Quick Start
+## Quick Start

 ```bash
 # Install the package
@@ -53,25 +53,19 @@ whisperlivekit-server --model tiny.en

 That's it! Start speaking and watch your words appear on screen.

-## 🛠️ Installation Options
-
-### Install from PyPI (Recommended)
+## Installation

 ```bash
+# Install from PyPI (Recommended)
 pip install whisperlivekit
 ```

-### Install from Source
-
 ```bash
+# Install from Source
 git clone https://github.com/QuentinFuxa/WhisperLiveKit
 cd WhisperLiveKit
 pip install -e .
 ```

-### System Dependencies
-
-FFmpeg is required:
+### FFmpeg Dependency

 ```bash
 # Ubuntu/Debian
@@ -140,40 +134,30 @@ whisperlivekit-server --backend simulstreaming --model large-v3 --frame-threshol

 Check [basic_server.py](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/basic_server.py) for a complete example.

 ```python
-from whisperlivekit import TranscriptionEngine, AudioProcessor, get_web_interface_html, parse_args
+from whisperlivekit import TranscriptionEngine, AudioProcessor, parse_args
 from fastapi import FastAPI, WebSocket, WebSocketDisconnect
-from fastapi.responses import HTMLResponse
 from contextlib import asynccontextmanager
 import asyncio

 # Global variable for the transcription engine
 transcription_engine = None

 @asynccontextmanager
 async def lifespan(app: FastAPI):
     global transcription_engine
-    transcription_engine = TranscriptionEngine(model="medium", diarization=True, lan="en")
+    # Example: Initialize with specific parameters directly
+    transcription_engine = TranscriptionEngine(model="medium", diarization=True, lan="en")
+    # You can also load from command-line arguments using parse_args()
+    # args = parse_args()
+    # transcription_engine = TranscriptionEngine(**vars(args))
     yield

 app = FastAPI(lifespan=lifespan)

-# Serve the web interface
-@app.get("/")
-async def get():
-    return HTMLResponse(get_web_interface_html())
-
 # Process WebSocket connections
 async def handle_websocket_results(websocket: WebSocket, results_generator):
-    async for response in results_generator:
-        await websocket.send_json(response)
-    await websocket.send_json({"type": "ready_to_stop"})
+    try:
+        async for response in results_generator:
+            await websocket.send_json(response)
+        await websocket.send_json({"type": "ready_to_stop"})
+    except WebSocketDisconnect:
+        print("WebSocket disconnected during results handling.")

 @app.websocket("/asr")
 async def websocket_endpoint(websocket: WebSocket):
@@ -182,33 +166,19 @@ async def websocket_endpoint(websocket: WebSocket):
     # Create a new AudioProcessor for each connection, passing the shared engine
     audio_processor = AudioProcessor(transcription_engine=transcription_engine)
     results_generator = await audio_processor.create_tasks()
-    send_results_to_client = handle_websocket_results(websocket, results_generator)
-    results_task = asyncio.create_task(send_results_to_client)
+    results_task = asyncio.create_task(handle_websocket_results(websocket, results_generator))
     await websocket.accept()
-    try:
-        while True:
-            message = await websocket.receive_bytes()
-            await audio_processor.process_audio(message)
-    except WebSocketDisconnect:
-        print(f"Client disconnected: {websocket.client}")
-    except Exception as e:
-        await websocket.close(code=1011, reason=f"Server error: {e}")
-    finally:
-        results_task.cancel()
-        try:
-            await results_task
-        except asyncio.CancelledError:
-            logger.info("Results task successfully cancelled.")
+    while True:
+        message = await websocket.receive_bytes()
+        await audio_processor.process_audio(message)
 ```

 ### Frontend Implementation

-The package includes a simple HTML/JavaScript implementation that you can adapt for your project. You can find it in `whisperlivekit/web/live_transcription.html`, or load its content using the `get_web_interface_html()` function from `whisperlivekit`:
+The package includes a simple HTML/JavaScript implementation that you can adapt for your project. You can find it [here](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html), or load its content using `get_web_interface_html()`:

 ```python
 from whisperlivekit import get_web_interface_html

 # ... later in your code where you need the HTML string ...
 html_content = get_web_interface_html()
 ```
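The per-connection flow above (a results task consuming an async generator and forwarding updates, then signaling completion with a final "ready_to_stop" message) can be sketched with plain asyncio. Everything below is a toy stand-in for AudioProcessor and the WebSocket, not the library's API:

```python
import asyncio

async def fake_results_generator():
    # Stand-in for AudioProcessor's results generator: yields
    # incremental transcription updates as they are produced.
    for text in ["hello", "hello world"]:
        await asyncio.sleep(0)  # yield control, as real ASR would
        yield {"type": "transcript", "text": text}

async def handle_results(results, sink):
    # Mirrors handle_websocket_results: forward every update,
    # then signal completion to the client.
    async for response in results:
        sink.append(response)
    sink.append({"type": "ready_to_stop"})

async def main():
    sink = []
    # One task per "connection", like asyncio.create_task(...) in the server.
    task = asyncio.create_task(handle_results(fake_results_generator(), sink))
    await task
    return sink

messages = asyncio.run(main())
print(messages[-1]["type"])  # ready_to_stop
```

The real server differs only in that the sink is a WebSocket (`send_json`) and the generator is fed by `process_audio()` calls.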
@@ -257,11 +227,8 @@ WhisperLiveKit offers extensive configuration options:

 1. **Audio Capture**: Browser's MediaRecorder API captures audio in webm/opus format
 2. **Streaming**: Audio chunks are sent to the server via WebSocket
-3. **Processing**: Server decodes audio with FFmpeg and streams into Whisper for transcription
-4. **Real-time Output**:
-   - Partial transcriptions appear immediately in light gray (the 'aperçu')
-   - Finalized text appears in normal color
-   - (When enabled) Different speakers are identified and highlighted
+3. **Processing**: Server decodes audio with FFmpeg and streams it into the model for transcription
+4. **Real-time Output**: Partial transcriptions appear immediately in light gray (the 'aperçu') and finalized text appears in normal color
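The split in step 4 between finalized text and the light-gray 'aperçu' can be sketched as a tiny merging helper (hypothetical names; the actual frontend does this styling in JS):

```python
def render_display(committed: str, buffer: str) -> str:
    """Combine finalized text with the not-yet-validated preview.

    The web UI styles the buffer in light gray; here it is marked
    with brackets purely for illustration.
    """
    preview = f" [{buffer}]" if buffer else ""
    return committed + preview

# As tokens are validated they migrate from the preview into the committed text.
print(render_display("Hello world", "this is"))   # Hello world [this is]
print(render_display("Hello world this is", ""))  # Hello world this is
```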
## 🚀 Deployment Guide

@@ -291,17 +258,14 @@ To deploy WhisperLiveKit in production:

         proxy_set_header Upgrade $http_upgrade;
         proxy_set_header Connection "upgrade";
         proxy_set_header Host $host;
     }
 }
 ```

 4. **HTTPS Support**: For secure deployments, use "wss://" instead of "ws://" in WebSocket URL
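The ws/wss rule above can be captured in a small helper for deriving the WebSocket URL from the page origin (a sketch for illustration; the bundled frontend does the equivalent in JS, and the `/asr` path matches the endpoint registered in the server example):

```python
def websocket_url(origin: str, path: str = "/asr") -> str:
    """Map an http(s) origin to the matching ws(s) endpoint.

    Pages served over https must connect via wss://, otherwise
    browsers block the WebSocket as mixed content.
    """
    if origin.startswith("https://"):
        return "wss://" + origin[len("https://"):] + path
    if origin.startswith("http://"):
        return "ws://" + origin[len("http://"):] + path
    raise ValueError(f"unsupported origin: {origin!r}")

print(websocket_url("https://example.com"))    # wss://example.com/asr
print(websocket_url("http://localhost:8000"))  # ws://localhost:8000/asr
```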
 ### 🐋 Docker

-A basic Dockerfile is provided which allows re-use of Python package installation options. See below usage examples:
-
-**NOTE:** For **larger** models, ensure that your **docker runtime** has enough **memory** available.
+A basic Dockerfile is provided which allows re-use of Python package installation options. ⚠️ For **large** models, ensure that your **docker runtime** has enough **memory** available. See below usage examples:

 #### All defaults

 - Create a reusable image with only the basics and then run as a named container:
@@ -327,40 +291,11 @@ docker start -i whisperlivekit-base

 - `HF_TOKEN="./token"` - Add your Hugging Face Hub access token to download gated models

-## 🔮 Use Cases
-
-- **Meeting Transcription**: Capture discussions in real-time
-- **Accessibility Tools**: Help hearing-impaired users follow conversations
-- **Content Creation**: Transcribe podcasts or videos automatically
-- **Customer Service**: Transcribe support calls with speaker identification

 ## 📄 License

 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

 **⚠️ Important**: When using the SimulStreaming backend, you must also comply with the **PolyForm Noncommercial License 1.0.0** that governs SimulStreaming. For commercial use of the SimulStreaming backend, obtain a commercial license from the [SimulStreaming authors](https://github.com/ufal/SimulStreaming#-licence-and-contributions).

-## 🤝 Contributing
-
-Contributions are welcome! Here's how to get started:
-
-1. Fork the repository
-2. Create a feature branch: `git checkout -b feature/amazing-feature`
-3. Commit your changes: `git commit -m 'Add amazing feature'`
-4. Push to your branch: `git push origin feature/amazing-feature`
-5. Open a Pull Request
+Capture discussions in real-time for meeting transcription, help hearing-impaired users follow conversations through accessibility tools, transcribe podcasts or videos automatically for content creation, transcribe support calls with speaker identification for customer service...
 ## 🙏 Acknowledgments

 This project builds upon the foundational work of:
-- [Whisper Streaming](https://github.com/ufal/whisper_streaming)
-- [SimulStreaming](https://github.com/ufal/SimulStreaming) (BETA backend)
-- [Diart](https://github.com/juanmc2005/diart)
-- [OpenAI Whisper](https://github.com/openai/whisper)
-We extend our gratitude to the original authors of:
+We extend our gratitude to the original authors for their contributions.

-## 🔗 Links
-
-- [GitHub Repository](https://github.com/QuentinFuxa/WhisperLiveKit)
-- [PyPI Package](https://pypi.org/project/whisperlivekit/)
-- [Issue Tracker](https://github.com/QuentinFuxa/WhisperLiveKit/issues)
+| [Whisper Streaming](https://github.com/ufal/whisper_streaming) | [SimulStreaming](https://github.com/ufal/SimulStreaming) | [Diart](https://github.com/juanmc2005/diart) | [OpenAI Whisper](https://github.com/openai/whisper) |
+| -------- | ------- | -------- | ------- |
@@ -12,6 +12,11 @@ import numpy as np
 from whisperlivekit.timed_objects import ASRToken

 logger = logging.getLogger(__name__)
+SIMULSTREAMING_ERROR_AND_INSTALLATION_INSTRUCTIONS = ImportError(
+    """SimulStreaming dependencies are not available.
+Please install WhisperLiveKit using pip install "whisperlivekit[simulstreaming]".
+If you are building from source, you should also copy the content of the https://github.com/ufal/SimulStreaming/tree/main/simul_whisper directory into whisperlivekit/simul_whisper.
+""")

 try:
     from whisperlivekit.simul_whisper.config import AlignAttConfig

@@ -315,7 +320,7 @@ class SimulStreamingASR(ASRBase):

     def __init__(self, lan, modelsize=None, cache_dir=None, model_dir=None, logfile=sys.stderr, **kwargs):
         if not SIMULSTREAMING_AVAILABLE:
-            raise ImportError("""SimulStreaming dependencies are not available. Please install WhisperLiveKit using pip install "whisperlivekit[simulstreaming]". If you are building from source, you should also copy the content of the simul_whisper directory from the SimulStreaming repository into whisperlivekit/simul_whisper.""")
+            raise SIMULSTREAMING_ERROR_AND_INSTALLATION_INSTRUCTIONS
         with open("whisperlivekit/simul_whisper/dual_license_simulstreaming.md", "r") as f:
             print("*"*80 + f.read() + "*"*80)
         self.logfile = logfile
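The guard above follows a common optional-dependency pattern: attempt the import once at module load, record a flag, and raise one shared, instruction-rich error object at every point of use. A generic sketch (the module name is hypothetical):

```python
# Shared, pre-built error so the installation instructions are
# identical everywhere the dependency is required.
MISSING_DEP_ERROR = ImportError(
    "Optional dependency 'simul_whisper_stub' is not installed; "
    "see the installation instructions."
)

try:
    import simul_whisper_stub  # hypothetical optional dependency
    DEP_AVAILABLE = True
except ImportError:
    DEP_AVAILABLE = False

def make_backend():
    # Fail fast at the point of use, with the full instructions.
    if not DEP_AVAILABLE:
        raise MISSING_DEP_ERROR
    return simul_whisper_stub

print(DEP_AVAILABLE)  # False (the stub module does not exist)
```

Centralizing the message in one constant is exactly what this commit does with SIMULSTREAMING_ERROR_AND_INSTALLATION_INSTRUCTIONS, replacing three divergent inline ImportError messages.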
@@ -5,7 +5,7 @@ import librosa
 from functools import lru_cache
 import time
 import logging
-from .backends import FasterWhisperASR, MLXWhisper, WhisperTimestampedASR, OpenaiApiASR, SimulStreamingASR, SIMULSTREAMING_AVAILABLE
+from .backends import FasterWhisperASR, MLXWhisper, WhisperTimestampedASR, OpenaiApiASR, SimulStreamingASR, SIMULSTREAMING_AVAILABLE, SIMULSTREAMING_ERROR_AND_INSTALLATION_INSTRUCTIONS
 from .online_asr import OnlineASRProcessor, VACOnlineASRProcessor, SimulStreamingOnlineProcessor, SIMULSTREAMING_AVAILABLE as SIMULSTREAMING_ONLINE_AVAILABLE

 logger = logging.getLogger(__name__)

@@ -72,10 +72,7 @@ def backend_factory(args):
     elif backend == "simulstreaming":
         logger.debug("Using SimulStreaming backend.")
         if not SIMULSTREAMING_AVAILABLE:
-            raise ImportError(
-                "SimulStreaming backend is not available. Please install SimulStreaming dependencies. "
-                "See the documentation for installation instructions."
-            )
+            raise SIMULSTREAMING_ERROR_AND_INSTALLATION_INSTRUCTIONS

         simulstreaming_kwargs = {}
         for attr in ['frame_threshold', 'beams', 'decoder_type', 'audio_max_len', 'audio_min_len',

@@ -144,7 +141,7 @@ def backend_factory(args):
 def online_factory(args, asr, tokenizer, logfile=sys.stderr):
     if args.backend == "simulstreaming":
         if not SIMULSTREAMING_ONLINE_AVAILABLE:
-            raise ImportError("SimulStreaming online processor is not available.")
+            raise SIMULSTREAMING_ERROR_AND_INSTALLATION_INSTRUCTIONS

         logger.debug("Creating SimulStreaming online processor")
         online = SimulStreamingOnlineProcessor(