Mirror of https://github.com/QuentinFuxa/WhisperLiveKit.git
Synced 2026-03-07 22:33:36 +00:00

Compare commits: 13 commits
| Author | SHA1 | Date |
|---|---|---|
| | c4150894af | |
| | 25bf242ce1 | |
| | 14cc601a5c | |
| | 34d5d513fa | |
| | 2ab3dac948 | |
| | b56fcffde1 | |
| | 2def194893 | |
| | 29978da301 | |
| | b708890788 | |
| | 3ac4c514cf | |
| | 3c58bfcfa2 | |
| | d53b7a323a | |
| | 02de5993e6 | |
README.md (26 changed lines)
```diff
@@ -15,16 +15,16 @@

 ## 🚀 Overview

-This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. WhisperLiveKit provides a complete backend solution for real-time speech transcription with an example frontend that you can customize for your own needs. Everything runs locally on your machine ✨
+This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. WhisperLiveKit provides a complete backend solution for real-time speech transcription with a functional and simple frontend that you can customize for your own needs. Everything runs locally on your machine ✨

 ### 🔄 Architecture

-WhisperLiveKit consists of two main components:
+WhisperLiveKit consists of three main components:

-- **Backend (Server)**: FastAPI WebSocket server that processes audio and provides real-time transcription
-- **Frontend Example**: Basic HTML & JavaScript implementation that demonstrates how to capture and stream audio
+- **Frontend**: A basic HTML & JavaScript interface that captures microphone audio and streams it to the backend via WebSockets. You can use and adapt the provided template at [whisperlivekit/web/live_transcription.html](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html) for your specific use case.
+- **Backend (Web Server)**: A FastAPI-based WebSocket server that receives streamed audio data, processes it in real time, and returns transcriptions to the frontend. This is where the WebSocket logic and routing live.
+- **Core Backend (Library Logic)**: A server-agnostic core that handles audio processing, ASR, and diarization. It exposes reusable components that take in audio bytes and return transcriptions. This makes it easy to plug into any WebSocket or audio stream pipeline.
+
+> **Note**: We recommend installing this library on the server/backend. For the frontend, you can use and adapt the provided HTML template from [whisperlivekit/web/live_transcription.html](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html) for your specific use case.

 ### ✨ Key Features
```
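The reworked architecture section describes a three-layer split: a transport-owning web server on top of a server-agnostic core that turns audio bytes into transcriptions. A rough sketch of that layering (all names here are hypothetical stand-ins, not WhisperLiveKit's actual API):

```python
import asyncio

# Core backend (library logic): server-agnostic, bytes in -> text out.
# `transcribe_chunk` is a placeholder; real ASR would run a model here.
class CoreProcessor:
    def transcribe_chunk(self, audio_bytes: bytes) -> str:
        return f"transcribed {len(audio_bytes)} bytes"

# Web-server layer: owns the transport (a plain coroutine standing in for
# a WebSocket handler) and delegates all audio work to the core.
async def handle_stream(core: CoreProcessor, chunks):
    results = []
    for chunk in chunks:
        results.append(core.transcribe_chunk(chunk))
    return results

core = CoreProcessor()
print(asyncio.run(handle_stream(core, [b"\x00" * 3200])))
```

The point of the split is that `CoreProcessor` never touches WebSockets, so the same core can sit behind any transport.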
```diff
@@ -33,13 +33,13 @@ WhisperLiveKit consists of two main components:

 - **🔒 Fully Local** - All processing happens on your machine - no data sent to external servers
 - **📱 Multi-User Support** - Handle multiple users simultaneously with a single backend/server

-### ⚙️ Differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)
+### ⚙️ Core differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)

 - **Automatic Silence Chunking** – Automatically chunks when no audio is detected to limit buffer size
 - **Multi-User Support** – Handles multiple users simultaneously by decoupling backend and online ASR
-- **Confidence Validation** – Immediately validate high-confidence tokens for faster inference
-- **MLX Whisper Backend** – Optimized for Apple Silicon for faster local processing
 - **Buffering Preview** – Displays unvalidated transcription segments
+- **Confidence Validation** – Immediately validate high-confidence tokens for faster inference
+- **Apple Silicon Optimized** - MLX backend for faster local processing on Mac

 ## 📖 Quick Start
```
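The "Confidence Validation" bullet (commit high-confidence tokens immediately rather than waiting for agreement across decoding passes) can be sketched as a simple threshold split. This is a hypothetical illustration of the idea, not the library's actual implementation:

```python
# Hypothetical sketch: split hypothesis tokens into a committed prefix
# (high confidence, validated immediately) and a pending buffer
# (low confidence, shown only as an unvalidated preview).
def validate_tokens(tokens, threshold=0.9):
    committed, pending = [], []
    for word, prob in tokens:
        if not pending and prob >= threshold:
            committed.append(word)   # validate immediately
        else:
            pending.append(word)     # buffer everything after the first doubt
    return committed, pending

hyp = [("hello", 0.98), ("world", 0.95), ("maybe", 0.40), ("rain", 0.92)]
done, buffer = validate_tokens(hyp)
print(done, buffer)
```

Committing a confident prefix means the next decoding pass only has to re-score the pending suffix, which is where the speedup comes from.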
````diff
@@ -53,6 +53,14 @@ whisperlivekit-server --model tiny.en

 # Open your browser at http://localhost:8000
 ```

+### Quick Start with SSL
+```bash
+# You must provide a certificate and key
+whisperlivekit-server -ssl-certfile public.crt --ssl-keyfile private.key
+
+# Open your browser at https://localhost:8000
+```
+
 That's it! Start speaking and watch your words appear on screen.

 ## 🛠️ Installation Options
````
```diff
@@ -201,6 +209,8 @@ WhisperLiveKit offers extensive configuration options:

 | `--no-vad` | Disable Voice Activity Detection | `False` |
 | `--buffer_trimming` | Buffer trimming strategy (`sentence` or `segment`) | `segment` |
 | `--warmup-file` | Audio file path for model warmup | `jfk.wav` |
+| `--ssl-certfile` | Path to the SSL certificate file (for HTTPS support) | `None` |
+| `--ssl-keyfile` | Path to the SSL private key file (for HTTPS support) | `None` |

 ## 🔧 How It Works
```
demo.png: binary file not shown (424 KiB before, 438 KiB after)
setup.py (2 changed lines)
```diff
@@ -1,7 +1,7 @@
 from setuptools import setup, find_packages

 setup(
     name="whisperlivekit",
-    version="0.1.4",
+    version="0.1.5",
     description="Real-time, Fully Local Whisper's Speech-to-Text and Speaker Diarization",
     long_description=open("README.md", "r", encoding="utf-8").read(),
     long_description_content_type="text/markdown",
```
Deleted file (82 lines removed; its own uvicorn target names the module as `whisper_fastapi_online_server`):

```diff
@@ -1,82 +0,0 @@
-from contextlib import asynccontextmanager
-from fastapi import FastAPI, WebSocket, WebSocketDisconnect
-from fastapi.responses import HTMLResponse
-from fastapi.middleware.cors import CORSMiddleware
-
-from whisperlivekit import WhisperLiveKit
-from whisperlivekit.audio_processor import AudioProcessor
-
-import asyncio
-import logging
-import os
-
-logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
-logging.getLogger().setLevel(logging.WARNING)
-logger = logging.getLogger(__name__)
-logger.setLevel(logging.DEBUG)
-
-kit = None
-
-@asynccontextmanager
-async def lifespan(app: FastAPI):
-    global kit
-    kit = WhisperLiveKit()
-    yield
-
-app = FastAPI(lifespan=lifespan)
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
-
-
-@app.get("/")
-async def get():
-    return HTMLResponse(kit.web_interface())
-
-
-async def handle_websocket_results(websocket, results_generator):
-    """Consumes results from the audio processor and sends them via WebSocket."""
-    try:
-        async for response in results_generator:
-            await websocket.send_json(response)
-    except Exception as e:
-        logger.warning(f"Error in WebSocket results handler: {e}")
-
-
-@app.websocket("/asr")
-async def websocket_endpoint(websocket: WebSocket):
-    audio_processor = AudioProcessor()
-
-    await websocket.accept()
-    logger.info("WebSocket connection opened.")
-
-    results_generator = await audio_processor.create_tasks()
-    websocket_task = asyncio.create_task(handle_websocket_results(websocket, results_generator))
-
-    try:
-        while True:
-            message = await websocket.receive_bytes()
-            await audio_processor.process_audio(message)
-    except WebSocketDisconnect:
-        logger.warning("WebSocket disconnected.")
-    finally:
-        websocket_task.cancel()
-        await audio_processor.cleanup()
-        logger.info("WebSocket endpoint cleaned up.")
-
-if __name__ == "__main__":
-    import uvicorn
-
-    temp_kit = WhisperLiveKit(transcription=False, diarization=False)
-
-    uvicorn.run(
-        "whisper_fastapi_online_server:app",
-        host=temp_kit.args.host,
-        port=temp_kit.args.port,
-        reload=True,
-        log_level="info"
-    )
```
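The core pattern in the deleted server (one task drains an async generator of results and forwards each one to the client, independently of the task feeding audio in) is worth seeing in isolation. A minimal sketch with fakes standing in for the WebSocket and the audio processor:

```python
import asyncio

# Stand-in for AudioProcessor's results generator: yields transcription
# payloads as they become available.
async def fake_results():
    for i in range(3):
        await asyncio.sleep(0)          # yield control, as real ASR would
        yield {"text": f"segment {i}"}

# Same shape as handle_websocket_results: drain the generator and push
# each payload out (websocket.send_json in the real code).
async def forward_results(send, results_generator):
    async for response in results_generator:
        await send(response)

async def main():
    sent = []
    async def send(msg):
        sent.append(msg)
    await forward_results(send, fake_results())
    return sent

print(asyncio.run(main()))
```

Running the forwarder as a separate `asyncio.create_task`, as the deleted file did, is what lets results flow to the browser while `receive_bytes` keeps accepting new audio.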
```diff
@@ -205,22 +205,10 @@ class AudioProcessor:
                         self.last_ffmpeg_activity = time()
                     continue

-                # Reduce timeout for reading from FFmpeg
-                try:
-                    chunk = await asyncio.wait_for(
-                        loop.run_in_executor(None, self.ffmpeg_process.stdout.read, buffer_size),
-                        timeout=5.0  # Shorter timeout (5 seconds instead of 15)
-                    )
-                    if chunk:
-                        self.last_ffmpeg_activity = time()
-                except asyncio.TimeoutError:
-                    logger.warning("FFmpeg read timeout. Restarting...")
-                    await self.restart_ffmpeg()
-                    beg = time()
-                    chunk = await loop.run_in_executor(None, self.ffmpeg_process.stdout.read, buffer_size)
-                    if chunk:
-                        self.last_ffmpeg_activity = time()
-                    continue
+                chunk = await loop.run_in_executor(None, self.ffmpeg_process.stdout.read, buffer_size)
+                if chunk:
+                    self.last_ffmpeg_activity = time()

                 if not chunk:
                     logger.info("FFmpeg stdout closed.")
                     break
```
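The buffer thresholds in `AudioProcessor` (`bytes_per_sec`, `max_bytes_per_sec`) follow from the raw PCM byte rate. Assuming the stream is 16 kHz, mono, 16-bit signed (s16le), which the warmup code's `sr=16000` suggests, one second of audio is a fixed number of bytes:

```python
# Byte rate for raw PCM: sample_rate * channels * bytes_per_sample.
# The format here (16 kHz mono s16le) is an assumption based on the
# warmup code's librosa.load(..., sr=16000), not a documented constant.
SAMPLE_RATE = 16000
CHANNELS = 1
BYTES_PER_SAMPLE = 2   # s16le = signed 16-bit = 2 bytes

bytes_per_sec = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
print(bytes_per_sec)  # 32000

# "Process when enough data" then means: buffer holds at least 1 s of audio.
def ready(pcm_buffer: bytes) -> bool:
    return len(pcm_buffer) >= bytes_per_sec
```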
```diff
@@ -233,7 +221,7 @@ class AudioProcessor:
                         self.convert_pcm_to_float(self.pcm_buffer).copy()
                     )

-                # Process when we have enough data
+                # Process when enough data
                 if len(self.pcm_buffer) >= self.bytes_per_sec:
                     if len(self.pcm_buffer) > self.max_bytes_per_sec:
                         logger.warning(
```
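The `convert_pcm_to_float` call in this hunk reinterprets little-endian signed 16-bit samples as floats scaled to [-1.0, 1.0]. A hypothetical pure-Python stand-in for what that conversion has to do (the real code presumably uses NumPy):

```python
import struct

# Decode little-endian s16le PCM bytes into floats in [-1.0, 1.0],
# dividing by 32768 (the magnitude of the most negative int16).
def pcm_s16le_to_float(pcm: bytes):
    n = len(pcm) // 2
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    return [s / 32768.0 for s in samples]

buf = struct.pack("<3h", 0, 16384, -32768)
print(pcm_s16le_to_float(buf))  # [0.0, 0.5, -1.0]
```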
```diff
@@ -492,19 +480,40 @@ class AudioProcessor:
                 if not self.ffmpeg_process or not hasattr(self.ffmpeg_process, 'stdin') or self.ffmpeg_process.poll() is not None:
                     logger.warning("FFmpeg process not available, restarting...")
                     await self.restart_ffmpeg()

-                self.ffmpeg_process.stdin.write(message)
-                self.ffmpeg_process.stdin.flush()
-                self.last_ffmpeg_activity = time()  # Update activity timestamp
-                return
+                loop = asyncio.get_running_loop()
+                try:
+                    await asyncio.wait_for(
+                        loop.run_in_executor(None, lambda: self.ffmpeg_process.stdin.write(message)),
+                        timeout=2.0
+                    )
+                except asyncio.TimeoutError:
+                    logger.warning("FFmpeg write operation timed out, restarting...")
+                    await self.restart_ffmpeg()
+                    retry_count += 1
+                    continue
+
+                try:
+                    await asyncio.wait_for(
+                        loop.run_in_executor(None, self.ffmpeg_process.stdin.flush),
+                        timeout=2.0
+                    )
+                except asyncio.TimeoutError:
+                    logger.warning("FFmpeg flush operation timed out, restarting...")
+                    await self.restart_ffmpeg()
+                    retry_count += 1
+                    continue
+
+                self.last_ffmpeg_activity = time()
+                return

             except (BrokenPipeError, AttributeError, OSError) as e:
                 retry_count += 1
                 logger.warning(f"Error writing to FFmpeg: {e}. Retry {retry_count}/{max_retries}...")

                 if retry_count < max_retries:
                     await self.restart_ffmpeg()
-                    await asyncio.sleep(0.5)  # Shorter pause between retries
+                    await asyncio.sleep(0.5)
                 else:
                     logger.error("Maximum retries reached for FFmpeg process")
                     await self.restart_ffmpeg()
```
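The new write path wraps each blocking `stdin.write`/`flush` in `loop.run_in_executor` plus `asyncio.wait_for`, so a wedged FFmpeg pipe can time out instead of stalling the event loop forever. The pattern in isolation (`fast_write`/`stuck_write` are fakes standing in for the pipe):

```python
import asyncio
import time

# Run a blocking write off-loop and bound how long we wait for it.
# On timeout the caller restarts FFmpeg and retries; here we just
# report success or failure.
async def guarded_write(blocking_write, data, timeout=2.0):
    loop = asyncio.get_running_loop()
    try:
        await asyncio.wait_for(
            loop.run_in_executor(None, blocking_write, data),
            timeout=timeout,
        )
        return True
    except asyncio.TimeoutError:
        return False

def fast_write(data):
    pass  # returns immediately, like a healthy pipe

def stuck_write(data):
    time.sleep(0.5)  # simulates a wedged pipe

print(asyncio.run(guarded_write(fast_write, b"x")))                # True
print(asyncio.run(guarded_write(stuck_write, b"x", timeout=0.05))) # False
```

One caveat of this pattern: the timed-out thread keeps running in the executor until the blocking call returns, which is why the real code also restarts the FFmpeg process rather than just retrying the write.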
```diff
@@ -3,12 +3,13 @@ from fastapi import FastAPI, WebSocket, WebSocketDisconnect
 from fastapi.responses import HTMLResponse
 from fastapi.middleware.cors import CORSMiddleware

-from whisperlivekit import WhisperLiveKit
+from whisperlivekit import WhisperLiveKit, parse_args
 from whisperlivekit.audio_processor import AudioProcessor

 import asyncio
 import logging
-import os
+import os, sys
+import argparse

 logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
 logging.getLogger().setLevel(logging.WARNING)
```
```diff
@@ -72,15 +73,31 @@ def main():
     """Entry point for the CLI command."""
     import uvicorn

-    temp_kit = WhisperLiveKit(transcription=False, diarization=False)
+    args = parse_args()

-    uvicorn.run(
-        "whisperlivekit.basic_server:app",
-        host=temp_kit.args.host,
-        port=temp_kit.args.port,
-        reload=True,
-        log_level="info"
-    )
+    uvicorn_kwargs = {
+        "app": "whisperlivekit.basic_server:app",
+        "host": args.host,
+        "port": args.port,
+        "reload": False,
+        "log_level": "info",
+        "lifespan": "on",
+    }
+
+    ssl_kwargs = {}
+    if args.ssl_certfile or args.ssl_keyfile:
+        if not (args.ssl_certfile and args.ssl_keyfile):
+            raise ValueError("Both --ssl-certfile and --ssl-keyfile must be specified together.")
+        ssl_kwargs = {
+            "ssl_certfile": args.ssl_certfile,
+            "ssl_keyfile": args.ssl_keyfile
+        }
+
+    if ssl_kwargs:
+        uvicorn_kwargs = {**uvicorn_kwargs, **ssl_kwargs}
+
+    uvicorn.run(**uvicorn_kwargs)

 if __name__ == "__main__":
     main()
```
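The new `main()` only enables TLS when both halves of the key pair are supplied; passing just one is rejected. That check, isolated into a hypothetical standalone helper (`build_uvicorn_kwargs` is not part of the library, it just mirrors the logic above):

```python
# Build the keyword arguments that would be passed to uvicorn.run(),
# enforcing that ssl_certfile and ssl_keyfile come as a pair.
def build_uvicorn_kwargs(host, port, ssl_certfile=None, ssl_keyfile=None):
    kwargs = {
        "app": "whisperlivekit.basic_server:app",
        "host": host,
        "port": port,
        "reload": False,
        "log_level": "info",
        "lifespan": "on",
    }
    if ssl_certfile or ssl_keyfile:
        if not (ssl_certfile and ssl_keyfile):
            raise ValueError(
                "Both --ssl-certfile and --ssl-keyfile must be specified together."
            )
        kwargs.update(ssl_certfile=ssl_certfile, ssl_keyfile=ssl_keyfile)
    return kwargs

cfg = build_uvicorn_kwargs("localhost", 8000, "public.crt", "private.key")
print(cfg["ssl_certfile"])
```

Failing fast here is the right call: uvicorn with only one of the two files would otherwise error at bind time with a less obvious message.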
```diff
@@ -130,6 +130,9 @@ def parse_args():
         help="Set the log level",
         default="DEBUG",
     )
+    parser.add_argument("--ssl-certfile", type=str, help="Path to the SSL certificate file.", default=None)
+    parser.add_argument("--ssl-keyfile", type=str, help="Path to the SSL private key file.", default=None)

     args = parser.parse_args()
```
```diff
@@ -321,7 +321,8 @@
             const host = window.location.hostname || "localhost";
             const port = window.location.port || "8000";
-            const defaultWebSocketUrl = `ws://${host}:${port}/asr`;
+            const protocol = window.location.protocol === "https:" ? "wss" : "ws";
+            const defaultWebSocketUrl = `${protocol}://${host}:${port}/asr`;
             websocketInput.value = defaultWebSocketUrl;
             websocketUrl = defaultWebSocketUrl;
```
```diff
@@ -179,7 +179,7 @@ def warmup_asr(asr, warmup_file=None, timeout=5):
         logger.warning(f"Warmup file {warmup_file} invalid or missing.")
         return False

-    print(f"Warmping up Whisper with {warmup_file}")
+    print(f"Warming up Whisper with {warmup_file}")
     try:
        import librosa
        audio, sr = librosa.load(warmup_file, sr=16000)
```