diff --git a/README.md b/README.md
index 8c7b26d..6fcddc4 100644
--- a/README.md
+++ b/README.md
@@ -1,26 +1,20 @@
-# Whisper Streaming with FastAPI and WebSocket Integration
+# Whisper Streaming with FastAPI & WebSocket Integration
 
-This project extends the [Whisper Streaming](https://github.com/ufal/whisper_streaming) implementation by incorporating few extras. The enhancements include:
+A feature-packed fork of [Whisper Streaming](https://github.com/ufal/whisper_streaming) with **real-time speech-to-text (STT) enhancements**, multi-user support, and a JavaScript client 🎙️✨
 
-1. **FastAPI Server with WebSocket Endpoint**: Real-time STT in browsers. Audio chunks processed via FFmpeg async streaming process.
+## What's New?
 
-2. **Buffering preview**: Enhances streaming feedback by displaying the unvalidated buffer content.
+✅ **FastAPI Server with WebSocket Endpoint** – Enables real-time STT in browsers with async FFmpeg processing.
+✅ **Buffering Preview** – Displays unvalidated buffer content for better streaming feedback.
+✅ **Multiple Users Support** – The backend handles multiple users simultaneously without conflicts.
+✅ **HTML - JavaScript Client Implementation** – A plug-and-play MediaRecorder setup for seamless client integration.
+✅ **MLX Whisper Backend** – Optimized Apple Silicon support for faster local processing.
+✅ **Enhanced sentence segmentation** – Improves buffer trimming and sentence boundaries in certain languages
+✅ **Diarization (Beta)** – Real-time speaker labeling using [Diart](https://github.com/juanmc2005/diart).
 
-3. **Multiple users**: The backend can support multiple users simultaneously without conflicts.
-
-4. **Javascript Client implementation**: MediaRecorder implementation that can be copied on your client side.
-
-5. **MLX Whisper backend**: Integrates the alternative backend option MLX Whisper, optimized for efficient speech recognition on Apple silicon.
-
-6. **Diarization (beta)**: Adds speaker labeling in real-time alongside transcription using the [Diart](https://github.com/juanmc2005/diart) library. Each transcription segment is tagged with a speaker.
-
-
-
-## Code Origins
-
-This project reuses and extends code from the original Whisper Streaming repository:
-- whisper_online.py, backends.py and online_asr.py: Contains code from whisper_streaming
-- silero_vad_iterator.py: Originally from the Silero VAD repository, included in the whisper_streaming project.
+
+
+