mirror of
https://github.com/QuentinFuxa/WhisperLiveKit.git
synced 2026-03-07 22:33:36 +00:00
default model is now base. default backend simulstreaming
This commit is contained in:
@@ -81,4 +81,4 @@ EXPOSE 8000
|
||||
ENTRYPOINT ["whisperlivekit-server", "--host", "0.0.0.0"]
|
||||
|
||||
# Default args
|
||||
CMD ["--model", "tiny.en"]
|
||||
CMD ["--model", "base"]
|
||||
40
README.md
40
README.md
@@ -4,7 +4,7 @@
|
||||
<img src="https://raw.githubusercontent.com/QuentinFuxa/WhisperLiveKit/refs/heads/main/demo.png" alt="WhisperLiveKit Demo" width="730">
|
||||
</p>
|
||||
|
||||
<p align="center"><b>Real-time, Fully Local Speech-to-Text with Speaker Diarization</b></p>
|
||||
<p align="center"><b>Real-time, Fully Local Speech-to-Text with Speaker Identification</b></p>
|
||||
|
||||
<p align="center">
|
||||
<a href="https://pypi.org/project/whisperlivekit/"><img alt="PyPI Version" src="https://img.shields.io/pypi/v/whisperlivekit?color=g"></a>
|
||||
@@ -14,7 +14,7 @@
|
||||
</p>
|
||||
|
||||
|
||||
WhisperLiveKit brings real-time speech transcription directly to your browser, with a ready-to-use backend+server and a simple frontend. ✨
|
||||
Real-time speech transcription directly to your browser, with a ready-to-use backend+server and a simple frontend. ✨
|
||||
|
||||
#### Powered by Leading Research:
|
||||
|
||||
@@ -25,20 +25,14 @@ WhisperLiveKit brings real-time speech transcription directly to your browser, w
|
||||
- [Silero VAD](https://github.com/snakers4/silero-vad) (2024) - Enterprise-grade Voice Activity Detection
|
||||
|
||||
|
||||
### Key Features
|
||||
|
||||
- **Real-time Transcription** - Locally (or on-prem) convert speech to text instantly as you speak
|
||||
- **Speaker Diarization** - Identify different speakers in real-time.
|
||||
- **Voice Activity Detection** – Reduce processing overhead when no voice is detected.
|
||||
- **Punctuation-Based Speaker Splitting** - Align speaker changes with natural sentence boundaries.
|
||||
- **Confidence Validation** – Immediately validate high-confidence tokens (WhisperStreaming only)
|
||||
- **SimulStreaming Backend** - [Dual-licensed](https://github.com/ufal/SimulStreaming#-licence-and-contributions) - Ultra-low latency transcription using SOTA AlignAtt policy.
|
||||
- **Multi-User Support** - Handle multiple users simultaneously with a single backend/server
|
||||
> **Why not just run a simple Whisper model on every audio batch?** Whisper is designed for complete utterances, not real-time chunks. Processing small segments loses context, cuts off words mid-syllable, and produces poor transcription. WhisperLiveKit uses state-of-the-art simultaneous speech research for intelligent buffering and incremental processing.
|
||||
|
||||
|
||||
### Architecture
|
||||
|
||||
<img alt="Architecture" src="architecture.png" />
|
||||
<img alt="Architecture" src="https://raw.githubusercontent.com/QuentinFuxa/WhisperLiveKit/refs/heads/main/architecture.png" />
|
||||
|
||||
*The backend supports multiple concurrent users. Voice Activity Detection reduces overhead when no voice is detected.*
|
||||
|
||||
### Installation & Quick Start
|
||||
|
||||
@@ -57,14 +51,16 @@ pip install whisperlivekit
|
||||
#### Quick Start
|
||||
1. **Start the transcription server:**
|
||||
```bash
|
||||
whisperlivekit-server --model tiny.en
|
||||
whisperlivekit-server --model base --language en
|
||||
```
|
||||
|
||||
2. **Open your browser** and navigate to `http://localhost:8000`
|
||||
2. **Open your browser** and navigate to `http://localhost:8000`. Start speaking and watch your words appear in real-time!
|
||||
|
||||
3. **Start speaking** and watch your words appear in real-time!
|
||||
|
||||
> For HTTPS requirements, see the **Parameters** section for SSL configuration options.
|
||||
> - See [tokenizer.py](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/simul_whisper/whisper/tokenizer.py) for the list of all available languages.
|
||||
> - For HTTPS requirements, see the **Parameters** section for SSL configuration options.
|
||||
|
||||
|
||||
|
||||
#### Optional Dependencies
|
||||
|
||||
@@ -151,18 +147,18 @@ The package includes an HTML/JavaScript implementation [here](https://github.com
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `--host` | Server host address | `localhost` |
|
||||
| `--port` | Server port | `8000` |
|
||||
| `--ssl-certfile` | Path to the SSL certificate file (for HTTPS support) | `None` |
|
||||
| `--ssl-keyfile` | Path to the SSL private key file (for HTTPS support) | `None` |
|
||||
| `--model` | Whisper model size. | `tiny` |
|
||||
| `--model` | Whisper model size. | `small` |
|
||||
| `--language` | Source language code or `auto` | `en` |
|
||||
| `--task` | `transcribe` or `translate` | `transcribe` |
|
||||
| `--backend` | Processing backend | `faster-whisper` |
|
||||
| `--backend` | Processing backend | `simulstreaming` |
|
||||
| `--min-chunk-size` | Minimum audio chunk size (seconds) | `1.0` |
|
||||
| `--no-vac` | Disable Voice Activity Controller | `False` |
|
||||
| `--no-vad` | Disable Voice Activity Detection | `False` |
|
||||
| `--warmup-file` | Audio file path for model warmup | `jfk.wav` |
|
||||
| `--host` | Server host address | `localhost` |
|
||||
| `--port` | Server port | `8000` |
|
||||
| `--ssl-certfile` | Path to the SSL certificate file (for HTTPS support) | `None` |
|
||||
| `--ssl-keyfile` | Path to the SSL private key file (for HTTPS support) | `None` |
|
||||
|
||||
|
||||
| WhisperStreaming backend options | Description | Default |
|
||||
|
||||
@@ -301,7 +301,7 @@ class AudioProcessor:
|
||||
transcription_lag_s = max(0.0, time() - self.beg_loop - self.end_buffer)
|
||||
asr_processing_logs = f"internal_buffer={asr_internal_buffer_duration_s:.2f}s | lag={transcription_lag_s:.2f}s |"
|
||||
if type(item) is Silence:
|
||||
asr_processing_logs += f" + Silence of = {item.duration :.2fs} | last_end = {self.tokens[-1].end} |"
|
||||
asr_processing_logs += f" + Silence of = {item.duration:.2f}s | last_end = {self.tokens[-1].end} |"
|
||||
logger.info(asr_processing_logs)
|
||||
|
||||
if type(item) is Silence:
|
||||
|
||||
@@ -82,7 +82,7 @@ def parse_args():
|
||||
parser.add_argument(
|
||||
"--model",
|
||||
type=str,
|
||||
default="tiny",
|
||||
default="small",
|
||||
help="Name size of the Whisper model to use (default: tiny). Suggested values: tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large-v3,large,large-v3-turbo. The model is automatically downloaded from the model hub if not present in model cache dir.",
|
||||
)
|
||||
|
||||
@@ -115,7 +115,7 @@ def parse_args():
|
||||
parser.add_argument(
|
||||
"--backend",
|
||||
type=str,
|
||||
default="faster-whisper",
|
||||
default="simulstreaming",
|
||||
choices=["faster-whisper", "whisper_timestamped", "mlx-whisper", "openai-api", "simulstreaming"],
|
||||
help="Load only this backend for Whisper processing.",
|
||||
)
|
||||
|
||||
Reference in New Issue
Block a user