mirror of
https://github.com/QuentinFuxa/WhisperLiveKit.git
synced 2026-03-07 14:23:18 +00:00
Available model sizes:
- tiny.en (english only)
- tiny
- base.en (english only)
- base
- small.en (english only)
- small
- medium.en (english only)
- medium
- large-v1
- large-v2
- large-v3
- large-v3-turbo
How to choose?
Language Support
- English only: Use `.en` models for better accuracy and faster processing when you only need English transcription
- Multilingual: Do not use `.en` models.
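The `.en` rule can be sketched as a small helper. This is an illustration only, not part of WhisperLiveKit: the name `resolve_model_name` and the `ENGLISH_ONLY_SIZES` set are hypothetical, derived from the list of available model sizes, where only `tiny`, `base`, `small`, and `medium` have an `.en` variant.

```python
# Hypothetical helper (not a WhisperLiveKit API).
# Sizes that have an English-only ".en" variant, per the list of available models.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def resolve_model_name(size: str, english_only: bool) -> str:
    """Return "<size>.en" when English-only is requested and that variant exists."""
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    # Multilingual audio, or a size without an ".en" variant (the large models).
    return size


print(resolve_model_name("small", english_only=True))     # small.en
print(resolve_model_name("large-v3", english_only=True))  # large-v3
```

Falling back to the multilingual name for the large models is a design choice of this sketch; raising an error instead would also be reasonable.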
Resource Constraints
- Limited GPU/CPU or need for very low latency: Choose `small` or smaller models
  - `tiny`: Fastest, lowest resource usage, acceptable quality for simple audio
  - `base`: Good balance of speed and accuracy for basic use cases
  - `small`: Better accuracy while still being resource-efficient
- Good resources available: Use `large` models for best accuracy
  - `large-v2`: Excellent accuracy, good multilingual support
  - `large-v3`: Best overall accuracy and language support
Special Cases
- No translation needed: Use `large-v3-turbo`
  - Same transcription quality as `large-v2` but significantly faster
  - Important: Does not translate correctly, only transcribes
Model Comparison Table
| Model | Speed | Accuracy | Multilingual | Translation | Best Use Case |
|---|---|---|---|---|---|
| tiny(.en) | Fastest | Basic | Yes (`.en`: No) | Yes (`.en`: No) | Real-time, low resources |
| base(.en) | Fast | Good | Yes (`.en`: No) | Yes (`.en`: No) | Balanced performance |
| small(.en) | Medium | Better | Yes (`.en`: No) | Yes (`.en`: No) | Quality on limited hardware |
| medium(.en) | Slow | High | Yes (`.en`: No) | Yes (`.en`: No) | High quality, moderate resources |
| large-v2 | Slowest | Excellent | Yes | Yes | Best overall quality |
| large-v3 | Slowest | Excellent | Yes | Yes | Maximum accuracy |
| large-v3-turbo | Fast | Excellent | Yes | No | Fast, high-quality transcription |
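To query the table programmatically, one option is to encode its rows as data and filter on the columns that matter. The sketch below is illustrative: `ModelInfo`, `ACCURACY_RANK`, and `filter_models` are hypothetical names, the rows cover only the multilingual variants, and the ordering of the accuracy labels is an assumption read off the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    speed: str
    accuracy: str
    translation: bool


# Rows transcribed from the comparison table (multilingual variants only).
MODELS = [
    ModelInfo("tiny", "Fastest", "Basic", True),
    ModelInfo("base", "Fast", "Good", True),
    ModelInfo("small", "Medium", "Better", True),
    ModelInfo("medium", "Slow", "High", True),
    ModelInfo("large-v2", "Slowest", "Excellent", True),
    ModelInfo("large-v3", "Slowest", "Excellent", True),
    ModelInfo("large-v3-turbo", "Fast", "Excellent", False),
]

# Assumed ordering of the table's accuracy labels, lowest to highest.
ACCURACY_RANK = {"Basic": 0, "Good": 1, "Better": 2, "High": 3, "Excellent": 4}


def filter_models(min_accuracy: str, need_translation: bool) -> list[str]:
    """Names of models at or above min_accuracy that satisfy the translation need."""
    floor = ACCURACY_RANK[min_accuracy]
    return [
        m.name
        for m in MODELS
        if ACCURACY_RANK[m.accuracy] >= floor
        and (m.translation or not need_translation)
    ]


print(filter_models("Excellent", need_translation=True))
# ['large-v2', 'large-v3']
```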
Additional Considerations
Model Performance:
- Accuracy improves significantly from tiny to large models
- English-only models are ~10-15% more accurate for English audio; the gain is largest for `tiny.en` and `base.en` and shrinks for `small.en` and `medium.en`
- Newer versions (v2, v3) have better punctuation and formatting
Hardware Requirements:
- `tiny`: ~1GB VRAM
- `base`: ~1GB VRAM
- `small`: ~2GB VRAM
- `medium`: ~5GB VRAM
- `large`: ~10GB VRAM
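The VRAM figures above can be turned into a quick lookup that picks the largest model fitting a given budget. This is a minimal sketch: `VRAM_GB` and `largest_model_for` are hypothetical names, the numbers are the approximate values listed above, and `large` stands for the whole large family.

```python
from typing import Optional

# Approximate VRAM needs in GB, smallest model first (values from the list above).
VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Return the largest model whose approximate VRAM need fits the budget."""
    choice = None
    for name, needed in VRAM_GB:
        if needed <= vram_gb:
            choice = name  # keep overwriting, so the last (largest) fit wins
    return choice


print(largest_model_for(4))    # small
print(largest_model_for(0.5))  # None
```

Because the figures are rough, leaving some headroom below the available VRAM is a sensible precaution.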
Audio Quality Impact:
- Clean, clear audio: smaller models may suffice
- Noisy, accented, or technical audio: larger models recommended
- Phone/low-quality audio: use at least the `small` model
Quick Decision Tree
- English only? → Add `.en` to your choice (available for `tiny` through `medium` only)
- Limited resources or need speed? → `small` or smaller
- Good hardware and want best quality? → `large-v3`
- Need fast, high-quality transcription without translation? → `large-v3-turbo`
- Need translation capabilities? → `large-v2` or `large-v3` (avoid turbo)
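The decision tree can be written as a single function. The list does not say which question takes precedence, so the ordering below is one possible reading, and `choose_model` with its flags is a hypothetical helper rather than anything WhisperLiveKit provides. Since `.en` models do not translate, the sketch treats `english_only` and `need_translation` as mutually exclusive.

```python
def choose_model(
    *,
    english_only: bool = False,
    limited_resources: bool = False,
    need_translation: bool = False,
    prefer_speed: bool = False,
) -> str:
    """Pick a model name following one possible ordering of the decision tree."""
    if english_only and need_translation:
        # Assumption of this sketch: ".en" models cannot translate.
        raise ValueError("english_only and need_translation are mutually exclusive")
    if limited_resources:
        # "small or smaller"; the sketch picks the top of that range.
        return "small.en" if english_only else "small"
    if need_translation:
        return "large-v3"  # avoid turbo, which only transcribes
    if prefer_speed:
        return "large-v3-turbo"
    # Good hardware, best quality. The large models have no ".en" variant.
    return "large-v3"


print(choose_model(english_only=True, limited_resources=True))  # small.en
print(choose_model(prefer_speed=True))                          # large-v3-turbo
print(choose_model(need_translation=True, prefer_speed=True))   # large-v3
```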