WhisperLiveKit

mirror of https://github.com/QuentinFuxa/WhisperLiveKit.git synced 2026-03-07 14:23:18 +00:00

Author	SHA1	Message	Date
Quentin Fuxa	8fedeb9fed	Merge pull request #340 from QuentinFuxa/voxtral_tests feat: voxtral-mlx backend, benchmark suite, unit tests, runtime metrics v0.2.19	2026-02-23 10:37:40 +01:00
Quentin Fuxa	b1fc23807a	docs: add benchmark collaboration call, voxtral in powered-by section	2026-02-23 10:37:22 +01:00
Quentin Fuxa	10c4e5f730	docs: add speed vs accuracy scatter plot to benchmark and README WER vs RTF scatter plot showing all backend/policy/model combos on the 30s English file. Sweet spot zone highlights the best tradeoffs. Added to both BENCHMARK.md and README.md.	2026-02-23 10:27:53 +01:00
Quentin Fuxa	c76b2ef2c6	docs: rewrite benchmark with base/small comparison, proper French results - Re-ran all whisper benchmarks with --lan fr for the French file (previously ran with --lan en which made the results meaningless) - Added small model results alongside base for all backends - Added model size comparison table (base vs small tradeoffs) - Added benchmark chart (30s English, WER + RTF by backend) - Added caveats section about dataset size and RTF variance - Key findings: SimulStreaming saturates at 5.3% WER on base already, small model mainly helps LocalAgreement and French timestamps - mlx-whisper LA base is unstable on French (hallucination loops)	2026-02-23 10:16:34 +01:00
Quentin Fuxa	4b2377c243	fix: correct false auto-detect claim, median bug, RTF inflation - BENCHMARK.md: whisper also supports --language auto, voxtral is not the only one. Fixed mlx-whisper speed comparison (LA is actually faster than SS for mlx-whisper, not comparable). - metrics.py: median calculation was wrong for even-length lists (took upper middle instead of averaging the two middle values). - metrics_collector.py: RTF was inflated because log_summary() used wall-clock elapsed time instead of sum of actual ASR call durations. - README.md: clarified that whisper also supports auto language detection, voxtral just does it better. - Added 2 new median tests (even + odd length).	2026-02-22 23:38:04 +01:00
Quentin Fuxa	a4da246ea5	feat: add voxtral-mlx native backend for Apple Silicon Pure-MLX implementation of Voxtral Mini 4B Realtime for low-latency speech transcription on Apple Silicon. Avoids the transformers/torch overhead and runs at 0.18-0.32x real-time factor. - voxtral_mlx/model.py: MLX model with spectrogram, encoder, decoder - voxtral_mlx/loader.py: model loading with 6-bit quantized weights - voxtral_mlx/spectrogram.py: mel spectrogram computation in MLX - voxtral_mlx_asr.py: VoxtralASR adapter for the AudioProcessor pipeline	2026-02-22 23:28:10 +01:00
Quentin Fuxa	9b2c3ee844	docs: update README with voxtral backend, benchmarks, testing sections - Add Voxtral Backend section explaining voxtral-mlx and voxtral (HF). - Add Testing & Benchmarks section with commands to run tests/benchmarks. - Update --backend parameter docs to include voxtral-mlx and voxtral. - Update optional dependencies table with Voxtral entry. - Link to BENCHMARK.md for detailed performance comparisons.	2026-02-22 23:27:57 +01:00
Quentin Fuxa	83d0fa3fac	feat: benchmark suite with WER, timestamp accuracy, cross-backend comparison - Extend test_backend_offline.py with WER and timestamp accuracy metrics computed via whisperlivekit.metrics against ground truth transcripts. - Add --benchmark flag to auto-detect all installed backends and run each (backend, policy) combination in sequence. - Add --policy flag to override the streaming policy. - Add detect_available_backends() probing faster-whisper, mlx-whisper, voxtral-mlx, voxtral (HF), and openai-whisper. - Add print_cross_backend_comparison() with per-combo averages. - Add run_benchmark.py for comprehensive multi-model benchmarking. - Add BENCHMARK.md with full results on Apple M4: speed, WER, timestamp accuracy, VAC impact, and recommendations. - Add ground truth transcript JSON files for all audio test files.	2026-02-22 23:27:50 +01:00
Quentin Fuxa	5a12c627b4	feat: add 99-test unit test suite with zero model dependencies Test suite covering: - metrics.py: WER computation, timestamp accuracy, text normalization - config.py: defaults, .en model detection, policy aliases, from_namespace - timed_objects.py: ASRToken, Silence, Transcript, Segment, FrontData - hypothesis_buffer.py: insert, flush, LCP matching, pop_committed - silence_handling.py: state machine, double-counting regression test - audio_processor.py: async pipeline with MockOnlineProcessor All tests run in ~1.3s without downloading any ASR models. Add pytest and pytest-asyncio as optional test dependencies. Update .gitignore to allow tests/ directory.	2026-02-22 23:27:40 +01:00
Quentin Fuxa	f5eee67b11	fix: silence double-counting bug, add metrics module and runtime instrumentation - Fix _begin_silence pushing same object reference as _end_silence, causing the consumer to process two ended events and double the silence duration. - Fix initial silence never cleared when VAC is disabled, causing the no-VAC path to enqueue zero audio. - Add sample-precise silence boundaries (at_sample parameter). - Add whisperlivekit/metrics.py with WER computation (word-level Levenshtein) and timestamp accuracy (greedy alignment). No external dependencies. - Add whisperlivekit/metrics_collector.py with SessionMetrics dataclass for per-session runtime observability. Instrumented at 6 points in AudioProcessor: init, process_audio, transcription_processor, _end_silence, results_formatter, cleanup. Emits SESSION_METRICS structured log line on session end.	2026-02-22 23:27:12 +01:00
Quentin Fuxa	4a6868e3e1	correct processor attributes mixtral	2026-02-22 21:13:21 +01:00
Quentin Fuxa	3c15246fc0	mixstral hf v0	2026-02-20 20:49:57 +01:00
Quentin Fuxa	d337248fda	feat: add healthcheck to Dockerfiles (#228)	2026-02-20 20:48:28 +01:00
Quentin Fuxa	b8d9d7d289	fix: handle numpy object_ dtype from ctranslate2 encoder (#337)	2026-02-20 20:48:28 +01:00
Quentin Fuxa	4c7706e2cf	fix: use vac_chunk_size for audio processing interval when VAC is enabled (#334)	2026-02-20 20:48:06 +01:00
Quentin Fuxa	7f3a3df620	simulstreaming mlx & torch dedup of common base	2025-02-15 23:52:00 +01:00
Quentin Fuxa	e7e82f7c19	bump to 0.2.18 0.2.18	2026-02-11 22:10:00 +01:00
Quentin Fuxa	8c799fa4d1	fix simulstreaming vram leak: cap cross-attn accumulation + token budget fixes #283, fixes #275 - accumulated_cross_attns was growing unboundedly during decoding loop, using up to ~5GB for repetition loops. now capped to rolling window of 16 - max_tokens_per_chunk was using TOKENS_PER_SECOND (mel frame rate = 50) instead of actual text token rate (~15/s), allowing 10-40x too many decoding steps - removed unused torch.cat on early return path - removed dead self.committed/last_result_tokens lists (never read) - same fixes applied to mlx variant	2026-02-11 22:10:00 +01:00
Quentin Fuxa	8923337380	fix --direct-english-translation not setting task=translate for localagreement backends the flag was only used for tokenizer language selection but never actually passed to whisper/faster-whisper transcribe calls. also init OpenaiApiASR.task and read from transcribe_kargs. fixes #306	2026-02-11 22:10:00 +01:00
Quentin Fuxa	aded1649ae	fix model_cache_dir + direct_english_translation task in simulstreaming pass actual cache dir instead of None, and use proper task string instead of boolean for AlignAttConfig fixes #310	2026-02-11 22:10:00 +01:00
Quentin Fuxa	3b535e857a	fix NoneType concatenation in add_translation fixes #296	2026-02-11 22:10:00 +01:00
Quentin Fuxa	d649250b9a	fix Segment classmethod call + isinstance type narrowing fixes #331, fixes #329	2026-02-11 22:10:00 +01:00
Quentin Fuxa	7735478286	add insert_audio_chunk to DiartDiarization fixes #332	2026-02-11 22:10:00 +01:00
Quentin Fuxa	b9e72d2b9a	add probability field to ASRToken fixes #330, fixes #313	2026-02-11 22:10:00 +01:00
Quentin Fuxa	e5b01033af	add json normalizers for english language in build	2026-01-16 10:47:46 +01:00
Quentin Fuxa	6ae545bcb1	bump to 0.2.17.post1 0.2.17.post1	2026-01-16 10:43:52 +01:00
Quentin Fuxa	04980d3f5e	Merge branch 'main' of https://github.com/QuentinFuxa/WhisperLiveKit	2026-01-16 10:38:29 +01:00
Quentin Fuxa	79a705c969	fixes #323	2026-01-16 10:38:07 +01:00
Quentin Fuxa	34e4abd455	Merge pull request #322 from eschmidbauer/fix/thread-safety-issues Fix kv cache not being properly cleaned between sessions	2026-01-09 19:23:35 +01:00
Emmanuel Schmidbauer	d59ddbaeae	Fix critical thread safety issues	2026-01-09 11:23:19 -05:00
Quentin Fuxa	4dd66e7766	Merge pull request #317 from jantonj/fix-bug-diarization-lag update diarization lag after stream analysed	2025-12-19 17:43:07 +01:00
Anton Jacobson	3db5d81a20	update diarization lag after stream analysed	2025-12-18 14:13:28 +01:00
Quentin Fuxa	b67ddea494	bump to 0.2.17 0.2.17	2025-12-08 23:52:00 +01:00
Quentin Fuxa	3192553e20	fixes #307	2025-12-09 10:27:49 +01:00
Quentin Fuxa	f379a243fe	Merge pull request #274 from blakkd/patch-1 minor path change	2025-12-09 10:10:32 +01:00
Quentin Fuxa	ec09898a9f	fixes #301	2025-12-06 10:19:50 +01:00
blakkd	befbae56c7	minor path change prevents `FileNotFoundError: [Errno 2] No such file or directory: 'whisperlivekit/web/live_transcription.html'`	2025-11-16 23:47:58 +01:00
Quentin Fuxa	bbd4fd6cff	Merge branch 'improve_EOS_handling'	2025-11-16 22:30:31 +01:00
Quentin Fuxa	28985962a0	Silence handling: finish transcription even if not validated at the BEGINNING of the silence	2025-11-16 22:29:08 +01:00
Quentin Fuxa	a38c103fcd	simulstreaming coreml encoder compatibility	2025-11-16 21:24:14 +01:00
Quentin Fuxa	4d2ffb24f8	coreml conversion	2025-11-16 19:11:43 +01:00
Quentin Fuxa	1bbbb7903c	lora loader in shared whisper core	2025-11-16 18:44:35 +01:00
Quentin Fuxa	bcffdbc6b3	bump to 0.2.14 0.2.14	2025-11-15 20:19:09 +01:00
Quentin Fuxa	80b77998f9	Refactor backend handling	2025-11-15 19:51:41 +01:00
Quentin Fuxa	d310f7e25f	hf compatibility	2025-11-15 18:34:19 +01:00
Quentin Fuxa	8d9be88fe6	translation buffer is now displayed in frontend	2025-11-10 15:22:26 +01:00
Quentin Fuxa	16461052ed	task to direct-english-translation	2025-11-10 13:20:26 +01:00
Quentin Fuxa	5491dbd824	last_validated_token handled in state	2025-11-10 13:18:52 +01:00
Quentin Fuxa	13401ffe24	whisper core at root of wlk	2025-11-10 12:17:18 +01:00
Quentin Fuxa	7108d2ddc5	fixes https://github.com/QuentinFuxa/WhisperLiveKit/issues/269	2025-11-09 20:08:18 +01:00

697 Commits
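
Note on the metrics fixes: commits 4b2377c243 and f5eee67b11 describe three calculations: a median that averages the two middle values for even-length lists, an RTF based on the sum of ASR call durations rather than wall-clock time, and a WER computed with word-level Levenshtein distance. The sketch below illustrates those three ideas only. It is not the code in whisperlivekit/metrics.py or metrics_collector.py. The function names, signatures, text normalization (lowercasing and whitespace splitting), and empty-reference handling are all assumptions made for illustration.

```python
from typing import Sequence


def median(values: Sequence[float]) -> float:
    """Median that averages the two middle values for even-length input."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return float(ordered[mid])
    # Even length: average the lower and upper middle values
    # (taking only the upper one is the bug the commit describes).
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def real_time_factor(asr_call_durations: Sequence[float], audio_duration: float) -> float:
    """RTF = total time spent inside ASR calls / duration of the audio."""
    if audio_duration <= 0:
        raise ValueError("audio_duration must be positive")
    return sum(asr_call_durations) / audio_duration


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Convention chosen for this sketch only.
        return 0.0 if not hyp else 1.0

    # Row-by-row dynamic programming over the edit-distance table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, median([1, 2, 3, 4]) averages 2 and 3, and word_error_rate("the cat sat", "the dog sat") counts one substitution over three reference words.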