WhisperLiveKit

mirror of https://github.com/QuentinFuxa/WhisperLiveKit.git synced 2026-04-25 23:57:17 +00:00

Author	SHA1	Message	Date
Quentin Fuxa	8dc7b77071	Bump version to 0.2.20 (tag: v0.2.20)	2026-03-08 16:02:00 +01:00
Quentin Fuxa	10d85ff65f	Update docs, CI, and architecture diagram	2026-03-08 15:14:00 +01:00
Quentin Fuxa	e7e3441ca4	Add Qwen3 ASR backend	2026-03-07 11:48:00 +01:00
Quentin Fuxa	9abe26a996	Add CLI with serve, transcribe, listen, pull, diagnose	2026-03-01 13:37:00 +01:00
Quentin Fuxa	c8e7c216ed	Replace mock tests with real pipeline tests	2026-02-28 10:05:00 +01:00
Quentin Fuxa	586540ae36	Add test harness and test client	2026-02-22 16:19:00 +01:00
Quentin Fuxa	cd8df8e1aa	Update package setup and exports	2026-02-21 11:33:00 +01:00
Quentin Fuxa	e30f9a2573	Improve diarization backends	2026-02-15 14:55:00 +01:00
Quentin Fuxa	32de7b1276	Fix frontend buffer rendering for slow backends	2026-02-14 09:28:00 +01:00
Quentin Fuxa	9ac7c26a0b	Add OpenAI REST API and Deepgram WebSocket	2026-02-08 15:42:00 +01:00
Quentin Fuxa	c0e2600993	Add snapshot-then-diff WebSocket protocol	2026-02-07 10:17:00 +01:00
Quentin Fuxa	e0db3a98f9	Add per-session language proxy	2026-02-01 17:03:00 +01:00
Quentin Fuxa	2fe34427ef	Fix voxtral streaming drain and silence flush	2026-01-31 11:12:00 +01:00
Quentin Fuxa	d58365421f	Refactor audio processor async pipeline	2026-01-25 13:48:00 +01:00
Quentin Fuxa	a282cbe75f	Improve tokens alignment and silence handling	2026-01-24 10:55:00 +01:00
Quentin Fuxa	6e85c16614	Refactor TranscriptionEngine singleton	2026-01-18 15:27:00 +01:00
Quentin Fuxa	e1823dd99c	Improve online ASR processor	2026-01-17 09:35:00 +01:00
Quentin Fuxa	e144abbbc7	Refactor timed objects and data structures	2026-01-11 16:08:00 +01:00
Quentin Fuxa	83362c89c4	Clean up config and model paths	2026-01-10 11:42:00 +01:00
Quentin Fuxa	74c4dc791d	Lint scripts and tests	2026-01-04 14:15:00 +01:00
Quentin Fuxa	cf6c49f502	Ruff lint cleanup	2026-01-03 10:23:00 +01:00
Quentin Fuxa	451535d48f	Fix ctranslate2 encoder conversion (#345) and memory leak in TokensAlignment (#344) - Add fallback chain for StorageView to numpy conversion - Prune old tokens/segments after 5min to bound memory	2026-03-10 22:37:00 +01:00
Quentin Fuxa	8bc0937c46	Update README section on powered research	2026-03-06 18:46:07 +01:00
Quentin Fuxa	929cf7a26b	add link to AlignAtt interactive playground	2026-03-06 18:43:25 +01:00
Quentin Fuxa	abfaf06203	Merge branch 'main' of https://github.com/QuentinFuxa/WhisperLiveKit	2026-03-04 18:17:23 +01:00
Quentin Fuxa	d1fe932241	Apply DRY method v0 - to try to catch and resolve infinite loops such as in #338	2026-03-03 22:52:00 +01:00
Quentin Fuxa	c112ceffb6	Merge pull request #342 from mnicnc404/fix/whisper-tokenizer-index-error fix(whisper/tokenizer): prevent IndexError from crashing multilingual…	2026-03-02 20:36:58 +01:00
Quentin Fuxa	4917406e06	Merge pull request #341 from AymurAI/feat/uv-deps-resolution deps/docker: align python support, deterministic deps resolution & docker images releases	2026-03-02 20:34:49 +01:00
Chingning Chen	b63f54e838	fix(whisper/tokenizer): prevent IndexError from crashing multilingual streams This fix addresses a critical bug in the Whisper tokenizer that causes the transcription server to crash with an `IndexError: string index out of range` when streaming audio in languages utilizing multi-byte UTF-8 characters (e.g., Cantonese, Japanese, Mandarin). When a 3-byte character is cut off at the boundary of an audio chunk, incomplete bytes are decoded into a single Unicode replacement character (`\ufffd`), artificially shortening the string and breaking the offset mapping assumed by `split_tokens_on_unicode`. This ports the upstream fix from SYSTRAN/faster-whisper (PR #111) to add a strict bounds check before accessing the string index, allowing incomplete bytes to be safely caught and handled in the next chunk.	2026-03-02 15:31:43 +08:00
jedzill4	c56a53fbf4	deps(mlx-groups): add optional dependencies for Apple Silicon MLX backends	2026-03-01 20:05:52 -03:00
Quentin Fuxa	66e58624b9	disable MLXAlignAtt which fails on special characters	2026-03-01 11:52:00 +01:00
jedzill4	9366e067f9	deps(pyproject): add torch and torchaudio to main dependencies	2026-02-27 19:19:18 -03:00
jedzill4	866c25670c	deps(docker): change CUDA base image to runtime version	2026-02-27 19:16:29 -03:00
jedzill4	2553ef283e	deps(docker): fix dependency group for cu129 image - Changed the extras for cu129-diarization-sortformer from gpu-cu129 to cu129. - This aligns the dependency with the correct naming convention for consistency.	2026-02-25 21:49:08 -03:00
jedzill4	73e7fafc48	feat(tests): python matrix support test - Introduced a new argument for selecting the diarization backend in the engine creation. - Enhanced the `create_engine` function to accept and utilize the specified diarization backend. - Updated the test runner to accommodate the new backend option for improved flexibility.	2026-02-25 21:35:41 -03:00
jedzill4	bbcebcb1fe	deps(sortformer): adjust nemo-toolkit version constraints - Updated the version constraint for `diarization-sortformer` to restrict it to Python 3.10 and below.	2026-02-25 21:33:00 -03:00
jedzill4	4bb58dc7aa	deps(diart): improve diart dependency tree. rename gpu-cu129 dependency group to cu129	2026-02-25 20:27:26 -03:00
jedzill4	27ca028479	ci(github): add GitHub Actions workflows for Docker image publishing and support matrix - Introduced a workflow to publish Docker images on tag push and manual triggers. - Added a support matrix workflow to test across multiple OS and Python versions.	2026-02-25 14:27:51 -03:00
jedzill4	d24805cc18	🚀 chore (docker): update docker images improving caching and using uv as python package manager	2026-02-25 14:22:43 -03:00
jedzill4	994ce21365	📌 chore(deps): pin dependencies to python 3.11 to 3.13 due to dependency resolution matrix	2026-02-25 14:21:19 -03:00
jedzill4	132823dc09	deps: improve deps dependency resolution (wip)	2026-02-24 20:15:53 -03:00
jedzill4	d6d8c2635f	chore: use uv as python project manager to improve dependency resolution	2026-02-23 22:16:32 -03:00
Quentin Fuxa	8fedeb9fed	Merge pull request #340 from QuentinFuxa/voxtral_tests feat: voxtral-mlx backend, benchmark suite, unit tests, runtime metrics (tag: v0.2.19)	2026-02-23 10:37:40 +01:00
Quentin Fuxa	b1fc23807a	docs: add benchmark collaboration call, voxtral in powered-by section	2026-02-23 10:37:22 +01:00
Quentin Fuxa	10c4e5f730	docs: add speed vs accuracy scatter plot to benchmark and README WER vs RTF scatter plot showing all backend/policy/model combos on the 30s English file. Sweet spot zone highlights the best tradeoffs. Added to both BENCHMARK.md and README.md.	2026-02-23 10:27:53 +01:00
Quentin Fuxa	c76b2ef2c6	docs: rewrite benchmark with base/small comparison, proper French results - Re-ran all whisper benchmarks with --lan fr for the French file (previously ran with --lan en which made the results meaningless) - Added small model results alongside base for all backends - Added model size comparison table (base vs small tradeoffs) - Added benchmark chart (30s English, WER + RTF by backend) - Added caveats section about dataset size and RTF variance - Key findings: SimulStreaming saturates at 5.3% WER on base already, small model mainly helps LocalAgreement and French timestamps - mlx-whisper LA base is unstable on French (hallucination loops)	2026-02-23 10:16:34 +01:00
Quentin Fuxa	4b2377c243	fix: correct false auto-detect claim, median bug, RTF inflation - BENCHMARK.md: whisper also supports --language auto, voxtral is not the only one. Fixed mlx-whisper speed comparison (LA is actually faster than SS for mlx-whisper, not comparable). - metrics.py: median calculation was wrong for even-length lists (took upper middle instead of averaging the two middle values). - metrics_collector.py: RTF was inflated because log_summary() used wall-clock elapsed time instead of sum of actual ASR call durations. - README.md: clarified that whisper also supports auto language detection, voxtral just does it better. - Added 2 new median tests (even + odd length).	2026-02-22 23:38:04 +01:00
Quentin Fuxa	a4da246ea5	feat: add voxtral-mlx native backend for Apple Silicon Pure-MLX implementation of Voxtral Mini 4B Realtime for low-latency speech transcription on Apple Silicon. Avoids the transformers/torch overhead and runs at 0.18-0.32x real-time factor. - voxtral_mlx/model.py: MLX model with spectrogram, encoder, decoder - voxtral_mlx/loader.py: model loading with 6-bit quantized weights - voxtral_mlx/spectrogram.py: mel spectrogram computation in MLX - voxtral_mlx_asr.py: VoxtralASR adapter for the AudioProcessor pipeline	2026-02-22 23:28:10 +01:00
Quentin Fuxa	9b2c3ee844	docs: update README with voxtral backend, benchmarks, testing sections - Add Voxtral Backend section explaining voxtral-mlx and voxtral (HF). - Add Testing & Benchmarks section with commands to run tests/benchmarks. - Update --backend parameter docs to include voxtral-mlx and voxtral. - Update optional dependencies table with Voxtral entry. - Link to BENCHMARK.md for detailed performance comparisons.	2026-02-22 23:27:57 +01:00
Quentin Fuxa	83d0fa3fac	feat: benchmark suite with WER, timestamp accuracy, cross-backend comparison - Extend test_backend_offline.py with WER and timestamp accuracy metrics computed via whisperlivekit.metrics against ground truth transcripts. - Add --benchmark flag to auto-detect all installed backends and run each (backend, policy) combination in sequence. - Add --policy flag to override the streaming policy. - Add detect_available_backends() probing faster-whisper, mlx-whisper, voxtral-mlx, voxtral (HF), and openai-whisper. - Add print_cross_backend_comparison() with per-combo averages. - Add run_benchmark.py for comprehensive multi-model benchmarking. - Add BENCHMARK.md with full results on Apple M4: speed, WER, timestamp accuracy, VAC impact, and recommendations. - Add ground truth transcript JSON files for all audio test files.	2026-02-22 23:27:50 +01:00
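
Note on commit 4b2377c243: its message says the median calculation for even-length lists took the upper middle value instead of averaging the two middle values. The sketch below illustrates that difference. It is not the code from metrics.py; the function names `median` and `upper_middle` are hypothetical and chosen only for this illustration.

```python
from typing import Sequence


def upper_middle(values: Sequence[float]) -> float:
    """Behaviour described as the bug: always pick the element at index n // 2."""
    ordered = sorted(values)
    return float(ordered[len(ordered) // 2])


def median(values: Sequence[float]) -> float:
    """Median that averages the two middle values for even-length input."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd length: a single middle element exists.
        return float(ordered[mid])
    # Even length: average the two elements around the centre.
    return (ordered[mid - 1] + ordered[mid]) / 2.0


if __name__ == "__main__":
    sample = [4.0, 1.0, 3.0, 2.0]
    print(upper_middle(sample), median(sample))
```

For odd-length input the two functions agree, which is presumably why the commit adds one test for an even-length list and one for an odd-length list.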

739 Commits