717 Commits

Author SHA1 Message Date
Quentin Fuxa
8bc0937c46 Update README section on powered research 2026-03-06 18:46:07 +01:00
Quentin Fuxa
929cf7a26b add link to AlignAtt interactive playground 2026-03-06 18:43:25 +01:00
Quentin Fuxa
abfaf06203 Merge branch 'main' of https://github.com/QuentinFuxa/WhisperLiveKit 2026-03-04 18:17:23 +01:00
Quentin Fuxa
d1fe932241 Apply DRY method v0 - to try to catch and resolve infinite loops such as in #338 2026-03-03 22:52:00 +01:00
Quentin Fuxa
c112ceffb6 Merge pull request #342 from mnicnc404/fix/whisper-tokenizer-index-error
fix(whisper/tokenizer): prevent IndexError from crashing multilingual…
2026-03-02 20:36:58 +01:00
Quentin Fuxa
4917406e06 Merge pull request #341 from AymurAI/feat/uv-deps-resolution
deps/docker: align python support, deterministic deps resolution & docker images releases
2026-03-02 20:34:49 +01:00
Chingning Chen
b63f54e838 fix(whisper/tokenizer): prevent IndexError from crashing multilingual streams
This fix addresses a critical bug in the Whisper tokenizer that causes
the transcription server to crash with an `IndexError: string index out
of range` when streaming audio in languages utilizing multi-byte UTF-8
characters (e.g., Cantonese, Japanese, Mandarin).

When a 3-byte character is cut off at the boundary of an audio chunk,
incomplete bytes are decoded into a single Unicode replacement character
(`\ufffd`), artificially shortening the string and breaking the offset
mapping assumed by `split_tokens_on_unicode`.

This ports the upstream fix from SYSTRAN/faster-whisper (PR #111) to add
a strict bounds check before accessing the string index, allowing
incomplete bytes to be safely caught and handled in the next chunk.
2026-03-02 15:31:43 +08:00
jedzill4
c56a53fbf4 deps(mlx-groups): add optional dependencies for Apple Silicon MLX backends 2026-03-01 20:05:52 -03:00
Quentin Fuxa
66e58624b9 disable MLXAlignAtt which fails on special characters 2026-03-01 11:52:00 +01:00
jedzill4
9366e067f9 deps(pyproject): add torch and torchaudio to main dependencies 2026-02-27 19:19:18 -03:00
jedzill4
866c25670c deps(docker): change CUDA base image to runtime version 2026-02-27 19:16:29 -03:00
jedzill4
2553ef283e deps(docker): fix dependency group for cu129 image
- Changed the extras for cu129-diarization-sortformer from gpu-cu129 to cu129.
- This aligns the dependency with the correct naming convention for consistency.
2026-02-25 21:49:08 -03:00
jedzill4
73e7fafc48 feat(tests): python matrix support test
- Introduced a new argument for selecting the diarization backend in the engine creation.
- Enhanced the `create_engine` function to accept and utilize the specified diarization backend.
- Updated the test runner to accommodate the new backend option for improved flexibility.
2026-02-25 21:35:41 -03:00
jedzill4
bbcebcb1fe deps(sortformer): adjust nemo-toolkit version constraints
- Updated the version constraint for `diarization-sortformer` to restrict it to Python 3.10 and below.
2026-02-25 21:33:00 -03:00
jedzill4
4bb58dc7aa deps(diart): improve diart dependency tree. rename gpu-cu129 dependency group to cu129 2026-02-25 20:27:26 -03:00
jedzill4
27ca028479 ci(github): add GitHub Actions workflows for Docker image publishing and support matrix
- Introduced a workflow to publish Docker images on tag push and manual triggers.
- Added a support matrix workflow to test across multiple OS and Python versions.
2026-02-25 14:27:51 -03:00
jedzill4
d24805cc18 🚀 chore (docker): update docker images improving caching and using uv as python package manager 2026-02-25 14:22:43 -03:00
jedzill4
994ce21365 📌 chore(deps): pin dependences to python 3.11 to 3.13 due dependency resolution matrix 2026-02-25 14:21:19 -03:00
jedzill4
132823dc09 deps: improve deps dependency resolution (wip) 2026-02-24 20:15:53 -03:00
jedzill4
d6d8c2635f chore: use uv as python project manager to improve dependency resolution 2026-02-23 22:16:32 -03:00
Quentin Fuxa
8fedeb9fed Merge pull request #340 from QuentinFuxa/voxtral_tests
feat: voxtral-mlx backend, benchmark suite, unit tests, runtime metrics
v0.2.19
2026-02-23 10:37:40 +01:00
Quentin Fuxa
b1fc23807a docs: add benchmark collaboration call, voxtral in powered-by section 2026-02-23 10:37:22 +01:00
Quentin Fuxa
10c4e5f730 docs: add speed vs accuracy scatter plot to benchmark and README
WER vs RTF scatter plot showing all backend/policy/model combos
on the 30s English file. Sweet spot zone highlights the best
tradeoffs. Added to both BENCHMARK.md and README.md.
2026-02-23 10:27:53 +01:00
Quentin Fuxa
c76b2ef2c6 docs: rewrite benchmark with base/small comparison, proper French results
- Re-ran all whisper benchmarks with --lan fr for the French file
  (previously ran with --lan en which made the results meaningless)
- Added small model results alongside base for all backends
- Added model size comparison table (base vs small tradeoffs)
- Added benchmark chart (30s English, WER + RTF by backend)
- Added caveats section about dataset size and RTF variance
- Key findings: SimulStreaming saturates at 5.3% WER on base already,
  small model mainly helps LocalAgreement and French timestamps
- mlx-whisper LA base is unstable on French (hallucination loops)
2026-02-23 10:16:34 +01:00
Quentin Fuxa
4b2377c243 fix: correct false auto-detect claim, median bug, RTF inflation
- BENCHMARK.md: whisper also supports --language auto, voxtral is not
  the only one. Fixed mlx-whisper speed comparison (LA is actually
  faster than SS for mlx-whisper, not comparable).
- metrics.py: median calculation was wrong for even-length lists
  (took upper middle instead of averaging the two middle values).
- metrics_collector.py: RTF was inflated because log_summary() used
  wall-clock elapsed time instead of sum of actual ASR call durations.
- README.md: clarified that whisper also supports auto language
  detection, voxtral just does it better.
- Added 2 new median tests (even + odd length).
2026-02-22 23:38:04 +01:00
Quentin Fuxa
a4da246ea5 feat: add voxtral-mlx native backend for Apple Silicon
Pure-MLX implementation of Voxtral Mini 4B Realtime for low-latency
speech transcription on Apple Silicon. Avoids the transformers/torch
overhead and runs at 0.18-0.32x real-time factor.

- voxtral_mlx/model.py: MLX model with spectrogram, encoder, decoder
- voxtral_mlx/loader.py: model loading with 6-bit quantized weights
- voxtral_mlx/spectrogram.py: mel spectrogram computation in MLX
- voxtral_mlx_asr.py: VoxtralASR adapter for the AudioProcessor pipeline
2026-02-22 23:28:10 +01:00
Quentin Fuxa
9b2c3ee844 docs: update README with voxtral backend, benchmarks, testing sections
- Add Voxtral Backend section explaining voxtral-mlx and voxtral (HF).
- Add Testing & Benchmarks section with commands to run tests/benchmarks.
- Update --backend parameter docs to include voxtral-mlx and voxtral.
- Update optional dependencies table with Voxtral entry.
- Link to BENCHMARK.md for detailed performance comparisons.
2026-02-22 23:27:57 +01:00
Quentin Fuxa
83d0fa3fac feat: benchmark suite with WER, timestamp accuracy, cross-backend comparison
- Extend test_backend_offline.py with WER and timestamp accuracy metrics
  computed via whisperlivekit.metrics against ground truth transcripts.
- Add --benchmark flag to auto-detect all installed backends and run
  each (backend, policy) combination in sequence.
- Add --policy flag to override the streaming policy.
- Add detect_available_backends() probing faster-whisper, mlx-whisper,
  voxtral-mlx, voxtral (HF), and openai-whisper.
- Add print_cross_backend_comparison() with per-combo averages.
- Add run_benchmark.py for comprehensive multi-model benchmarking.
- Add BENCHMARK.md with full results on Apple M4: speed, WER,
  timestamp accuracy, VAC impact, and recommendations.
- Add ground truth transcript JSON files for all audio test files.
2026-02-22 23:27:50 +01:00
Quentin Fuxa
5a12c627b4 feat: add 99-test unit test suite with zero model dependencies
Test suite covering:
- metrics.py: WER computation, timestamp accuracy, text normalization
- config.py: defaults, .en model detection, policy aliases, from_namespace
- timed_objects.py: ASRToken, Silence, Transcript, Segment, FrontData
- hypothesis_buffer.py: insert, flush, LCP matching, pop_committed
- silence_handling.py: state machine, double-counting regression test
- audio_processor.py: async pipeline with MockOnlineProcessor

All tests run in ~1.3s without downloading any ASR models.
Add pytest and pytest-asyncio as optional test dependencies.
Update .gitignore to allow tests/ directory.
2026-02-22 23:27:40 +01:00
Quentin Fuxa
f5eee67b11 fix: silence double-counting bug, add metrics module and runtime instrumentation
- Fix _begin_silence pushing same object reference as _end_silence,
  causing the consumer to process two ended events and double the
  silence duration.
- Fix initial silence never cleared when VAC is disabled, causing
  the no-VAC path to enqueue zero audio.
- Add sample-precise silence boundaries (at_sample parameter).
- Add whisperlivekit/metrics.py with WER computation (word-level
  Levenshtein) and timestamp accuracy (greedy alignment). No
  external dependencies.
- Add whisperlivekit/metrics_collector.py with SessionMetrics
  dataclass for per-session runtime observability. Instrumented
  at 6 points in AudioProcessor: init, process_audio,
  transcription_processor, _end_silence, results_formatter, cleanup.
  Emits SESSION_METRICS structured log line on session end.
2026-02-22 23:27:12 +01:00
Quentin Fuxa
4a6868e3e1 correct processor attributes mixtral 2026-02-22 21:13:21 +01:00
Quentin Fuxa
3c15246fc0 mixstral hf v0 2026-02-20 20:49:57 +01:00
Quentin Fuxa
d337248fda feat: add healthcheck to Dockerfiles (#228) 2026-02-20 20:48:28 +01:00
Quentin Fuxa
b8d9d7d289 fix: handle numpy object_ dtype from ctranslate2 encoder (#337) 2026-02-20 20:48:28 +01:00
Quentin Fuxa
4c7706e2cf fix: use vac_chunk_size for audio processing interval when VAC is enabled (#334) 2026-02-20 20:48:06 +01:00
Quentin Fuxa
7f3a3df620 simulstreaming mlx & torch dedup of common base 2025-02-15 23:52:00 +01:00
Quentin Fuxa
e7e82f7c19 bump to 0.2.18 0.2.18 2026-02-11 22:10:00 +01:00
Quentin Fuxa
8c799fa4d1 fix simulstreaming vram leak: cap cross-attn accumulation + token budget
fixes #283, fixes #275

- accumulated_cross_attns was growing unboundedly during decoding loop,
  using up to ~5GB for repetition loops. now capped to rolling window of 16
- max_tokens_per_chunk was using TOKENS_PER_SECOND (mel frame rate = 50)
  instead of actual text token rate (~15/s), allowing 10-40x too many
  decoding steps
- removed unused torch.cat on early return path
- removed dead self.committed/last_result_tokens lists (never read)
- same fixes applied to mlx variant
2026-02-11 22:10:00 +01:00
Quentin Fuxa
8923337380 fix --direct-english-translation not setting task=translate for localagreement backends
the flag was only used for tokenizer language selection but never
actually passed to whisper/faster-whisper transcribe calls. also init
OpenaiApiASR.task and read from transcribe_kargs.

fixes #306
2026-02-11 22:10:00 +01:00
Quentin Fuxa
aded1649ae fix model_cache_dir + direct_english_translation task in simulstreaming
pass actual cache dir instead of None, and use proper task string
instead of boolean for AlignAttConfig

fixes #310
2026-02-11 22:10:00 +01:00
Quentin Fuxa
3b535e857a fix NoneType concatenation in add_translation
fixes #296
2026-02-11 22:10:00 +01:00
Quentin Fuxa
d649250b9a fix Segment classmethod call + isinstance type narrowing
fixes #331, fixes #329
2026-02-11 22:10:00 +01:00
Quentin Fuxa
7735478286 add insert_audio_chunk to DiartDiarization
fixes #332
2026-02-11 22:10:00 +01:00
Quentin Fuxa
b9e72d2b9a add probability field to ASRToken
fixes #330, fixes #313
2026-02-11 22:10:00 +01:00
Quentin Fuxa
e5b01033af add json normalizers for english language in build 2026-01-16 10:47:46 +01:00
Quentin Fuxa
6ae545bcb1 bump to 0.2.17.post1 0.2.17.post1 2026-01-16 10:43:52 +01:00
Quentin Fuxa
04980d3f5e Merge branch 'main' of https://github.com/QuentinFuxa/WhisperLiveKit 2026-01-16 10:38:29 +01:00
Quentin Fuxa
79a705c969 fixes #323 2026-01-16 10:38:07 +01:00
Quentin Fuxa
34e4abd455 Merge pull request #322 from eschmidbauer/fix/thread-safety-issues
Fix kv cache not being properly cleaned between sessions
2026-01-09 19:23:35 +01:00
Emmanuel Schmidbauer
d59ddbaeae Fix critical thread safety issues 2026-01-09 11:23:19 -05:00