Commit Graph

764 Commits

Author SHA1 Message Date
Quentin Fuxa
b102e12943 M5 benchmark figures: WER vs RTF scatter, 0.6B+1.7B MLX results 2026-03-15 15:00:00 +01:00
Quentin Fuxa
7aa3b764bd MLX benchmark: 1.7B SimulStreaming on M5 (WER 4.07%, RTF 0.944)
LibriSpeech test-clean, 500 utterances.
1.7B is borderline real-time on M5 (RTF 0.944).
0.6B (3.30% WER, 0.263 RTF) is the practical choice for MacBook.
2026-03-15 14:00:00 +01:00
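Note on the figures above: RTF (real-time factor) is conventionally processing time divided by audio duration, so values below 1.0 mean transcription keeps up with the incoming audio. The sketch below illustrates that arithmetic with the reported RTF values. The function names are hypothetical and are not taken from benchmark_mlx_simul.py.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (RTF)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def headroom(rtf: float) -> float:
    """Fraction of each audio second left over after processing."""
    return 1.0 - rtf


def is_real_time(rtf: float, margin: float = 0.0) -> bool:
    """True when RTF stays below 1.0 by at least `margin` (an assumed safety margin)."""
    return rtf < 1.0 - margin


if __name__ == "__main__":
    # RTF values as reported in the commit messages above.
    for name, rtf in [("0.6B", 0.263), ("1.7B", 0.944)]:
        print(name, "real-time:", is_real_time(rtf), "headroom:", round(headroom(rtf), 3))
```

With an RTF of 0.944, ten seconds of audio take about 9.44 seconds to process. That leaves roughly 5.6% headroom, which is why the 1.7B model is described as borderline.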
Quentin Fuxa
a422e604ae MLX benchmark: 0.6B SimulStreaming on M5 MacBook (WER 3.30%, RTF 0.263)
LibriSpeech test-clean, 500 utterances, per-utterance simul-streaming.
AlignAtt border detection with 20 alignment heads.
Platform: Apple M5 32GB (MLX fp16).

benchmark_mlx_simul.py: reusable benchmark script for MLX backends.
2026-03-15 13:00:00 +01:00
Quentin Fuxa
e14b913807 Merge branch 'benchmarks-h100' 2026-03-15 12:00:00 +01:00
Quentin Fuxa
47d4cbeecc reorganize benchmarks: move H100 results to benchmarks/h100/ 2026-03-15 23:59:00 +01:00
Quentin Fuxa
f75dfb386d final benchmark: Voxtral vLLM realtime streaming 2026-03-15 23:59:00 +01:00
Quentin Fuxa
276ba84d02 update figures with Voxtral vLLM results 2026-03-15 23:55:00 +01:00
Quentin Fuxa
36b3885cf2 add Voxtral 4B to benchmark figures 2026-03-15 23:30:00 +01:00
Quentin Fuxa
a29e799ba5 update H100 benchmark figures with ACL6060 results 2026-03-15 22:30:00 +01:00
Quentin Fuxa
22325ba326 tune simul-kv: 2s inference interval, configurable min_new_seconds 2026-03-15 21:30:00 +01:00
Quentin Fuxa
a540a5fd10 fix simul-kv audio trim bug, add 1.7B v2 alignment heads 2026-03-15 20:45:00 +01:00
Quentin Fuxa
7b08ea74ab add H100 benchmark figures 2026-03-15 19:15:00 +01:00
Quentin Fuxa
b69eaf82be qwen3 simul+kv: optimized streaming with kv cache reuse 2026-03-15 18:30:00 +01:00
Quentin Fuxa
3b7a2fcc87 Add Qwen3-ASR MLX SimulStreaming backend
New backend 'qwen3-mlx-simul' for Apple Silicon: AlignAtt border
detection via monkey-patched cross-attention on MLX Qwen3-ASR.
Supports 0.6B (RTF 0.236 on M5) and 1.7B models.

- qwen3_mlx_simul.py: full streaming implementation with KV cache,
  alignment head attention extraction, border-distance policy
- core.py: register new backend in TranscriptionEngine + online_factory
- parse_args.py: add qwen3-mlx-simul to CLI choices
2026-03-15 11:00:00 +01:00
Quentin Fuxa
ed503be140 qwen 2026-01-02 23:52:00 +01:00
Quentin Fuxa
a6a85431f6 update benchmark with qwen3 which reuses kv cache 2026-03-15 22:32:01 +01:00
Quentin Fuxa
dd48997674 qwen3: reuse encoder kv cache 2026-03-15 22:31:39 +01:00
Quentin Fuxa
f24481dc29 update archi 2026-03-15 11:36:45 +01:00
Quentin Fuxa
ed76f40ee5 Merge branch 'main' of https://github.com/QuentinFuxa/WhisperLiveKit 2026-03-15 11:16:38 +01:00
Quentin Fuxa
5330b3fac5 update benchmark part 2026-03-15 11:16:26 +01:00
Quentin Fuxa
0c73a73aa3 update benchmark results and procedure 2026-03-15 11:16:15 +01:00
Quentin Fuxa
2d6bc4f572 Add '*.c' to .dockerignore 2026-03-14 00:18:10 +01:00
Quentin Fuxa
dfd5bf417c voxtral mlx : improved chunking 2026-03-14 00:13:29 +01:00
Quentin Fuxa
9d8db7ab38 add qwen3 simul in tests 2026-03-14 00:13:09 +01:00
Quentin Fuxa
fa15115163 qwen3 alignment heads 2026-03-14 00:12:50 +01:00
Quentin Fuxa
8dc7b77071 Bump version to 0.2.20 (tag: v0.2.20) 2026-03-08 16:02:00 +01:00
Quentin Fuxa
10d85ff65f Update docs, CI, and architecture diagram 2026-03-08 15:14:00 +01:00
Quentin Fuxa
e7e3441ca4 Add Qwen3 ASR backend 2026-03-07 11:48:00 +01:00
Quentin Fuxa
9abe26a996 Add CLI with serve, transcribe, listen, pull, diagnose 2026-03-01 13:37:00 +01:00
Quentin Fuxa
c8e7c216ed Replace mock tests with real pipeline tests 2026-02-28 10:05:00 +01:00
Quentin Fuxa
586540ae36 Add test harness and test client 2026-02-22 16:19:00 +01:00
Quentin Fuxa
cd8df8e1aa Update package setup and exports 2026-02-21 11:33:00 +01:00
Quentin Fuxa
e30f9a2573 Improve diarization backends 2026-02-15 14:55:00 +01:00
Quentin Fuxa
32de7b1276 Fix frontend buffer rendering for slow backends 2026-02-14 09:28:00 +01:00
Quentin Fuxa
9ac7c26a0b Add OpenAI REST API and Deepgram WebSocket 2026-02-08 15:42:00 +01:00
Quentin Fuxa
c0e2600993 Add snapshot-then-diff WebSocket protocol 2026-02-07 10:17:00 +01:00
Quentin Fuxa
e0db3a98f9 Add per-session language proxy 2026-02-01 17:03:00 +01:00
Quentin Fuxa
2fe34427ef Fix voxtral streaming drain and silence flush 2026-01-31 11:12:00 +01:00
Quentin Fuxa
d58365421f Refactor audio processor async pipeline 2026-01-25 13:48:00 +01:00
Quentin Fuxa
a282cbe75f Improve tokens alignment and silence handling 2026-01-24 10:55:00 +01:00
Quentin Fuxa
6e85c16614 Refactor TranscriptionEngine singleton 2026-01-18 15:27:00 +01:00
Quentin Fuxa
e1823dd99c Improve online ASR processor 2026-01-17 09:35:00 +01:00
Quentin Fuxa
e144abbbc7 Refactor timed objects and data structures 2026-01-11 16:08:00 +01:00
Quentin Fuxa
83362c89c4 Clean up config and model paths 2026-01-10 11:42:00 +01:00
Quentin Fuxa
74c4dc791d Lint scripts and tests 2026-01-04 14:15:00 +01:00
Quentin Fuxa
cf6c49f502 Ruff lint cleanup 2026-01-03 10:23:00 +01:00
Quentin Fuxa
451535d48f Fix ctranslate2 encoder conversion (#345) and memory leak in TokensAlignment (#344)
- Add fallback chain for StorageView to numpy conversion
- Prune old tokens/segments after 5min to bound memory
2026-03-10 22:37:00 +01:00
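The pruning described in this commit could look roughly like the following. This is a minimal sketch, not the project's TokensAlignment code: the TimedToken class, the prune_old function, and the choice to compare token end times against the current stream time are all assumptions. Only the 5-minute bound comes from the commit message.

```python
from dataclasses import dataclass
from typing import List

MAX_AGE_SECONDS = 5 * 60  # 5-minute bound from the commit message


@dataclass
class TimedToken:  # hypothetical stand-in for the project's timed objects
    text: str
    start: float  # seconds since stream start
    end: float    # seconds since stream start


def prune_old(tokens: List[TimedToken], now: float,
              max_age: float = MAX_AGE_SECONDS) -> List[TimedToken]:
    """Keep only tokens that ended within the last `max_age` seconds."""
    cutoff = now - max_age
    return [t for t in tokens if t.end >= cutoff]


if __name__ == "__main__":
    history = [
        TimedToken("hello", 9.0, 10.0),
        TimedToken("world", 399.0, 400.0),
        TimedToken("again", 649.0, 650.0),
    ]
    # At t=700s the cutoff is 400s, so only the first token is dropped.
    print([t.text for t in prune_old(history, now=700.0)])
```

Calling such a function on every update keeps the retained history proportional to the window length rather than to the total session duration.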
Quentin Fuxa
8bc0937c46 Update README section on powered research 2026-03-06 18:46:07 +01:00
Quentin Fuxa
929cf7a26b add link to AlignAtt interactive playground 2026-03-06 18:43:25 +01:00
Quentin Fuxa
abfaf06203 Merge branch 'main' of https://github.com/QuentinFuxa/WhisperLiveKit 2026-03-04 18:17:23 +01:00