Quentin Fuxa
d080d675a8
custom alignment heads parameter for custom models
2025-09-27 11:04:00 +02:00
Quentin Fuxa
8cbaeecc75
custom alignment heads parameter for custom models
2025-09-27 11:04:00 +02:00
google-labs-jules[bot]
70e854b346
feat: Allow loading fine-tuned models in simulstreaming
This change modifies the `simulstreaming` backend to support loading fine-tuned Whisper models via the `--model_dir` argument.
The `SimulStreamingASR` class has been updated to:
- Use the `model_dir` path directly to load the model, which is the correct procedure for fine-tuned `.pt` files.
- Automatically disable the `faster-whisper` and `mlx-whisper` fast encoders when `model_dir` is used, as they are not compatible with standard fine-tuned models.
The call site in `core.py` already passed the `model_dir` argument, so no changes were needed there. This change makes the `simulstreaming` backend more flexible and allows users to leverage their own custom models.
2025-09-27 07:29:30 +00:00
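The commit above describes loading fine-tuned `.pt` checkpoints from `--model_dir` and disabling the fast encoders in that case. A minimal sketch of that decision logic, assuming hypothetical names (`resolve_model_source` and its parameters are illustrative, not the project's actual API):

```python
def resolve_model_source(model_name, model_dir=None,
                         use_faster_encoder=False, use_mlx_encoder=False):
    """Pick the model source and decide which fast encoders stay enabled.

    Sketch only: mirrors the commit's described behavior, where a
    fine-tuned checkpoint in model_dir is loaded from that path directly
    and the faster-whisper / mlx-whisper encoders are turned off because
    they are not compatible with standard fine-tuned models.
    """
    if model_dir is not None:
        return {
            "source": model_dir,       # load the .pt file from this path
            "faster_encoder": False,   # disabled for fine-tuned models
            "mlx_encoder": False,
        }
    # Default path: load by model name, keep requested fast encoders.
    return {
        "source": model_name,
        "faster_encoder": use_faster_encoder,
        "mlx_encoder": use_mlx_encoder,
    }
```

With a `model_dir` set, any requested fast encoder is silently dropped rather than raising, matching the "automatically disable" wording of the commit.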
Quentin Fuxa
b22478c0b4
correct silences handling when language not auto
2025-09-25 23:20:00 +02:00
Quentin Fuxa
4dd5d8bf8a
translation compatible with auto and detected language
2025-09-22 11:20:00 +02:00
Quentin Fuxa
93f002cafb
language detection after few seconds working
2025-09-20 11:08:00 +02:00
Quentin Fuxa
674b20d3af
in buffer while language not detected
2025-09-21 11:05:00 +02:00
Quentin Fuxa
a5503308c5
O(n) to O(1) for simulstreaming timestamp determination
2025-09-21 11:04:00 +02:00
Quentin Fuxa
426d70a790
simulstreaming infer does not return a dictionary anymore
2025-09-21 11:03:00 +02:00
Quentin Fuxa
334b338ab0
use platform to determine system and recommend mlx whisper
2025-09-07 15:49:11 +02:00
notV3NOM
abd8f2c269
Fix exponentially growing simulstreaming silence timer
2025-09-04 21:49:07 +05:30
Quentin Fuxa
3bd2122eb4
0.2.8: only the decoder of whisper is loaded in memory when a different encoder is used
2025-09-02 21:12:25 +02:00
Quentin Fuxa
d5008ed828
mlx/fasterWhisper encoders are loaded once and shared in simulstreaming
2025-09-01 12:33:19 +02:00
Quentin Fuxa
1d926f2e67
mlx-whisper used as simulstreaming encoder: improve speed for macos systems
2025-08-30 22:19:11 +02:00
Quentin Fuxa
4a5d5e1f3b
raise Exception when language == auto and task == translation
2025-08-29 17:44:46 +02:00
Quentin Fuxa
9895bc83bf
auto detection of language for warmup if not indicated
2025-08-27 20:37:48 +02:00
Quentin Fuxa
b32dd8bfc4
Align backend and frontend time handling
2025-08-21 10:33:15 +02:00
Quentin Fuxa
d0e9e37ef6
simulstreaming: cumulative_time_offset to keep timestamps correct when audio > 30s
2025-08-17 09:33:47 +02:00
Quentin Fuxa
e2184d5e06
better handle silences when VAC + correct offset issue with whisperstreaming backend
2025-08-17 01:27:07 +02:00
Quentin Fuxa
0f2eba507e
use with_offset to add no audio offset to tokens
2025-08-17 00:33:24 +02:00
Quentin Fuxa
55e08474f3
recycle backend in simulstreaming thanks to new remove hooks function
2025-08-16 23:06:16 +02:00
Quentin Fuxa
1652db9a2d
Use distinct backend models for simulstreaming and add --preloaded_model_count to preload them
2025-08-15 23:03:55 +02:00
Quentin Fuxa
15c3df1cba
warmup base whisper when using simulstreaming
2025-08-12 18:52:52 +02:00
Quentin Fuxa
728e1f1290
simulstreaming warmup is done for each instance of online, not for the backend
2025-08-12 18:35:04 +02:00
Quentin Fuxa
d098af3185
each SimulStreamingOnlineProcessor now contains PaddedAlignAttWhisper instance. SimulStreamingASR only contains loaded whisper model
2025-08-11 08:24:14 +02:00
Quentin Fuxa
5491964e81
clean SimulStreamingOnlineProcessor initialization + audio processing
2025-08-09 20:16:27 +02:00
Quentin Fuxa
b05297a96d
clean simulwhisper backend and online
2025-08-09 18:02:15 +02:00
Quentin Fuxa
197293e25e
refactor(simulstreaming): extract backend + online module into separate files from whisper streaming
2025-08-08 18:07:51 +02:00