WhisperLiveKit

mirror of https://github.com/QuentinFuxa/WhisperLiveKit.git synced 2026-03-07 22:33:36 +00:00

Files

Quentin Fuxa 8c799fa4d1 fix simulstreaming vram leak: cap cross-attn accumulation + token budget

fixes #283, fixes #275

- accumulated_cross_attns was growing unboundedly during the decoding loop,
  using up to ~5GB in repetition loops. now capped to a rolling window of 16
- max_tokens_per_chunk was using TOKENS_PER_SECOND (mel frame rate = 50)
  instead of the actual text token rate (~15/s), allowing 10-40x too many
  decoding steps
- removed unused torch.cat on early return path
- removed dead self.committed/last_result_tokens lists (never read)
- same fixes applied to mlx variant
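
The two main fixes above can be illustrated with a minimal sketch. This is not the code from simul_whisper.py: the names `CROSS_ATTN_WINDOW`, `TEXT_TOKENS_PER_SECOND`, `max_tokens_for_chunk`, `decode_chunk`, and `step_fn` are hypothetical, and only the window size (16) and the approximate text token rate (~15/s) come from the commit message. It assumes a decoding loop that produces one cross-attention entry per step.

```python
import math
from collections import deque

CROSS_ATTN_WINDOW = 16       # rolling window size named in the commit message
TEXT_TOKENS_PER_SECOND = 15  # approximate text token rate named in the commit message


def max_tokens_for_chunk(chunk_seconds: float, floor: int = 1) -> int:
    """Hypothetical helper: derive the decoding-step budget from the text token rate."""
    return max(floor, math.ceil(chunk_seconds * TEXT_TOKENS_PER_SECOND))


def decode_chunk(step_fn, chunk_seconds: float):
    """Hypothetical decoding loop; step_fn(i) returns (token, cross_attn, done)."""
    # A deque with maxlen drops its oldest entry on append, so memory stays
    # bounded even if the decoder gets stuck in a repetition loop.
    recent_cross_attns = deque(maxlen=CROSS_ATTN_WINDOW)
    tokens = []
    for i in range(max_tokens_for_chunk(chunk_seconds)):
        token, cross_attn, done = step_fn(i)
        tokens.append(token)
        recent_cross_attns.append(cross_attn)
        if done:
            break
    return tokens, list(recent_cross_attns)
```

With a step function that never signals completion, a 2-second chunk stops after 30 steps and retains only the 16 most recent cross-attention entries.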

2026-02-11 22:10:00 +01:00

__init__.py

new dec class

2024-11-21 23:52:00 +01:00

decoder_state.py

new dec class

2024-11-21 23:52:00 +01:00

decoders.py

uses native mlx function for attention

2024-11-21 23:52:00 +01:00

simul_whisper.py

fix simulstreaming vram leak: cap cross-attn accumulation + token budget

2026-02-11 22:10:00 +01:00