WhisperLiveKit

mirror of https://github.com/QuentinFuxa/WhisperLiveKit.git synced 2026-03-07 14:23:18 +00:00

Files

Quentin Fuxa 8c799fa4d1 fix simulstreaming vram leak: cap cross-attn accumulation + token budget

fixes #283, fixes #275

- accumulated_cross_attns was growing unboundedly during the decoding loop,
  using up to ~5GB in repetition loops. now capped to a rolling window of 16
- max_tokens_per_chunk was using TOKENS_PER_SECOND (mel frame rate = 50)
  instead of the actual text token rate (~15/s), allowing 10-40x too many
  decoding steps
- removed unused torch.cat on the early return path
- removed dead self.committed/last_result_tokens lists (never read)
- same fixes applied to the mlx variant
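
The two main fixes above (a bounded cross-attention buffer and a token budget based on the text token rate) can be sketched roughly as follows. This is an illustrative sketch only, not the code from the commit: the names CrossAttnBuffer, max_tokens_for_chunk, CROSS_ATTN_WINDOW and TEXT_TOKENS_PER_SECOND are hypothetical, and plain Python objects stand in for attention tensors. Only the window size of 16 and the approximate rate of 15 tokens/s come from the commit message.

```python
from collections import deque

# Values taken from the commit message; the constant names are hypothetical.
CROSS_ATTN_WINDOW = 16        # rolling-window size for accumulated cross-attention
TEXT_TOKENS_PER_SECOND = 15   # approximate text token rate (not the mel frame rate of 50)


class CrossAttnBuffer:
    """Hypothetical helper: keeps only the most recent cross-attention entries."""

    def __init__(self, window: int = CROSS_ATTN_WINDOW):
        # A deque with maxlen drops the oldest entry on overflow,
        # so memory stays bounded even if decoding loops on repetitions.
        self._items = deque(maxlen=window)

    def append(self, attn) -> None:
        self._items.append(attn)

    def __len__(self) -> int:
        return len(self._items)

    def latest(self) -> list:
        # Oldest-to-newest view of the retained entries.
        return list(self._items)


def max_tokens_for_chunk(chunk_seconds: float,
                         tokens_per_second: float = TEXT_TOKENS_PER_SECOND,
                         minimum: int = 1) -> int:
    """Hypothetical budget: cap decoding steps by audio duration times text token rate."""
    return max(minimum, int(chunk_seconds * tokens_per_second))


if __name__ == "__main__":
    buf = CrossAttnBuffer()
    for step in range(100):        # simulate a long decoding loop
        buf.append(step)
    print(len(buf))                # stays at the window size
    print(max_tokens_for_chunk(2.0))
```

Using a fixed-size deque keeps the cap in one place instead of trimming a list by hand after every decoding step; the actual implementation may trim tensors differently.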

2026-02-11 22:10:00 +01:00

diarization

add insert_audio_chunk to DiartDiarization

2026-02-11 22:10:00 +01:00

local_agreement

fix --direct-english-translation not setting task=translate for localagreement backends

2026-02-11 22:10:00 +01:00

silero_vad_models

fix silence being detected but never reported by silero

2025-11-23 11:20:00 +01:00

simul_whisper

fix simulstreaming vram leak: cap cross-attn accumulation + token budget