- BENCHMARK.md: whisper also supports --language auto; voxtral is
not the only one that does. Fixed the mlx-whisper speed comparison
(for mlx-whisper, LA is actually faster than SS, not comparable in
speed).
- metrics.py: median calculation was wrong for even-length lists
(took upper middle instead of averaging the two middle values).
- metrics_collector.py: RTF was inflated because log_summary() used
wall-clock elapsed time instead of the sum of actual ASR call
durations.
- README.md: clarified that whisper also supports auto language
detection, voxtral just does it better.
- Added 2 new median tests (even + odd length).
- Fix _begin_silence pushing the same object reference as
_end_silence, causing the consumer to process two ended events and
double the silence duration.
- Fix initial silence never being cleared when VAC is disabled,
which caused the no-VAC path to enqueue zero audio.
- Add sample-precise silence boundaries (at_sample parameter).
- Add whisperlivekit/metrics.py with WER computation (word-level
Levenshtein) and timestamp accuracy (greedy alignment). No
external dependencies.
- Add whisperlivekit/metrics_collector.py with SessionMetrics
dataclass for per-session runtime observability. Instrumented
at 6 points in AudioProcessor: init, process_audio,
transcription_processor, _end_silence, results_formatter, cleanup.
Emits SESSION_METRICS structured log line on session end.
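
For reviewers, the three computations touched above (median for even-length lists, RTF from summed ASR call durations, and word-level Levenshtein WER) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in metrics.py or metrics_collector.py: the function names `median`, `real_time_factor`, and `word_error_rate` and their signatures are hypothetical, and normalization details (casing, punctuation) are left out.

```python
from typing import Sequence


def median(values: Sequence[float]) -> float:
    """Median that averages the two middle values for even-length input.

    Hypothetical helper; illustrates the fix, not the actual metrics.py code.
    """
    if not values:
        raise ValueError("median() of an empty sequence")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return float(ordered[mid])
    # Even length: average the two middle values instead of
    # returning the upper middle one.
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def real_time_factor(asr_call_durations: Sequence[float],
                     audio_duration: float) -> float:
    """RTF as (time spent inside ASR calls) / (duration of audio processed).

    Using wall-clock session time in the numerator would also count idle
    time between calls and inflate the result.
    """
    if audio_duration <= 0:
        return 0.0
    return sum(asr_call_durations) / audio_duration


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER as word-level Levenshtein distance divided by reference length.

    Assumes whitespace tokenization with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

With these definitions, `median([1, 2, 3, 4])` averages 2 and 3, and a session whose ASR calls took 0.5 s and 0.25 s over 3 s of audio has an RTF of 0.25 regardless of how long the session was open.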