From dbdb4ea66cc30af2433a475b9fb885a492db3980 Mon Sep 17 00:00:00 2001 From: Quentin Fuxa <38427957+QuentinFuxa@users.noreply.github.com> Date: Fri, 1 Aug 2025 16:33:26 +0200 Subject: [PATCH] Update README.md --- README.md | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index b21bb7f..efd079f 100644 --- a/README.md +++ b/README.md @@ -13,19 +13,8 @@ License

-## Overview - This project is based on [WhisperStreaming](https://github.com/ufal/whisper_streaming) and [SimulStreaming](https://github.com/ufal/SimulStreaming), allowing you to transcribe audio directly from your browser. WhisperLiveKit provides a complete backend solution for real-time speech transcription with a functional, simple and customizable frontend. Everything runs locally on your machine ✨ -### Architecture - -WhisperLiveKit consists of three main components: - -- **Frontend**: A basic html + JS interface that captures microphone audio and streams it to the backend via WebSockets. You can use and adapt the [provided template](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html). -- **Backend (Web Server)**: A FastAPI-based WebSocket server that receives streamed audio data, processes it in real time, and returns transcriptions to the frontend. This is where the WebSocket logic and routing live. -- **Core Backend (Library Logic)**: A server-agnostic core that handles audio processing, ASR, and diarization. It exposes reusable components that take in audio bytes and return transcriptions. - - ### Key Features - **Real-time Transcription** - Locally (or on-prem) convert speech to text instantly as you speak @@ -37,6 +26,11 @@ WhisperLiveKit consists of three main components: - **Punctuation-Based Speaker Splitting [BETA]** - Align speaker changes with natural sentence boundaries for more readable transcripts - **SimulStreaming Backend** - [Dual-licensed](https://github.com/ufal/SimulStreaming#-licence-and-contributions) - Ultra-low latency transcription using SOTA AlignAtt policy. +### Architecture + +Picture 1 + + ## Quick Start ```bash