add download script for SimulStreaming backend

This commit is contained in:
Quentin Fuxa
2025-07-03 10:14:45 +02:00
parent 36cc4412c3
commit 5daaf77258
3 changed files with 56 additions and 1 deletions

View File

@@ -35,7 +35,7 @@ WhisperLiveKit consists of three main components:
- **Confidence Validation** Immediately validate high-confidence tokens for faster inference (WhisperStreaming only)
- **Buffering Preview** Displays unvalidated transcription segments (not compatible with SimulStreaming yet)
- **Punctuation-Based Speaker Splitting [BETA]** - Align speaker changes with natural sentence boundaries for more readable transcripts
- **SimulStreaming Backend** - Ultra-low latency transcription using state-of-the-art AlignAtt policy. The code is not directly included in the repo : To use, please copy [whisper](https://github.com/ufal/SimulStreaming/tree/main/simul_whisper/whisper) folder into `whisperlivekit/simul_whisper/` . ⚠️ You must comply with the [Polyform license](https://github.com/ufal/SimulStreaming/blob/main/LICENCE.txt)
- **SimulStreaming Backend** - Ultra-low latency transcription using state-of-the-art AlignAtt policy. The code is not bundled directly in this repository. To retrieve it, please run `python whisperlivekit/download_simulstreaming_backend.py`
## Quick Start

View File

@@ -0,0 +1,30 @@
import os
import requests
GITHUB_API_URL = "https://api.github.com/repos/ufal/SimulStreaming/contents/simul_whisper/whisper"
RAW_BASE_URL = "https://raw.githubusercontent.com/ufal/SimulStreaming/main/simul_whisper/whisper"
TARGET_DIR = os.path.join("whisperlivekit", "simul_whisper", "whisper")
def download_files_from_github(api_url, local_dir):
os.makedirs(local_dir, exist_ok=True)
response = requests.get(api_url)
response.raise_for_status()
items = response.json()
for item in items:
if item['type'] == 'file':
download_url = item['download_url']
file_name = item['name']
print(f"Downloading {file_name} ...")
file_response = requests.get(download_url)
file_response.raise_for_status()
with open(os.path.join(local_dir, file_name), 'wb') as f:
f.write(file_response.content)
elif item['type'] == 'dir':
# Recursive call for subdirectories
download_files_from_github(item['url'], os.path.join(local_dir, item['name']))
if __name__ == "__main__":
print(f"Downloading files into {TARGET_DIR} ...")
download_files_from_github(GITHUB_API_URL, TARGET_DIR)
print("✅ Download completed successfully.")

View File

@@ -0,0 +1,25 @@
📄 SimulStreaming (https://github.com/ufal/SimulStreaming) Licence
SimulStreaming is dual-licensed:
🔹 Non-Commercial Use
You may use SimulStreaming under the **PolyForm Noncommercial License 1.0.0** if you
obtain the code through the GitHub repository. This license is **free of charge**
and comes with **no obligations** for non-commercial users.
🔸 Commercial Use
Understanding who uses SimulStreaming commercially helps us improve and
prioritize development. Therefore, we want to **require registration** of those who acquire a commercial licence.
We plan to make the commercial licenceses **affordable** to SMEs and individuals. We
are considering to provide commercial licenses either for free or for symbolic
one-time fee, and maybe also provide additional support. You can share your preference via the [questionnaire](https://forms.cloud.microsoft/e/7tCxb4gJfB).
You can also leave your contact [there](https://forms.cloud.microsoft/e/7tCxb4gJfB) to be notified when the commercial licenses become
available.
✉️ Contact
[Dominik Macháček](https://ufal.mff.cuni.cz/dominik-machacek/), machacek@ufal.mff.cuni.cz