Commit Graph

186 Commits

Author SHA1 Message Date
Quentin Fuxa
c3d72cae7c Merge pull request #26 from SilasK/fix-sentencesegmenter
Improve logging, still trying to fix sentence segmenter
2025-01-28 15:53:26 +01:00
Quentin Fuxa
4622fe7aff Merge branch 'main' into fix-sentencesegmenter 2025-01-28 15:53:10 +01:00
Silas Kieser
8ee1488c08 rename to_flush to concatenate_tsw 2025-01-27 16:49:22 +01:00
Silas Kieser
77d43885a3 chunk at sentence now takes an argument =self.comited 2025-01-27 16:29:06 +01:00
Silas Kieser
04170153e0 improve logging 2025-01-27 16:12:30 +01:00
Silas Kieser
baddf0284b buffer length in sentence segmentation is now also max, as in segment. 2025-01-27 15:36:19 +01:00
Quentin Fuxa
6e0f1dda25 Merge remote-tracking branch 'contrib/fix-sentencesegmenter' 2025-01-26 15:34:41 +01:00
Quentin Fuxa
c66794e1f5 Merge pull request #20 from SilasK/clean-main
In my limited experience with French, "" should also be the sep for mlx-whisper
2025-01-26 14:57:52 +01:00
Silas Kieser
f0eaffacd3 improve logging in whisper_online.py 2025-01-21 14:59:36 +01:00
Silas Kieser
69a2ed6bfb add logger for online asr 2025-01-21 14:45:45 +01:00
Silas Kieser
25eb276794 ignore wav and scripts 2025-01-21 14:08:41 +01:00
Silas Kieser
9f262813ec sep for mlx is also "" 2025-01-21 12:16:46 +01:00
Silas Kieser
4293580581 use moses sentence segmenter instead of tokenizer 2025-01-21 12:12:41 +01:00
Silas Kieser
42d2784c20 clearer log messages for sentence segmentation 2025-01-21 12:11:54 +01:00
Silas Kieser
7fad0a3ee2 sep for mlx is also "" 2025-01-21 10:42:07 +01:00
Quentin Fuxa
27d2db77f7 Update README.md 2025-01-20 03:08:01 +01:00
Quentin Fuxa
fba37eba0a move to src 2025-01-19 21:17:55 +01:00
Quentin Fuxa
5523b51fd7 first speaker is "0", no longer None 2025-01-19 19:40:09 +01:00
Quentin Fuxa
9bdb92e923 update demo.png 2025-01-19 19:36:10 +01:00
Quentin Fuxa
b51c8427f4 diart link added 2025-01-19 17:12:55 +01:00
Quentin Fuxa
977436622a add diarization (beta). Disabled by default 2025-01-19 17:12:40 +01:00
Quentin Fuxa
ce56264241 split whisper_online.py into smaller files 2025-01-14 20:52:53 +01:00
Quentin Fuxa
9cbac96c44 del online once webstreaming is finished 2025-01-14 20:20:22 +01:00
Quentin Fuxa
3f30d3de6e Merge branch 'main' of https://github.com/QuentinFuxa/whisper_streaming_web 2025-01-14 20:14:22 +01:00
Quentin Fuxa
f884d1162d warning when transcribe_kargs are used with MLX Whisper 2025-01-14 20:14:16 +01:00
Quentin Fuxa
6ee91c3c93 Merge pull request #15 from in-c0/patch-1
Specify encoding to ensure Python reads file as UTF-8
2025-01-13 20:30:51 +01:00
Ava
f52a5ae3c2 specify encoding to ensure Python reads file as UTF-8
Executing `python whisper_fastapi_online_server.py --host 0.0.0.0 --port 8000` resulted in an error on my setup:

```
whisper_streaming_web\whisper_fastapi_online_server.py, line 47, in <module>
    html = f.read()
           ^^^^^^^^
  File "C:\Python312\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 1818: character maps to <undefined>
```

On Windows, Python's `open()` defaults to the locale's preferred encoding (often `cp1252`), which may not match the encoding of the file being read.
A UTF-8 file containing non-ASCII characters can trigger this error when it is read without specifying the encoding explicitly.
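
A minimal sketch of the failure and the fix described above. The helper name `read_text_utf8`, the file name `page.html`, and the sample content are hypothetical and are not taken from `whisper_fastapi_online_server.py`; the sample character is chosen only because its UTF-8 encoding contains the byte `0x8f`, which `cp1252` cannot decode.

```python
import tempfile
from pathlib import Path


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the platform's default encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# "ď" (U+010F) is encoded in UTF-8 as the bytes C4 8F.
sample = "<p>ď</p>"

with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page.html"  # hypothetical file name
    page.write_bytes(sample.encode("utf-8"))

    # Explicit encoding: works the same on every platform.
    html = read_text_utf8(page)

    # Decoding the same bytes as cp1252 reproduces the reported error class.
    try:
        page.read_bytes().decode("cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True
```

Passing `encoding="utf-8"` to `open()` removes the dependence on the machine's locale, which is presumably why the same script ran without error on other setups.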
2025-01-13 23:12:38 +11:00
Quentin Fuxa
0ff6067f37 Update README.md 2025-01-04 00:55:12 +01:00
Quentin Fuxa
da6c8d25e4 Update README.md 2025-01-03 14:54:29 +01:00
Quentin Fuxa
aa0ba598f0 no online conflict when multiple users 2025-01-03 14:48:45 +01:00
Quentin Fuxa
b7a2d23a18 if websocket connection fails, frontend does not allow recording 2024-12-31 11:17:41 +01:00
Quentin Fuxa
58e48bb717 Merge pull request #10 from SilasK/main
More flexibility by using custom tokenize_method + black
2024-12-31 10:33:47 +01:00
silask
6a04ddbed2 only print translated text not timestamps 2024-12-30 21:53:33 +01:00
silask
aa4d2599cc fix #7 2024-12-30 21:53:33 +01:00
silask
5fdb08edae black formatting 2024-12-30 21:53:33 +01:00
Quentin Fuxa
4cb3660666 Update README.md 2024-12-30 20:46:36 +01:00
Quentin Fuxa
122368bff3 Append full transcription in websocket processing 2024-12-30 15:21:00 +01:00
Quentin Fuxa
0d833eaea2 Merge branch 'main' of https://github.com/QuentinFuxa/whisper_streaming_web 2024-12-28 18:32:36 +01:00
Quentin Fuxa
c960d1571d Batch unprocessed audio to reduce Whisper streaming calls 2024-12-28 18:32:27 +01:00
Quentin Fuxa
1aa1b9ea99 Update README.md : ffmpeg to ffmpeg-python 2024-12-28 09:15:09 +01:00
Quentin Fuxa
99019f1dd7 Merge branch 'main' of https://github.com/QuentinFuxa/whisper_streaming_web 2024-12-24 19:36:36 +01:00
Quentin Fuxa
1cea20a42d /ws to /asr to distinguish protocol ws:// from endpoint 2024-12-24 19:36:20 +01:00
Quentin Fuxa
50bbd26517 throw errors if websocket connection fails 2024-12-24 19:31:08 +01:00
Quentin Fuxa
cf5d1cf013 Update README.md 2024-12-19 18:57:15 +01:00
Quentin Fuxa
0553b75415 unfork project, indicate files from whisper streaming 2024-12-19 12:01:07 +01:00
Quentin Fuxa
baa01728be Merge branch 'whisper-mlx' 2024-12-19 11:14:48 +01:00
Quentin Fuxa
8dcebd9329 add translate_model_name function 2024-12-19 11:10:02 +01:00
Quentin Fuxa
bfe973a0d2 Merge branch 'whisper-mlx' 2024-12-19 10:48:25 +01:00
Quentin Fuxa
87cab7c280 add whisper mlx backend 2024-12-19 10:47:46 +01:00
Quentin Fuxa
bee27c68e6 better buffer management 2024-12-19 10:19:24 +01:00