202 Commits

Author SHA1 Message Date
Quentin Fuxa
b82cc3b613 adapt backend for the new classes 2025-02-07 12:24:37 +01:00
Quentin Fuxa
46f7f9cbd1 Use Sentence, Transcript and ASRToken classes for clarity 2025-02-07 12:24:11 +01:00
Quentin Fuxa
48c111f494 revert changes for segments buffer_trimming_way to work 2025-02-07 10:17:45 +01:00
Quentin Fuxa
54628274d6 show language used 2025-02-07 10:16:46 +01:00
Quentin Fuxa
0d874fb515 cuda or cpu auto detection 2025-02-07 10:16:03 +01:00
Quentin Fuxa
4d1aa4421a Merge pull request #30 from SilasK/tsw
Time stamped text classes
2025-01-31 22:54:58 +01:00
Quentin Fuxa
f4d98e2c8c Merge pull request #27 from SilasK/fix-sentencesegmenter
Fix sentence segmenter
2025-01-31 22:54:33 +01:00
Silas Kieser
15205f31d1 add doctest 2025-01-28 23:17:21 +01:00
Silas Kieser
b1f7034577 my version of timestamped text 2025-01-28 23:13:15 +01:00
Silas Kieser
23dee02d56 sentence overflow works 2025-01-28 22:38:55 +01:00
Silas Kieser
efd80095a7 segment also works 2025-01-28 22:11:28 +01:00
Silas Kieser
f4d3df3d87 change log format 2025-01-28 21:25:17 +01:00
Silas Kieser
9c7d429e15 add logging config to server 2025-01-28 17:38:13 +01:00
Silas Kieser
611d33cba5 keep a test script in base directory 2025-01-28 17:13:03 +01:00
Silas Kieser
ab7c22d3e3 whisper_online works with the new sentence segment 2025-01-28 17:02:21 +01:00
Silas Kieser
870a779666 sentence work again! 2025-01-28 16:55:07 +01:00
Quentin Fuxa
c3d72cae7c Merge pull request #26 from SilasK/fix-sentencesegmenter
Improve logging, still trying to fix sentence segmenter
2025-01-28 15:53:26 +01:00
Quentin Fuxa
4622fe7aff Merge branch 'main' into fix-sentencesegmenter 2025-01-28 15:53:10 +01:00
Silas Kieser
8ee1488c08 rename to_flush to concatenate_tsw 2025-01-27 16:49:22 +01:00
Silas Kieser
77d43885a3 chunk at sentence takes now an argument =self.comited 2025-01-27 16:29:06 +01:00
Silas Kieser
04170153e0 improve logging 2025-01-27 16:12:30 +01:00
Silas Kieser
baddf0284b buffer length in sentence segmentation is now also max as in segment. 2025-01-27 15:36:19 +01:00
Quentin Fuxa
6e0f1dda25 Merge remote-tracking branch 'contrib/fix-sentencesegmenter' 2025-01-26 15:34:41 +01:00
Quentin Fuxa
c66794e1f5 Merge pull request #20 from SilasK/clean-main
In my limited experience with French "" should also be the sep for mlx-whisper
2025-01-26 14:57:52 +01:00
Silas Kieser
f0eaffacd3 improve logging in whisper_online.py 2025-01-21 14:59:36 +01:00
Silas Kieser
69a2ed6bfb add logger for online asr 2025-01-21 14:45:45 +01:00
Silas Kieser
25eb276794 ignore wav and scripts 2025-01-21 14:08:41 +01:00
Silas Kieser
9f262813ec sep for mlx is also "" 2025-01-21 12:16:46 +01:00
Silas Kieser
4293580581 use moses sentence segmenter instead of tokenizer 2025-01-21 12:12:41 +01:00
Silas Kieser
42d2784c20 clearer log messages for sentence segmentation 2025-01-21 12:11:54 +01:00
Silas Kieser
7fad0a3ee2 sep for mlx is also "" 2025-01-21 10:42:07 +01:00
Quentin Fuxa
27d2db77f7 Update README.md 2025-01-20 03:08:01 +01:00
Quentin Fuxa
fba37eba0a move to src 2025-01-19 21:17:55 +01:00
Quentin Fuxa
5523b51fd7 first speaker is "0" no more None 2025-01-19 19:40:09 +01:00
Quentin Fuxa
9bdb92e923 update demo.png 2025-01-19 19:36:10 +01:00
Quentin Fuxa
b51c8427f4 diart link added 2025-01-19 17:12:55 +01:00
Quentin Fuxa
977436622a add diarization (beta). Disabled by default 2025-01-19 17:12:40 +01:00
Quentin Fuxa
ce56264241 split whisper_online.py into smaller files 2025-01-14 20:52:53 +01:00
Quentin Fuxa
9cbac96c44 del online once webstreaming is finished 2025-01-14 20:20:22 +01:00
Quentin Fuxa
3f30d3de6e Merge branch 'main' of https://github.com/QuentinFuxa/whisper_streaming_web 2025-01-14 20:14:22 +01:00
Quentin Fuxa
f884d1162d warning when transcribe_kargs are used with MLX Whisper 2025-01-14 20:14:16 +01:00
Quentin Fuxa
6ee91c3c93 Merge pull request #15 from in-c0/patch-1
Specify encoding to ensure Python reads file as UTF-8
2025-01-13 20:30:51 +01:00
Ava
f52a5ae3c2 specify encoding to ensure Python reads file as UTF-8
Executing `python whisper_fastapi_online_server.py --host 0.0.0.0 --port 8000` resulted in an error on my setup:

```
whisper_streaming_web\whisper_fastapi_online_server.py, line 47, in <module>
    html = f.read()
           ^^^^^^^^
  File "C:\Python312\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 1818: character maps to <undefined>
```

On Windows, Python's `open()` defaults to the locale's preferred encoding (often `cp1252`), which may not match the encoding of the file being read.
Files that contain non-ASCII characters and were saved as UTF-8 can trigger this error when read without specifying the encoding.
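
The failure mode described above can be reproduced on any platform with a small sketch. It is an illustration only, not the code from this commit: the `read_html` helper, the `page.html` file name, and the sample string are hypothetical. The sample contains U+00CF, whose UTF-8 encoding includes the byte `0x8f`, which `cp1252` cannot decode.

```python
import tempfile
from pathlib import Path


def read_html(path):
    """Read a text file as UTF-8, independent of the platform default encoding.

    Hypothetical helper for illustration; not taken from the repository.
    """
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Hypothetical sample content: U+00CF is stored as the bytes C3 8F in UTF-8.
sample = "<p>na\u00cfve</p>"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "page.html"
    path.write_bytes(sample.encode("utf-8"))

    # Reading with an explicit encoding works on every platform.
    html = read_html(path)

    # Forcing cp1252 mimics a Windows default and fails on byte 0x8f.
    try:
        path.read_text(encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True
```

Passing `encoding="utf-8"` explicitly keeps the behavior the same across operating systems instead of relying on the locale.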
2025-01-13 23:12:38 +11:00
Quentin Fuxa
0ff6067f37 Update README.md 2025-01-04 00:55:12 +01:00
Quentin Fuxa
da6c8d25e4 Update README.md 2025-01-03 14:54:29 +01:00
Quentin Fuxa
aa0ba598f0 no online conflict when multiple users 2025-01-03 14:48:45 +01:00
Quentin Fuxa
b7a2d23a18 if websocket connection fails, frontend does not allow recording 2024-12-31 11:17:41 +01:00
Quentin Fuxa
58e48bb717 Merge pull request #10 from SilasK/main
More flexibility by using custom tokenize_method  + black
2024-12-31 10:33:47 +01:00
silask
6a04ddbed2 only print translated text not timestamps 2024-12-30 21:53:33 +01:00
silask
aa4d2599cc fix #7 2024-12-30 21:53:33 +01:00