Is this suitable for real-time WebSocket streaming? #2741
Replies: 1 comment 1 reply
I would treat Whisper as suitable for near real-time use, not true streaming. The current code processes audio in sliding 30-second windows, so a WebSocket setup usually means buffering small chunks, running transcription repeatedly on the buffered audio, and stitching the partial results together. That works, but it is not a native streaming ASR pipeline.

For model choice, if you want the best latency, I would start with

For the last-word question, VAD alone will only tell you speech boundaries, not the actual last word. Whisper does support
So yes for pseudo real-time, but not as a true low-latency streaming model out of the box.
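To make the buffering idea concrete, here is a minimal sketch of the "buffer small chunks, re-run transcription" loop, plus a helper that pairs a VAD speech end with word-level timings. It is an illustration under assumptions, not Whisper's own API: `RollingTranscriber`, `last_word_before`, the 16-bit PCM input format, and the 1-second step are all hypothetical names and choices, and the `transcribe` callable is whatever wrapper you put around your model. The word dictionaries are assumed to carry `word`, `start`, and `end` keys.

```python
from typing import Callable, List, Optional

SAMPLE_RATE = 16_000   # Whisper works on 16 kHz mono audio
BYTES_PER_SAMPLE = 2   # assumption: 16-bit PCM arrives over the socket
WINDOW_SECONDS = 30    # matches the 30-second window mentioned above


class RollingTranscriber:
    """Hypothetical helper: buffers PCM chunks and re-transcribes the latest window."""

    def __init__(self, transcribe: Callable[[bytes], str], step_seconds: float = 1.0):
        self._transcribe = transcribe  # assumption: your wrapper around the model
        self._buffer = bytearray()
        self._max_bytes = WINDOW_SECONDS * SAMPLE_RATE * BYTES_PER_SAMPLE
        self._step_bytes = int(step_seconds * SAMPLE_RATE * BYTES_PER_SAMPLE)
        self._pending = 0  # bytes received since the last transcription

    def feed(self, chunk: bytes) -> Optional[str]:
        """Add one chunk; return a fresh partial transcript when enough audio arrived."""
        self._buffer.extend(chunk)
        overflow = len(self._buffer) - self._max_bytes
        if overflow > 0:
            # Drop the oldest audio, rounded up to a whole sample.
            overflow += overflow % BYTES_PER_SAMPLE
            del self._buffer[:overflow]
        self._pending += len(chunk)
        if self._pending < self._step_bytes:
            return None  # not enough new audio yet
        self._pending = 0
        return self._transcribe(bytes(self._buffer))


def last_word_before(words: List[dict], speech_end: float) -> Optional[str]:
    """Return the latest word that ends at or before a VAD-reported speech end.

    Assumes each item looks like {"word": str, "start": float, "end": float}.
    """
    candidates = [w for w in words if w["end"] <= speech_end]
    if not candidates:
        return None
    return max(candidates, key=lambda w: w["end"])["word"].strip()
```

In a WebSocket handler you would call `feed()` for every binary message and push any non-`None` result back to the client as the current partial transcript. Because each call re-transcribes the whole buffer, later partials can revise earlier ones, which is the stitching cost described above. With the openai-whisper package, the `transcribe` callable could convert the bytes to a float32 array and pass it to `model.transcribe(...)`; calling it with `word_timestamps=True` adds a `words` list to each segment, which is one possible source for the input to `last_word_before`.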
Are Whisper models suitable for real-time WebSocket streaming? If so, please suggest which model gives the best results. I have one more question: is there any way to find the last word of a transcription using VAD?