[Whisper] Add word timestamps and confidence scores#201
awni merged 8 commits into ml-explore:main
Super cool, thanks for adding that!
Addresses #146
After measuring the time taken by the operations that add word-level timestamps and scores, I found that most of it is consumed by the extra model forward pass. There also appears to be overhead in the first run of DTW, likely due to Numba JIT compilation. Below are the measured times from tests with the large model.
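One way to separate a one-off cost such as JIT compilation from steady-state cost is to time the first call on its own. The sketch below is only an illustration of that idea: `fake_jit_sum` and `time_first_and_steady` are hypothetical names, and the "compilation" is simulated with a sleep rather than real Numba code.

```python
import time

_compiled = False  # stand-in for "has the JIT compiled this function yet?"


def fake_jit_sum(n):
    """Hypothetical stand-in for a JIT-compiled function: slow on first call only."""
    global _compiled
    if not _compiled:
        time.sleep(0.05)  # simulated one-off compilation cost (assumption)
        _compiled = True
    return sum(range(n))


def time_first_and_steady(fn, *args, repeats=5):
    """Return (seconds for the first call, best seconds over later calls)."""
    start = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - start

    steady = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        steady.append(time.perf_counter() - start)
    return first, min(steady)


first, steady = time_first_and_steady(fake_jit_sum, 10_000)
print(f"first call: {first:.4f}s, steady state: {steady:.6f}s")
```

Reporting the two numbers separately keeps a warm-up cost from being mistaken for per-call overhead.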
Hi @bofenghuang this looks really nice to me and I think we can merge it!
One thing I'm wondering is if we can test the alignment code and/or the word timestamp code at all? It is a bit involved so it would be good to have a test or two to cover it.
Force-pushed from d466638 to bfbcb5d
Hi @awni, thanks for the review! I've just done a rebase and added a test for word-level timestamps & confidence, comparing the results with those from openai-whisper.
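For readers who want a feel for what such a comparison can look like, here is a minimal sketch, not the test added in this PR. It assumes each word is a dict with `word`, `start`, `end`, and `probability` keys, as in openai-whisper's word-level output; the helper name `words_match`, the tolerances, and the sample values are all made up for illustration.

```python
import math


def words_match(ours, reference, time_tol=0.05, prob_tol=0.05):
    """Check two word lists agree on text, timing, and confidence within tolerances."""
    if len(ours) != len(reference):
        return False
    for a, b in zip(ours, reference):
        if a["word"] != b["word"]:
            return False
        # Timestamps are in seconds; allow a small absolute drift.
        if not math.isclose(a["start"], b["start"], abs_tol=time_tol):
            return False
        if not math.isclose(a["end"], b["end"], abs_tol=time_tol):
            return False
        if not math.isclose(a["probability"], b["probability"], abs_tol=prob_tol):
            return False
    return True


# Hypothetical sample data, not real model output.
reference = [
    {"word": " Hello", "start": 0.00, "end": 0.42, "probability": 0.91},
    {"word": " world", "start": 0.42, "end": 0.88, "probability": 0.87},
]
ours = [
    {"word": " Hello", "start": 0.01, "end": 0.43, "probability": 0.90},
    {"word": " world", "start": 0.42, "end": 0.86, "probability": 0.88},
]
shifted = [dict(w, start=w["start"] + 0.5, end=w["end"] + 0.5) for w in ours]

print(words_match(ours, reference))
print(words_match(shifted, reference))
```

Comparing with an absolute tolerance rather than exact equality leaves room for small numerical differences between the two implementations.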
Below are the new measured times from tests run on my Mac M1 Pro:
awni left a comment
Thank you for adding this!! I updated the README to reflect the addition.
* Add word timestamps and confidence scores
* Create a separate forward_with_cross_qk function
* Move multiple ops from np to mlx, clean comments
* Save alignment_heads
* Cast qk to fp32
* Add test for word-level timestamps and confidence scores
* format + readme
* nit

---------

Co-authored-by: Awni Hannun <awni@apple.com>
Hi @awni 👋
I've tried to add several new features to the Whisper implementation through this PR, following the implementation of the original repository:
transcribe() openai/whisper#869)
This is still a draft version that may require some optimizations:
* `median_filter` and `dtw`. I directly used the `median_filter` from scipy, since I didn't find the `unfold` function in mlx. As for `dtw`, I kept the original numba version
* `qk` attention scores in the model forward

Below are the benchmark times from tests run on my M1 Pro.
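As a rough illustration of the two alignment steps named in the list above, here is a dependency-free sketch of a sliding median filter and of dynamic time warping (DTW). It is not the scipy or numba code used in this PR: the function names, the reflect padding, and the toy inputs are assumptions chosen to keep the example short.

```python
import math
from statistics import median


def median_filter_1d(values, width):
    """Sliding median over `values` with an odd window `width` (reflect padding)."""
    if width <= 0 or width % 2 != 1:
        raise ValueError("width must be a positive odd integer")
    values = list(values)
    pad = width // 2
    if pad == 0 or len(values) <= pad:
        return values
    left = values[pad:0:-1]            # mirror of the start, edge value excluded
    right = values[-2:-pad - 2:-1]     # mirror of the end, edge value excluded
    padded = left + values + right
    return [median(padded[i:i + width]) for i in range(len(values))]


def dtw_path(cost):
    """Minimum-cost monotonic path through `cost` (rows: tokens, columns: frames)."""
    n, m = len(cost), len(cost[0])
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    step = [[0] * (m + 1) for _ in range(n + 1)]  # 0 diagonal, 1 up, 2 left

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = (acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            best = min(range(3), key=candidates.__getitem__)
            acc[i][j] = cost[i - 1][j - 1] + candidates[best]
            step[i][j] = best

    # Walk back from the bottom-right corner to recover the path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        if step[i][j] == 0:
            i, j = i - 1, j - 1
        elif step[i][j] == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path, acc[n][m]


smoothed = median_filter_1d([0, 0, 5, 0, 0], 3)   # isolated spike is removed
path, total = dtw_path([[0, 1, 1], [1, 0, 0]])    # 2 tokens aligned to 3 frames
print(smoothed, path, total)
```

In the word-timestamp setting the cost matrix would typically be derived from the (smoothed) cross-attention weights, so that each token is matched to the audio frames it attends to most; the toy matrix here only shows the mechanics.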