Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm #659

Merged: jongwook merged 3 commits into openai:main from HennerM:fix-mm-normalisation on Jan 18, 2023

HennerM (Contributor) commented on Dec 8, 2022

The English normaliser mistakenly turns 20mm into 20hmm. In this case "mm" is the unit suffix for millimetre.

This is caused by treating 20 as a number and thus splitting "20" and "mm" into separate tokens. The "mm" token is then replaced with "hmm" according to the English.json mapping.

Removing "mm" from the mapping shouldn't be a problem, since an earlier step already removes standalone "mm" words entirely from the input.

"mhm" and "mmm" could probably be removed for the same reason.
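To make the failure mode concrete, here is a minimal sketch of the three steps described above. It is a simplified stand-in, not the actual Whisper normaliser: the names `FILLER_PATTERN`, `MAPPING_BEFORE`, `MAPPING_AFTER` and `normalize` are hypothetical, the mappings are tiny excerpts made up for illustration, and the sketch leaves a space between the number and the unit, so its spacing may differ from the real output.

```python
import re

# Hypothetical, simplified stand-ins for illustration only;
# this is not the actual Whisper implementation.
FILLER_PATTERN = re.compile(r"\b(hmm|mm|mhm|mmm|uh|um)\b")
MAPPING_BEFORE = {"mm": "hmm", "colour": "color"}  # mapping with the "mm" entry
MAPPING_AFTER = {"colour": "color"}                # mapping with "mm" removed


def normalize(text: str, mapping: dict) -> str:
    # Step 1: drop standalone filler words such as "mm" or "um".
    # "20mm" is untouched here: there is no word boundary between "0" and "m".
    text = FILLER_PATTERN.sub("", text.lower())
    # Step 2: split a digit from a following letter ("20mm" -> "20 mm").
    text = re.sub(r"([0-9])([a-z])", r"\1 \2", text)
    # Step 3: replace each whitespace-separated word via the mapping.
    return " ".join(mapping.get(word, word) for word in text.split())


print(normalize("a 20mm lens", MAPPING_BEFORE))  # a 20 hmm lens
print(normalize("a 20mm lens", MAPPING_AFTER))   # a 20 mm lens
print(normalize("mm okay", MAPPING_AFTER))       # okay
```

In this sketch, a standalone "mm" is already dropped in step 1, so the "mm" mapping entry can only ever match the unit that step 2 splits off, which is why removing the entry should not affect filler handling.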

jongwook merged commit ea1c266 into openai:main on Jan 18, 2023

jongwook (Collaborator) commented:

Thanks. Those replacers are from the post-processing scripts for the CHiME dataset:

https://github.com/kaldi-asr/kaldi/blob/ae8cbe8858f2a66a9b193c82dbe3b0479364165f/egs/chime5/s5/local/wer_output_filter#L19-L21

but I agree it's not very relevant to keep them at this point.

zackees pushed a commit to zackees/whisper that referenced this pull request May 5, 2023
ilanit1997 pushed a commit to ilanit1997/whisper that referenced this pull request May 16, 2023
abyesilyurt pushed a commit to abyesilyurt/whisper that referenced this pull request Nov 13, 2023
heejipark23 pushed a commit to heejipark23/whisper that referenced this pull request Sep 21, 2025