Predict the Mean Opinion Score (MOS) of speech processed by speech enhancement (SE) systems. Check our challenge webpage for details.
This repo provides the official implementation/baseline derived from Uni-VERSA-Ext for URGENT 2026 Track 2.
Play with the model in Colab: https://colab.research.google.com/drive/1Y2OkPE0hGSG4XRj_b7RsmWMVSg4KkhM7
Security note: We use HyperPyYAML for config loading. Treat configs as code: do not load models from untrusted sources.
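HyperPyYAML tags such as `!new:`, `!name:`, `!apply:`, and `!module:` can import, construct, or call Python objects while a config is being loaded. As a minimal sketch of how to review a config you did not write, the helper below lists those tags before anything is loaded. It is not part of this repo: `find_code_tags` is a hypothetical name, it only scans text with the standard library, and it is a review aid rather than a sandbox.

```python
import re

# HyperPyYAML tags that import, construct, or call Python objects at load time.
CODE_TAG = re.compile(r"!(new|name|apply|module):([\w.]+)")


def find_code_tags(yaml_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, tag, dotted_target) for every code-executing tag."""
    hits = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        # Drop YAML comments (naive: does not handle '#' inside quoted strings).
        code = line.split("#", 1)[0]
        for match in CODE_TAG.finditer(code):
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


# Toy config written for this example only; it is never loaded or executed.
example = """
seed: 1234
model: !new:torch.nn.Linear
  in_features: 8
  out_features: 1
cleanup: !apply:os.system ["echo hello"]
"""

for lineno, tag, target in find_code_tags(example):
    print(f"line {lineno}: !{tag}: -> {target}")
```

Any target that is not an expected model or training class (here, `os.system`) is a reason to stop and read the config before loading it.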
- Install from GitHub
pip install git+https://github.com/urgent-challenge/urgent2026_challenge_track2
- Predict the speech quality metrics of a single audio file
from urgent2026_sqa.infer import infer_single, load_model
model, config = load_model("vvwangvv/universa-ext_wavlm-base_5metric")
# examples are from https://labsites.rochester.edu/air/projects/is2012/examples.html
print(infer_single(model, config, "./assets/sp03.wav"))
print(infer_single(model, config, "./assets/sp03_casino_sn5.wav"))
- Predict the speech quality metrics of all audio files in a folder
from urgent2026_sqa.infer import infer_list, load_model
from pathlib import Path
model, config = load_model("vvwangvv/universa-ext_wavlm-base_5metric")
audio_paths = list(Path("./assets").glob("*.wav"))
# examples are from https://labsites.rochester.edu/air/projects/is2012/examples.html
print(infer_list(model, config, audio_paths))

git clone https://github.com/urgent-challenge/urgent2026_challenge_track2
cd urgent2026_challenge_track2
# Create and activate environment
conda create -n urgent2026-sqa python=3.11 -y
conda activate urgent2026-sqa
pip install -e .[train]

The following script fetches and organizes all datasets listed below:
⚠️ NOTE: The bvcc and bc19 datasets require manual processing after downloading. If you don't want to include them, comment out the corresponding lines in scripts/data/prepare.sh.
bash scripts/prepare_data.sh </path/to/db>

| Split | Corpus | #Samples | #Systems | Duration (hours) | Links | License |
|---|---|---|---|---|---|---|
| Training | BC19 | 136 | 21 | 0.32 | [Original] | Custom |
| | BVCC | 4973 | 175 | 5.56 | [Original] | Custom |
| | NISQA | 11020 | N/A | 27.21 | [Original] | Mixed |
| | PSTN | 58709 | N/A | 163.08 | [Original] | Unknown |
| | SOMOS | 14100 | 181 | 18.32 | [Original] [Huggingface] | CC BY-NC-SA 4.0 |
| | TCD-VoIP | 384 | 24 | 0.87 | [Original] [Huggingface] | CC BY-NC-SA 4.0 |
| | Tencent | 11563 | N/A | 23.51 | [Original] [Huggingface] | Apache |
| | TMHINT-QI | 12937 | 98 | 11.35 | [Original] [Huggingface] | MIT |
| | TTSDS2 | 460 | 80 | 0.96 | [Original] [Huggingface] | MIT |
| | urgent2024-sqa | 238000 | 238 | 429.34 | [Huggingface] | CC BY-NC-SA 4.0 |
| | urgent2025-sqa | 100000 | 100 | 261.31 | [Huggingface] | CC BY-NC-SA 4.0 |
| Dev | CHiME-7 UDASE Eval | 640 | 5 | 0.84 | [Original] [Huggingface] | CC BY-SA 4.0 |
| Test | urgent2024-sqa (blind_test_mos) | 6900 | 23 | 13.80 | [Huggingface] | CC BY-NC-SA 4.0 |
The following command trains Uni-VERSA-Ext on all prepared training datasets.
accelerate launch urgent2026_sqa/train.py \
--config configs/universa-ext.yaml \
--exp exp/universa-ext \
--train-data "data/*/train/data.jsonl" \
--cv-data "data/chime-7-udase-eval/test/data.jsonl"

Training will auto-resume from the latest checkpoint.

Distributed training and mixed precision are supported with accelerate:
accelerate launch --num_processes=<N> \
--main_process_port <port> \
--mixed_precision=bf16 \
urgent2026_sqa/train.py \
...

If you use configs/universa-ext_wavlm-base_mos-only.yaml, you need to exclude the urgent2024-sqa training set (the command below also leaves out urgent2025-sqa):
accelerate launch urgent2026_sqa/train.py \
--config configs/universa-ext_wavlm-base_mos-only.yaml \
--exp exp/universa-ext_wavlm-base_mos-only \
--train-data "data/{bvcc,bc19,nisqa,pstn,somos,tcd-voip,tencent,tmhint-qi,ttsds2}/train/data.jsonl" \
--cv-data "data/chime-7-udase-eval/test/data.jsonl"

🚧 Under Construction
pip install -e .[dev]

For inference on a single audio file, follow Quickstart (Inference).
For batch inference:
dataset="chime-7-udase-eval"
python urgent2026_sqa/infer.py \
--ckpt "exp/universa-ext/model_last.pt" \
--data "data/${dataset}/test/data.jsonl" \
--outdir "exp/universa-ext/infer/${dataset}"

This will generate a results.jsonl file and {metric}.scp files for all metrics under the --outdir directory.
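To inspect the predictions from Python, a JSON Lines file such as results.jsonl can be read one record per line. The sketch below assumes only that each non-empty line is a JSON object; `read_jsonl` is a hypothetical helper, and the field names are not assumed, so the example just prints whatever keys the first record has.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Path taken from the batch inference command; adjust it to your own --outdir.
results_path = Path("exp/universa-ext/infer/chime-7-udase-eval/results.jsonl")
if results_path.exists():
    records = list(read_jsonl(results_path))
    if records:
        print(len(records), "records; fields in the first one:", sorted(records[0]))
```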
dataset="chime-7-udase-eval"
python urgent2026_sqa/eval.py \
--pred "exp/universa-ext/infer/${dataset}/results.jsonl" \
--ref "data/${dataset}/test/data.jsonl"

You may also want to evaluate an annotated metric (e.g. scoreq) against MOS:
dataset="chime-7-udase-eval"
python urgent2026_sqa/eval.py \
--pred "data/${dataset}/test/data.jsonl" \
--ref "data/${dataset}/test/data.jsonl" \
--pred-metric "scoreq"

| Category | Metric | Value Range | Opt. |
|---|---|---|---|
| Error | System level MSE | [0, ∞) | ↓ |
| | Utterance level MSE | [0, ∞) | ↓ |
| Linear Correlation | System level LCC | [-1, 1] | ↑ |
| | Utterance level LCC | [-1, 1] | ↑ |
| Rank Correlation | System level SRCC | [-1, 1] | ↑ |
| | Utterance level SRCC | [-1, 1] | ↑ |
| | System level KTAU | [-1, 1] | ↑ |
| | Utterance level KTAU | [-1, 1] | ↑ |
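The metrics in the table can be sketched in plain Python as follows. This is an illustration, not the code in urgent2026_sqa/eval.py: the function names are hypothetical, system-level scores are assumed to follow the common convention of averaging predictions and references per system before scoring, and KTAU is computed as Kendall's tau-b. The toy scores and system labels are made up for the example.

```python
import math
from collections import defaultdict


def mse(pred, ref):
    """Mean squared error between predictions and references."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def lcc(pred, ref):
    """Pearson linear correlation coefficient."""
    mp, mr = sum(pred) / len(pred), sum(ref) / len(ref)
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_r = sum((r - mr) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)


def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def srcc(pred, ref):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return lcc(ranks(pred), ranks(ref))


def ktau(pred, ref):
    """Kendall tau-b, computed over all pairs in O(n^2)."""
    conc = disc = tie_p = tie_r = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            dp, dr = pred[i] - pred[j], ref[i] - ref[j]
            if dp == 0 and dr == 0:
                continue  # tied in both: excluded from numerator and denominator
            if dp == 0:
                tie_p += 1
            elif dr == 0:
                tie_r += 1
            elif dp * dr > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / math.sqrt((conc + disc + tie_p) * (conc + disc + tie_r))


def system_level(pred, ref, systems):
    """Average predictions and references per system (assumed convention)."""
    grouped = defaultdict(lambda: ([], []))
    for p, r, s in zip(pred, ref, systems):
        grouped[s][0].append(p)
        grouped[s][1].append(r)
    sys_pred = [sum(ps) / len(ps) for ps, _ in grouped.values()]
    sys_ref = [sum(rs) / len(rs) for _, rs in grouped.values()]
    return sys_pred, sys_ref


# Toy data: six utterances from three hypothetical systems.
pred = [3.0, 3.4, 2.0, 2.2, 4.4, 4.0]
ref = [3.2, 3.0, 1.8, 2.6, 4.6, 4.2]
systems = ["A", "A", "B", "B", "C", "C"]

sys_pred, sys_ref = system_level(pred, ref, systems)
for name, fn in [("MSE", mse), ("LCC", lcc), ("SRCC", srcc), ("KTAU", ktau)]:
    print(f"{name}: utterance={fn(pred, ref):.4f} system={fn(sys_pred, sys_ref):.4f}")
```

Averaging per system removes within-system noise, which is why system-level correlations are usually higher than utterance-level ones on the same data.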
--
- AudioMOS Challenge series (VoiceMOS 2022–2024, AudioMOS 2025): A series of benchmark challenges on MOS prediction for synthetic speech, singing voice, music, and general audio, providing large-scale datasets and standard evaluation protocols.
A: The MOS-labeled split of urgent2025-sqa is partially used as test data for this challenge. It will be released after the challenge ends, so stay tuned!
If you use this code or datasets, please consider citing:
@article{UniVersaExt-Wang2025,
title={Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment},
author={Wang, Wei and Zhang, Wangyou and Li, Chenda and Shi, Jiatong and Watanabe, Shinji and Qian, Yanmin},
journal={arXiv preprint arXiv:2506.12260},
year={2025}
}
@inproceedings{Interspeech2025-Saijo2025,
title={Interspeech 2025 {URGENT} Speech Enhancement Challenge},
author={Saijo, Kohei and Zhang, Wangyou and Cornell, Samuele and Scheibler, Robin and Li, Chenda and Ni, Zhaoheng and Kumar, Anurag and Sach, Marvin and Fu, Yihui and Wang, Wei and Fingscheidt, Tim and Watanabe, Shinji},
booktitle={Proc. Interspeech},
pages={858--862},
year={2025},
}
@inproceedings{URGENT-Zhang2024,
title={{URGENT} Challenge: Universality, Robustness, and Generalizability For Speech Enhancement},
author={Zhang, Wangyou and Scheibler, Robin and Saijo, Kohei and Cornell, Samuele and Li, Chenda and Ni, Zhaoheng and Pirklbauer, Jan and Sach, Marvin and Watanabe, Shinji and Fingscheidt, Tim and Qian, Yanmin},
booktitle={Proc. Interspeech},
pages={4868--4872},
year={2024}
}
@article{P808-Sach2025,
title={P.808 Multilingual Speech Enhancement Testing: Approach and Results of {URGENT} 2025 Challenge},
author={Sach, Marvin and Fu, Yihui and Saijo, Kohei and Zhang, Wangyou and Cornell, Samuele and Scheibler, Robin and Li, Chenda and Kumar, Anurag and Wang, Wei and Qian, Yanmin and Watanabe, Shinji and Fingscheidt, Tim},
journal={arXiv preprint arXiv:2507.11306},
year={2025}
}