URGENT 2026 — Track 2 (Speech Quality Assessment)

Predict the Mean Opinion Score (MOS) of speech processed by speech enhancement (SE) systems. See the challenge webpage for details.

This repo provides the official implementation/baseline derived from Uni-VERSA-Ext for URGENT 2026 Track 2.

📑 Table of Contents

Quickstart (Inference)
Training
Batch Inference & Evaluation
Benchmark
Links
FAQ
Citations

🚀 Quickstart (Inference)

💻 Colab

Play with the model in Colab: https://colab.research.google.com/drive/1Y2OkPE0hGSG4XRj_b7RsmWMVSg4KkhM7

🖥️ Local

🔒 Security note: We use HyperPyYAML for config loading. Treat configs as code: do not load models from untrusted sources.

Install from GitHub

pip install git+https://github.com/urgent-challenge/urgent2026_challenge_track2

Predict the speech quality metrics of a single audio file

from urgent2026_sqa.infer import infer_single, load_model

model, config = load_model("vvwangvv/universa-ext_wavlm-base_5metric")

# examples are from https://labsites.rochester.edu/air/projects/is2012/examples.html
print(infer_single(model, config, "./assets/sp03.wav"))
print(infer_single(model, config, "./assets/sp03_casino_sn5.wav"))

Predict the speech quality metrics of all audio files in a folder

from urgent2026_sqa.infer import infer_list, load_model
from pathlib import Path

model, config = load_model("vvwangvv/universa-ext_wavlm-base_5metric")
audio_paths = list(Path("./assets").glob("*.wav"))
# examples are from https://labsites.rochester.edu/air/projects/is2012/examples.html
print(infer_list(model, config, audio_paths))
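When the audio is spread over nested subfolders, the path list can be built recursively before it is handed to infer_list. The sketch below uses only pathlib. collect_wavs is a hypothetical helper that is not part of urgent2026_sqa, and it assumes infer_list accepts any list of paths, as in the example above.

```python
from pathlib import Path


def collect_wavs(root):
    """Return every .wav file under root (recursively), sorted for a stable order."""
    return sorted(p for p in Path(root).rglob("*.wav") if p.is_file())


# Usage (assumption: same call pattern as the folder example above):
#   audio_paths = collect_wavs("./assets")
#   print(infer_list(model, config, audio_paths))
```

Sorting keeps the order of the inputs reproducible across runs and file systems.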

🔬 Training

⚙️ Installation

git clone https://github.com/urgent-challenge/urgent2026_challenge_track2
cd urgent2026_challenge_track2

# Create and activate environment
conda create -n urgent2026-sqa python=3.11 -y
conda activate urgent2026-sqa

pip install -e .[train]

📊 Data

The following script fetches and organizes all datasets listed in the table below:

⚠️ NOTE: bvcc and bc19 datasets require manual processing after downloading. If you don't want to include them, comment out the corresponding line in scripts/data/prepare.sh.

bash scripts/prepare_data.sh </path/to/db>

Split	Corpus	#Samples	#Systems	Duration (hours)	Links	License
Training	BC19	136	21	0.32	[Original]	Custom
	BVCC	4973	175	5.56	[Original]	Custom
	NISQA	11020	N/A	27.21	[Original]	Mixed
	PSTN	58709	N/A	163.08	[Original]	Unknown
	SOMOS	14100	181	18.32	[Original] [Huggingface]	CC BY-NC-SA 4.0
	TCD-VoIP	384	24	0.87	[Original] [Huggingface]	CC BY-NC-SA 4.0
	Tencent	11563	N/A	23.51	[Original] [Huggingface]	Apache
	TMHINT-QI	12937	98	11.35	[Original] [Huggingface]	MIT
	TTSDS2	460	80	0.96	[Original] [Huggingface]	MIT
	urgent2024-sqa	238000	238	429.34	[Huggingface]	CC BY-NC-SA 4.0
	urgent2025-sqa	100000	100	261.31	[Huggingface]	CC BY-NC-SA 4.0
Dev	CHiME-7 UDASE Eval	640	5	0.84	[Original] [Huggingface]	CC BY-SA 4.0
Test	urgent2024-sqa (blind_test_mos)	6900	23	13.80	[Huggingface]	CC BY-NC-SA 4.0

🔥 Launch training

The following command trains Uni-VERSA-Ext on all prepared training datasets.

accelerate launch urgent2026_sqa/train.py \
  --config configs/universa-ext.yaml \
  --exp exp/universa-ext \
  --train-data "data/*/train/data.jsonl" \
  --cv-data "data/chime-7-udase-eval/test/data.jsonl"
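Before launching a long run, it can help to check which manifests the --train-data pattern picks up. The sketch below is a stand-alone check, not part of the repo: it assumes the pattern is expanded as an ordinary glob and that each data.jsonl holds one JSON record per line. summarize_manifests is a hypothetical helper name.

```python
import glob


def summarize_manifests(pattern):
    """Map each manifest matched by a glob pattern to its number of non-empty lines."""
    summary = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            summary[path] = sum(1 for line in f if line.strip())
    return summary


# Prints nothing if no dataset has been prepared yet.
for path, n_records in summarize_manifests("data/*/train/data.jsonl").items():
    print(f"{n_records:>8}  {path}")
```

The per-corpus counts can then be compared against the #Samples column of the data table.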

✅ Training will auto-resume from the latest checkpoint

✅ Distributed training and mixed precision are supported with accelerate:

accelerate launch --num_processes=<N> \
  --main_process_port <port> \
  --mixed_precision=bf16 \
  urgent2026_sqa/train.py \
  ...

If you use configs/universa-ext_wavlm-base_mos-only.yaml you'll need to exclude the urgent2024-sqa training set:

accelerate launch urgent2026_sqa/train.py \
  --config configs/universa-ext_wavlm-base_mos-only.yaml \
  --exp exp/universa-ext_wavlm-base_mos-only \
  --train-data "data/{bvcc,bc19,nisqa,pstn,somos,tcd-voip,tencent,tmhint-qi,ttsds2}/train/data.jsonl" \
  --cv-data "data/chime-7-udase-eval/test/data.jsonl"
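Because the --train-data value is quoted, the shell passes the {a,b,...} group through unexpanded, and Python's standard glob module does not understand braces. How train.py interprets the pattern is not shown here, so the sketch below is only a way to preview which manifest paths the brace group stands for. expand_braces is a hypothetical helper and handles non-nested groups only.

```python
import re
from itertools import product


def expand_braces(pattern):
    """Expand non-nested {a,b,c} groups into the list of concrete patterns."""
    # re.split with a capturing group alternates: literal, group, literal, ...
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in product(*choices)]


pattern = "data/{bvcc,bc19,nisqa,pstn,somos,tcd-voip,tencent,tmhint-qi,ttsds2}/train/data.jsonl"
for path in expand_braces(pattern):
    print(path)
```

The expanded list contains nine manifests and none from the urgent2024-sqa or urgent2025-sqa folders.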

🛠️ (Optional) Build Your Own Multi-Metric Dataset

🚧 Under Construction

pip install -e .[dev]

📦 Batch Inference & Evaluation

Batch inference

For inference on a single audio file, see Quickstart (Inference).

For batch inference:

dataset="chime-7-udase-eval"
python urgent2026_sqa/infer.py \
  --ckpt "exp/universa-ext/model_last.pt" \
  --data "data/${dataset}/test/data.jsonl" \
  --outdir "exp/universa-ext/infer/${dataset}"

This generates a results.jsonl file and one {metric}.scp file per metric under --outdir.
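The results file can be inspected with a few lines of standard-library Python. This is a minimal sketch that assumes only what the .jsonl extension implies (one JSON object per line). The field names are not documented here, so the "mos" key in the usage comment is an assumption, and read_jsonl and mean_of are hypothetical helpers.

```python
import json
from statistics import mean


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def mean_of(records, key):
    """Average a numeric field over the records that contain it, or None if absent."""
    values = [r[key] for r in records if isinstance(r.get(key), (int, float))]
    return mean(values) if values else None


# Usage (path follows the --outdir of the command above):
#   records = read_jsonl("exp/universa-ext/infer/chime-7-udase-eval/results.jsonl")
#   print(sorted(records[0]))        # inspect the actual field names first
#   print(mean_of(records, "mos"))   # "mos" is an assumed key
```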

Evaluation

dataset="chime-7-udase-eval"
python urgent2026_sqa/eval.py \
  --pred "exp/universa-ext/infer/${dataset}/results.jsonl" \
  --ref  "data/${dataset}/test/data.jsonl"

You can also evaluate an annotated metric (e.g. scoreq) against MOS by passing the reference file as the prediction file and selecting the metric with --pred-metric:

dataset="chime-7-udase-eval"
python urgent2026_sqa/eval.py \
  --pred "data/${dataset}/test/data.jsonl" \
  --ref  "data/${dataset}/test/data.jsonl" \
  --pred-metric "scoreq"

Metrics

Category	Metric	Value Range	Opt.
Error	System level MSE	[0, ∞)	↓
Error	Utterance level MSE	[0, ∞)	↓
Linear Correlation	System level LCC	[-1, 1]	↑
Linear Correlation	Utterance level LCC	[-1, 1]	↑
Rank Correlation	System level SRCC	[-1, 1]	↑
Rank Correlation	Utterance level SRCC	[-1, 1]	↑
Rank Correlation	System level KTAU	[-1, 1]	↑
Rank Correlation	Utterance level KTAU	[-1, 1]	↑
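How eval.py computes these metrics is not spelled out here. The sketch below follows the convention common in MOS-prediction benchmarks, which is an assumption about this repo: utterance-level metrics compare per-utterance scores directly, while system-level metrics first average predictions and labels per system. It is pure Python for illustration, and every function name in it is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def mse(pred, ref):
    """Mean squared error between two equal-length sequences."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def lcc(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return lcc(ranks(x), ranks(y))


def ktau(x, y):
    """Kendall tau-b over all pairs (O(n^2), fine for small sets)."""
    conc = disc = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: counts toward neither term
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    denom = sqrt((conc + disc + only_x_tied) * (conc + disc + only_y_tied))
    return (conc - disc) / denom


METRICS = {"MSE": mse, "LCC": lcc, "SRCC": srcc, "KTAU": ktau}


def system_means(systems, scores):
    """Average the scores of each system; systems are returned in sorted order."""
    grouped = defaultdict(list)
    for system, score in zip(systems, scores):
        grouped[system].append(score)
    return [sum(v) / len(v) for _, v in sorted(grouped.items())]


def evaluate(pred, ref, systems):
    """Compute all four metrics at utterance level and at system level."""
    sys_pred, sys_ref = system_means(systems, pred), system_means(systems, ref)
    return {
        "utterance": {name: fn(pred, ref) for name, fn in METRICS.items()},
        "system": {name: fn(sys_pred, sys_ref) for name, fn in METRICS.items()},
    }


# Toy data: three systems with two utterances each (values are made up).
systems = ["A", "A", "B", "B", "C", "C"]
ref = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
pred = [1.5, 2.5, 2.5, 3.5, 4.5, 4.5]
for level, scores in evaluate(pred, ref, systems).items():
    print(level, {name: round(value, 3) for name, value in scores.items()})
```

System-level scores are usually higher than utterance-level ones because averaging per system cancels part of the per-utterance error. For large test sets, a library such as SciPy would be a better choice than the quadratic Kendall loop used here.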

Benchmark (WIP)

--

🔗 Links

Suggested MOS Predictors

Repository	Year	Paper
Uni-VERSA-Ext (This repo)	2025	Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
Uni-VERSA	2025	Uni-VERSA: Versatile Speech Assessment with a Unified Network
Distill-MOS	2025	Distillation and Pruning for Scalable Self-Supervised Representation-Based Speech Quality Assessment
SCOREQ	2024	Speech Quality Assessment with Contrastive Regression
DNSMOSPro	2024	DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech
UTMOSv2	2024	The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech
UTMOS	2022	SaruLab System for VoiceMOS Challenge 2022
SSL-MOS	2022	Generalization Ability of MOS Prediction Networks
NISQA	2021	NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets
LDNet	2021	LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech
DNSMOS	2020	DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors

Related Challenges

AudioMOS Challenge series (VoiceMOS 2022–2024, AudioMOS 2025): A series of benchmark challenges on MOS prediction for synthetic speech, singing voice, music, and general audio, providing large-scale datasets and standard evaluation protocols.

🙋 FAQ

Q1. The urgent2025-sqa dataset does not seem to have a MOS-labeled split like urgent2024-sqa does.

A: Part of the MOS-labeled split of urgent2025-sqa is used as test data for this challenge. It will be released after the challenge ends, so stay tuned!

📚 Citations

If you use this code or datasets, please consider citing:

@article{UniVersaExt-Wang2025,
  title={Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment},
  author={Wang, Wei and Zhang, Wangyou and Li, Chenda and Shi, Jiatong and Watanabe, Shinji and Qian, Yanmin},
  journal={arXiv preprint arXiv:2506.12260},
  year={2025}
}

@inproceedings{Interspeech2025-Saijo2025,
  title={Interspeech 2025 {URGENT} Speech Enhancement Challenge},
  author={Saijo, Kohei and Zhang, Wangyou and Cornell, Samuele and Scheibler, Robin and Li, Chenda and Ni, Zhaoheng and Kumar, Anurag and Sach, Marvin and Fu, Yihui and Wang, Wei and Fingscheidt, Tim and Watanabe, Shinji},
  booktitle={Proc. Interspeech},
  pages={858--862},
  year={2025},
}

@inproceedings{URGENT-Zhang2024,
  title={{URGENT} Challenge: Universality, Robustness, and Generalizability For Speech Enhancement},
  author={Zhang, Wangyou and Scheibler, Robin and Saijo, Kohei and Cornell, Samuele and Li, Chenda and Ni, Zhaoheng and Pirklbauer, Jan and Sach, Marvin and Watanabe, Shinji and Fingscheidt, Tim and Qian, Yanmin},
  booktitle={Proc. Interspeech},
  pages={4868--4872},
  year={2024}
}

@article{P808-Sach2025,
  title={P.808 Multilingual Speech Enhancement Testing: Approach and Results of {URGENT} 2025 Challenge},
  author={Sach, Marvin and Fu, Yihui and Saijo, Kohei and Zhang, Wangyou and Cornell, Samuele and Scheibler, Robin and Li, Chenda and Kumar, Anurag and Wang, Wei and Qian, Yanmin and Watanabe, Shinji and Fingscheidt, Tim},
  journal={arXiv preprint arXiv:2507.11306},
  year={2025}
}
