urgent-challenge/urgent2026_challenge_track2


URGENT 2026 - Track 2 (Speech Quality Assessment)

Predict the Mean Opinion Score (MOS) of speech processed by speech enhancement (SE) systems. Check our challenge webpage for details.

This repo provides the official implementation/baseline derived from Uni-VERSA-Ext for URGENT 2026 Track 2.

🚀 Quickstart (Inference)

💻 Colab

Play with the model in Colab: https://colab.research.google.com/drive/1Y2OkPE0hGSG4XRj_b7RsmWMVSg4KkhM7

๐Ÿ–ฅ๏ธ Local

🔒 Security note: We use HyperPyYAML for config loading, and HyperPyYAML configs can import and instantiate arbitrary Python objects. Treat configs as code: do not load models or configs from untrusted sources.
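Because HyperPyYAML tags such as `!new:`, `!name:` and `!apply:` import and call Python objects at load time, it can help to review which objects a downloaded config would touch before loading it. The helper below is a hypothetical, stdlib-only sketch (it is not part of `urgent2026_sqa`); it scans the raw text with a regular expression instead of parsing the YAML, so treat it as a review aid, not as a sandbox.

```python
import re
from pathlib import Path

# HyperPyYAML tags that make the loader import or call Python objects
# (!new: instantiates a class, !name: returns a callable, !apply: calls a
# function at load time, !include: pulls in another YAML file).
RISKY_TAG = re.compile(r"!(new|name|apply|include):([^\s,\]\}]+)")


def list_python_targets(config_text: str) -> list[tuple[str, str]]:
    """Return (tag, target) pairs found in the text of a HyperPyYAML config.

    Plain-text scan only: it does not resolve !ref or follow !include files.
    """
    return [(m.group(1), m.group(2)) for m in RISKY_TAG.finditer(config_text)]


def review_config(path: str) -> None:
    """Print every Python target a config would import or call."""
    for tag, target in list_python_targets(Path(path).read_text(encoding="utf-8")):
        print(f"!{tag}: {target}")
```

For example, `review_config("configs/universa-ext.yaml")` prints one line per tagged object, so anything outside the expected packages stands out before the config is ever loaded.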

  1. Install from GitHub
pip install git+https://github.com/urgent-challenge/urgent2026_challenge_track2
  2. Predict the speech quality metrics of a single audio file
from urgent2026_sqa.infer import infer_single, load_model

model, config = load_model("vvwangvv/universa-ext_wavlm-base_5metric")

# examples are from https://labsites.rochester.edu/air/projects/is2012/examples.html
print(infer_single(model, config, "./assets/sp03.wav"))
print(infer_single(model, config, "./assets/sp03_casino_sn5.wav"))
  3. Predict the speech quality metrics of all audio files in a folder
from urgent2026_sqa.infer import infer_list, load_model
from pathlib import Path

model, config = load_model("vvwangvv/universa-ext_wavlm-base_5metric")
audio_paths = list(Path("./assets").glob("*.wav"))
# examples are from https://labsites.rochester.edu/air/projects/is2012/examples.html
print(infer_list(model, config, audio_paths))

🔬 Training

⚙️ Installation

git clone https://github.com/urgent-challenge/urgent2026_challenge_track2
cd urgent2026_challenge_track2

# Create and activate environment
conda create -n urgent2026-sqa python=3.11 -y
conda activate urgent2026-sqa

pip install -e ".[train]"

📊 Data

The following script downloads and organizes all datasets listed below:

⚠️ NOTE: The bvcc and bc19 datasets require manual processing after downloading. If you don't want to include them, comment out the corresponding lines in scripts/data/prepare.sh.

bash scripts/prepare_data.sh </path/to/db>
| Split | Corpus | #Samples | #Systems | Duration (hours) | Links | License |
|---|---|---|---|---|---|---|
| Training | BC19 | 136 | 21 | 0.32 | [Original] | Custom |
| Training | BVCC | 4973 | 175 | 5.56 | [Original] | Custom |
| Training | NISQA | 11020 | N/A | 27.21 | [Original] | Mixed |
| Training | PSTN | 58709 | N/A | 163.08 | [Original] | Unknown |
| Training | SOMOS | 14100 | 181 | 18.32 | [Original] [Huggingface] | CC BY-NC-SA 4.0 |
| Training | TCD-VoIP | 384 | 24 | 0.87 | [Original] [Huggingface] | CC BY-NC-SA 4.0 |
| Training | Tencent | 11563 | N/A | 23.51 | [Original] [Huggingface] | Apache |
| Training | TMHINT-QI | 12937 | 98 | 11.35 | [Original] [Huggingface] | MIT |
| Training | TTSDS2 | 460 | 80 | 0.96 | [Original] [Huggingface] | MIT |
| Training | urgent2024-sqa | 238000 | 238 | 429.34 | [Huggingface] | CC BY-NC-SA 4.0 |
| Training | urgent2025-sqa | 100000 | 100 | 261.31 | [Huggingface] | CC BY-NC-SA 4.0 |
| Dev | CHiME-7 UDASE Eval | 640 | 5 | 0.84 | [Original] [Huggingface] | CC BY-SA 4.0 |
| Test | urgent2024-sqa (blind_test_mos) | 6900 | 23 | 13.80 | [Huggingface] | CC BY-NC-SA 4.0 |
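After running the preparation script, a quick sanity check is to count the records in each prepared manifest. The sketch below assumes the `data/<corpus>/<split>/data.jsonl` layout used by the training commands in this README and that each non-empty line holds one JSON object; it does not inspect the record fields, since their schema is not documented here. `count_records` is a hypothetical helper, and its counts may not match the #Samples column one-to-one, depending on how the script splits each corpus.

```python
import json
from pathlib import Path


def count_records(data_root: str = "data", split: str = "train") -> dict[str, int]:
    """Count records in every <data_root>/<corpus>/<split>/data.jsonl.

    Assumes JSON Lines (one JSON object per non-empty line). Each line is
    parsed, so a truncated or corrupted manifest raises json.JSONDecodeError
    instead of being silently counted.
    """
    counts = {}
    for manifest in sorted(Path(data_root).glob(f"*/{split}/data.jsonl")):
        n = 0
        with manifest.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    json.loads(line)  # validate the line; raises on malformed JSON
                    n += 1
        counts[manifest.parent.parent.name] = n  # key = <corpus> directory name
    return counts
```

Usage: `for corpus, n in count_records().items(): print(corpus, n)` lists every prepared training corpus, so a corpus that failed to download simply does not appear.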

🔥 Launch training

The following command trains Uni-VERSA-Ext on all prepared training datasets.

accelerate launch urgent2026_sqa/train.py \
  --config configs/universa-ext.yaml \
  --exp exp/universa-ext \
  --train-data "data/*/train/data.jsonl" \
  --cv-data "data/chime-7-udase-eval/test/data.jsonl"

✅ Training automatically resumes from the latest checkpoint.

✅ Distributed training and mixed precision are supported through accelerate:

accelerate launch --num_processes=<N> \
  --main_process_port <port> \
  --mixed_precision=bf16 \
  urgent2026_sqa/train.py \
  ...

If you use configs/universa-ext_wavlm-base_mos-only.yaml, you need to exclude the urgent2024-sqa and urgent2025-sqa training sets:

accelerate launch urgent2026_sqa/train.py \
  --config configs/universa-ext_wavlm-base_mos-only.yaml \
  --exp exp/universa-ext_wavlm-base_mos-only \
  --train-data "data/{bvcc,bc19,nisqa,pstn,somos,tcd-voip,tencent,tmhint-qi,ttsds2}/train/data.jsonl" \
  --cv-data "data/chime-7-udase-eval/test/data.jsonl"
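The `--train-data` patterns above are quoted, so the shell passes them to the script unexpanded, and the `{a,b,...}` alternatives and `*` wildcards have to be resolved in Python. The standard `glob` module does not understand braces on its own. A minimal sketch of one way to resolve such a pattern, assuming non-nested braces; `expand_braces` and `resolve_data_files` are hypothetical names, and `train.py` may implement this differently.

```python
import glob
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand {a,b,...} groups left to right (non-nested braces only)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so later brace groups in the tail are expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def resolve_data_files(pattern: str) -> list[str]:
    """Brace-expand the pattern, glob each alternative, return sorted unique paths."""
    files = set()
    for alternative in expand_braces(pattern):
        files.update(glob.glob(alternative))
    return sorted(files)
```

Note that globbing silently drops alternatives that match nothing, so comparing `len(resolve_data_files(pattern))` with the number of corpora named in the braces is a cheap way to catch a corpus that was never prepared.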

๐Ÿ› ๏ธ (Optional) Build Your Own Multi-Metric Dataset

๐Ÿšง Under Construction

pip install -e ".[dev]"

📦 Batch Inference & Evaluation

Batch inference

For inference on a single audio file, see Quickstart (Inference).

For batch inference:

# Set the variable on its own line so ${dataset} is expanded in the arguments below
dataset="chime-7-udase-eval"
python urgent2026_sqa/infer.py \
  --ckpt "exp/universa-ext/model_last.pt" \
  --data "data/${dataset}/test/data.jsonl" \
  --outdir "exp/universa-ext/infer/${dataset}"

This generates a results.jsonl file and one {metric}.scp file per metric under --outdir.
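The `{metric}.scp` files can be loaded back for further analysis. The sketch below assumes a Kaldi-style layout with one `<utterance-id> <score>` pair per line; that format and the hypothetical `read_score_scp` helper are assumptions, so check them against an actual output file first.

```python
from pathlib import Path


def read_score_scp(path: str) -> dict[str, float]:
    """Parse "<utterance-id> <score>" lines into a dict (blank lines skipped)."""
    scores = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        utt_id, value = line.split(maxsplit=1)  # ValueError if a line has no score
        scores[utt_id] = float(value)
    return scores
```

For example, `scores = read_score_scp("exp/universa-ext/infer/chime-7-udase-eval/mos.scp")` (the `mos.scp` file name is an assumed metric name) followed by `sum(scores.values()) / len(scores)` gives the mean predicted score over the set.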

Evaluation

dataset="chime-7-udase-eval"
python urgent2026_sqa/eval.py \
  --pred "exp/universa-ext/infer/${dataset}/results.jsonl" \
  --ref  "data/${dataset}/test/data.jsonl"

You can also evaluate an existing metric by comparing its annotated values (e.g., scoreq) against MOS:

dataset="chime-7-udase-eval"
python urgent2026_sqa/eval.py \
  --pred "data/${dataset}/test/data.jsonl" \
  --ref  "data/${dataset}/test/data.jsonl" \
  --pred-metric "scoreq"

Metrics

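| Category | Metric | Value Range | Opt. |
|---|---|---|---|
| Error | System-level MSE | [0, ∞) | ↓ |
| Error | Utterance-level MSE | [0, ∞) | ↓ |
| Linear Correlation | System-level LCC | [-1, 1] | ↑ |
| Linear Correlation | Utterance-level LCC | [-1, 1] | ↑ |
| Rank Correlation | System-level SRCC | [-1, 1] | ↑ |
| Rank Correlation | Utterance-level SRCC | [-1, 1] | ↑ |
| Rank Correlation | System-level KTAU | [-1, 1] | ↑ |
| Rank Correlation | Utterance-level KTAU | [-1, 1] | ↑ |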
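For reference, the metrics in the table can be computed as follows. Utterance-level scores compare predictions with labels directly; system-level scores first average predictions and labels over all utterances of each system (the usual convention in MOS-prediction challenges such as VoiceMOS) and then compare the per-system means. This is a dependency-free sketch for illustration: whether `urgent2026_sqa/eval.py` aggregates identically, and the `evaluate` signature used here, are assumptions. LCC and SRCC are undefined (division by zero) when either input is constant.

```python
import math
from collections import defaultdict


def mse(x, y):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def lcc(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(v):
    """1-based ranks; tied values get the average of the ranks they span."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return lcc(ranks(x), ranks(y))


def ktau(x, y):
    """Kendall's tau-b, which corrects for ties in either variable."""
    conc = disc = tie_x = tie_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded by tau-b
            if dx == 0:
                tie_x += 1
            elif dy == 0:
                tie_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / math.sqrt((conc + disc + tie_x) * (conc + disc + tie_y))


def per_system_means(preds, refs, systems):
    """Average predictions and references over the utterances of each system."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for p, r, s in zip(preds, refs, systems):
        sums[s][0] += p
        sums[s][1] += r
        sums[s][2] += 1
    keys = sorted(sums)
    return [sums[k][0] / sums[k][2] for k in keys], [sums[k][1] / sums[k][2] for k in keys]


def evaluate(preds, refs, systems):
    """Return utterance- and system-level MSE, LCC, SRCC and KTAU."""
    levels = {"utt": (preds, refs), "sys": per_system_means(preds, refs, systems)}
    metrics = {"MSE": mse, "LCC": lcc, "SRCC": srcc, "KTAU": ktau}
    return {f"{lvl}_{name}": fn(p, r) for lvl, (p, r) in levels.items() for name, fn in metrics.items()}
```

The tie handling (average ranks for SRCC, the tau-b variant for KTAU) matches the defaults of `scipy.stats.spearmanr` and `scipy.stats.kendalltau`, so either implementation should give the same numbers on the same inputs.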

Benchmark (WIP)

--

🔗 Links

Suggested MOS Predictors

| Repository | Year | Paper |
|---|---|---|
| Uni-VERSA-Ext (This repo) | 2025 | Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment |
| Uni-VERSA | 2025 | Uni-VERSA: Versatile Speech Assessment with a Unified Network |
| Distill-MOS | 2025 | Distillation and Pruning for Scalable Self-Supervised Representation-Based Speech Quality Assessment |
| SCOREQ | 2024 | Speech Quality Assessment with Contrastive Regression |
| DNSMOSPro | 2024 | DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech |
| UTMOSv2 | 2024 | The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech |
| UTMOS | 2022 | SaruLab System for VoiceMOS Challenge 2022 |
| SSL-MOS | 2022 | Generalization Ability of MOS Prediction Networks |
| NISQA | 2021 | NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets |
| LDNet | 2021 | LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech |
| DNSMOS | 2020 | DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors |

Related Challenges


🙋 FAQ

Q1. The urgent2025-sqa dataset does not seem to have a MOS-labeled split like the one in urgent2024-sqa.

A: The MOS-labeled split of urgent2025-sqa is partly used as test data for this challenge, so it will be released after the challenge ends. Stay tuned!


📚 Citations

If you use this code or datasets, please consider citing:

@article{UniVersaExt-Wang2025,
  title={Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment},
  author={Wang, Wei and Zhang, Wangyou and Li, Chenda and Shi, Jiatong and Watanabe, Shinji and Qian, Yanmin},
  journal={arXiv preprint arXiv:2506.12260},
  year={2025}
}

@inproceedings{Interspeech2025-Saijo2025,
  title={Interspeech 2025 {URGENT} Speech Enhancement Challenge},
  author={Saijo, Kohei and Zhang, Wangyou and Cornell, Samuele and Scheibler, Robin and Li, Chenda and Ni, Zhaoheng and Kumar, Anurag and Sach, Marvin and Fu, Yihui and Wang, Wei and Fingscheidt, Tim and Watanabe, Shinji},
  booktitle={Proc. Interspeech},
  pages={858--862},
  year={2025}
}

@inproceedings{URGENT-Zhang2024,
  title={{URGENT} Challenge: Universality, Robustness, and Generalizability For Speech Enhancement},
  author={Zhang, Wangyou and Scheibler, Robin and Saijo, Kohei and Cornell, Samuele and Li, Chenda and Ni, Zhaoheng and Pirklbauer, Jan and Sach, Marvin and Watanabe, Shinji and Fingscheidt, Tim and Qian, Yanmin},
  booktitle={Proc. Interspeech},
  pages={4868--4872},
  year={2024}
}

@article{P808-Sach2025,
  title={P.808 Multilingual Speech Enhancement Testing: Approach and Results of {URGENT} 2025 Challenge},
  author={Sach, Marvin and Fu, Yihui and Saijo, Kohei and Zhang, Wangyou and Cornell, Samuele and Scheibler, Robin and Li, Chenda and Kumar, Anurag and Wang, Wei and Qian, Yanmin and Watanabe, Shinji and Fingscheidt, Tim},
  journal={arXiv preprint arXiv:2507.11306},
  year={2025}
}

About

Official baseline for ICASSP 2026 URGENT Challenge Track 2 (Speech Quality Assessment)
