MDM-Prime Paper on arXiv · MDM-Prime-v2 on Hugging Face · MDM-Prime-v2 on Docker · MDM-Prime-v2 on X

⚠️ Notice: Perplexity Evaluation Error

We have identified a serious error in the NLL evaluation results. As a result, the paper has been retracted. Please see our errata note and announcement for more details. A corrected implementation will be released soon.

What remains valid

The following results are unaffected and the code can still be used to reproduce them:

  • Claims about compute-optimal MDMs and ARMs (Tables 3, 4)
  • Scaling plots of MDMs and ARMs (Figure 5, Table 2)

What is affected

The NLL results reported for MDM-Prime-v2 may be overestimated and do not represent a real improvement.

We apologize for any inconvenience this may cause.


News

  • 📓 [May 1, 2026] Released errata note. The current NLL evaluation is incorrect.

What’s Inside

This repository contains the code implementation of the experiments presented in the paper MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models.

  • 🐳 Docker environments for easy installation
  • 🤗 Pretrained weights for inference and evaluation
  • 📉 Weights & Biases logs for enhanced reproducibility
  • 🔬 Code for all experiments in our paper:
    • Scaling Analysis
    • Larger-scale Pretraining
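
The paper's title refers to encoding each token index as a short sequence of binary sub-tokens and shuffling the sub-token positions. As a rough, hypothetical sketch of that idea (the function names and the use of a seeded `random.Random` permutation are our own illustration, not this repository's API):

```python
import math
import random

def binary_encode(token_id: int, vocab_size: int) -> list[int]:
    """Encode a token index as a fixed-length list of bits (MSB first)."""
    num_bits = math.ceil(math.log2(vocab_size))
    return [(token_id >> i) & 1 for i in reversed(range(num_bits))]

def binary_decode(bits: list[int]) -> int:
    """Invert binary_encode: fold MSB-first bits back into a token index."""
    token_id = 0
    for b in bits:
        token_id = (token_id << 1) | b
    return token_id

def shuffled_positions(seq_len: int, seed: int = 0) -> list[int]:
    """A fixed, seed-determined permutation of sub-token positions (illustrative)."""
    rng = random.Random(seed)
    perm = list(range(seq_len))
    rng.shuffle(perm)
    return perm
```

For a GPT-2-sized vocabulary (50,257 tokens), each token becomes 16 bits, so the model predicts sub-token bits rather than full vocabulary entries; see the paper for the actual formulation.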

Overview

Scaling Analysis

Larger-scale Pretraining

Demo

  • Download our Docker image and launch gradio_demo.py:
# Pull and launch the docker image
docker pull chenhaochao/mdm-prime-v2-litgpt:latest
docker run -v $(pwd):/workspace --rm -it --gpus all --ipc=host -p 3000:3000 chenhaochao/mdm-prime-v2-litgpt:latest

# Install gradio and run gradio_demo.py
uv pip install gradio
/venv/mdm-prime-v2-litgpt/bin/python gradio_demo.py
  • Loading the model's weights takes a few minutes. After running the commands, the demo website will be available at http://localhost:3000/.

License

This code implementation builds on the following repositories.

Further changes based on the code in this folder are licensed under the Apache-2.0 license.

Citation

If you find this code implementation useful, please consider citing our papers.

@article{chao2026mdmprimev2,
      title = {{MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models}}, 
      author = {Chen-Hao Chao and Wei-Fang Sun and Junwei Quan and Chun-Yi Lee and Rahul G. Krishnan},
      year = {2026},
}
@inproceedings{chao2025mdmprime,
      title = {{Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking}}, 
      author = {Chen-Hao Chao and Wei-Fang Sun and Hanwen Liang and Chun-Yi Lee and Rahul G. Krishnan},
      booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)},
      year = {2025},
}
