We have identified a serious error in the NLL evaluation results. As a result, the paper has been retracted. Please see our errata note and announcement for more details. A corrected implementation will be released soon.
The following results are unaffected and the code can still be used to reproduce them:
- Claims about compute-optimal MDMs and ARMs (Tables 3, 4)
- Scaling plots of MDMs and ARMs (Figure 5, Table 2)
The NLL results for MDM-Prime-v2 do not represent a real improvement and may be overestimated.
We apologize for any inconvenience this may cause.
- 📓 [May 1, 2026] Released errata note. The current NLL evaluation is incorrect.
This repository contains the code implementation of the experiments presented in the paper MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models.
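For readers unfamiliar with the method, here is one plausible reading of the two ingredients named in the title, sketched as a toy example. This is an illustrative guess, not the repository's implementation: all function names are hypothetical, and the sketch simply maps each token id to a little-endian bit vector (binary encoding) and reorders the bit positions with a fixed seeded permutation (index shuffling).

```python
import random

NUM_BITS = 16  # enough sub-token bits for a ~65k-entry vocabulary

def to_bits(token_id: int, num_bits: int = NUM_BITS) -> list[int]:
    """Little-endian binary encoding of a token id into sub-token bits."""
    return [(token_id >> i) & 1 for i in range(num_bits)]

def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))

def make_permutation(num_bits: int = NUM_BITS, seed: int = 0) -> list[int]:
    """A fixed, seeded permutation of the bit indices."""
    perm = list(range(num_bits))
    random.Random(seed).shuffle(perm)
    return perm

def encode(token_id: int, perm: list[int]) -> list[int]:
    """Binary-encode a token id, then shuffle the bit positions."""
    bits = to_bits(token_id, len(perm))
    return [bits[i] for i in perm]

def decode(shuffled: list[int], perm: list[int]) -> int:
    """Undo the shuffle and recover the token id."""
    bits = [0] * len(perm)
    for pos, src in enumerate(perm):
        bits[src] = shuffled[pos]
    return from_bits(bits)

perm = make_permutation()
assert all(decode(encode(t, perm), perm) == t for t in (0, 1, 42, 65535))
```

The round-trip assertion at the end checks that the shuffled binary code is lossless: any permutation of bit indices can be inverted, so the token id is always recoverable.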
- 🐳 Docker environments for easy installation
- 🤗 Pretrained weights for inference and evaluation
- 📉 Weights and Biases logs for enhanced reproducibility
- 🔬 Code for all experiments in our paper:
- Scaling Analysis
- Larger-scale Pretraining
- Folder: mdm-prime-v2/megatron
- Dataset: allenai/c4
- Weights & Biases Logs: lance_chao/megatron-all-runs
- Experiment: Section 4.1 in our paper
- Best for: (1) Studying the loss behavior; (2) Pretraining with advanced parallelism
- Folder: mdm-prime-v2/lit_gpt
- Dataset: cerebras/SlimPajama-627B (or gmongaras/SlimPajama-627B_Reupload)
- Experiment: Section 4.3 in our paper
- Best for: (1) Pretraining 1.1B models; (2) Running inference and downstream applications
- Download our docker image and launch gradio_demo.py:
# Pull and launch the docker image
docker pull chenhaochao/mdm-prime-v2-litgpt:latest
docker run -v $(pwd):/workspace --rm -it --gpus all --ipc=host -p 3000:3000 chenhaochao/mdm-prime-v2-litgpt:latest
# Install gradio and run gradio_demo.py
uv pip install gradio
/venv/mdm-prime-v2-litgpt/bin/python gradio_demo.py
- Loading the model's weights takes a few minutes. After running the commands, the demo website will be available at http://localhost:3000/.
This code implementation is based on the following repositories:
- ML-GSAI/SMDM (at commit 1df2e12), licensed under the Apache-2.0 license.
- jzhang38/TinyLlama (at commit bf12224), licensed under the Apache-2.0 license.
- NVIDIA/Megatron-LM (at commit 636179d), licensed under the Apache-2.0 license.
- wmn-231314/diffusion-data-constraint (at commit 61002b2), licensed under the Apache-2.0 license.
Further changes based on the code in this folder are licensed under the Apache-2.0 license.
If you find this code implementation useful, please consider citing our papers.
@article{chao2026mdmprimev2,
title = {{MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models}},
  author = {Chen-Hao Chao and Wei-Fang Sun and Junwei Quan and Chun-Yi Lee and Rahul G. Krishnan},
year = {2026},
}
@inproceedings{chao2025mdmprime,
title = {{Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking}},
  author = {Chen-Hao Chao and Wei-Fang Sun and Hanwen Liang and Chun-Yi Lee and Rahul G. Krishnan},
booktitle = {Proceedings of the Conference on Neural Information Processing Systems (NeurIPS)},
year = {2025},
}
