Traditional throughput metrics (tokens per second) are hardware-dependent, making fair comparisons difficult. We introduce AUP (Accuracy Under Parallelism), a hardware-independent metric that jointly measures efficiency and performance.
AUP captures both parallelism (tokens per forward pass) and accuracy, with a weighting function that penalizes accuracy degradation.
Key insight: AUP uses tokens per forward pass (TPF) instead of tokens per second (TPS), making it device-independent. A higher AUP score means the model maintains accuracy while achieving high parallelism.
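To make the idea concrete, here is a minimal Python sketch. The TPF helper follows directly from the definition (tokens generated divided by forward passes). The `illustrative_aup` function is only one plausible reading of "accuracy under parallelism": it assumes AUP is a weighted area under the accuracy-vs-TPF curve, and the exponential weight and the `alpha` parameter are assumptions made for illustration, not the exact definition used for the leaderboard.

```python
import math


def tokens_per_forward(num_tokens: int, num_forward_passes: int) -> float:
    """TPF: generated tokens divided by the number of model forward passes."""
    if num_forward_passes <= 0:
        raise ValueError("num_forward_passes must be positive")
    return num_tokens / num_forward_passes


def illustrative_aup(points, alpha: float = 3.0) -> float:
    """Hypothetical AUP-style score (assumed form, not the official formula).

    points: iterable of (tpf, accuracy) pairs measured at different
            parallelism settings of the same model.
    alpha:  assumed penalty strength; larger values punish accuracy drops more.
    """
    pts = sorted(points)
    if len(pts) < 2:
        return 0.0
    best = max(acc for _, acc in pts)
    if best <= 0:
        return 0.0

    def weighted(acc: float) -> float:
        # Weight is 1 at the best accuracy and decays as accuracy degrades.
        return acc * min(1.0, math.exp(-alpha * (1.0 - acc / best)))

    # Trapezoidal area under the weighted accuracy-vs-TPF curve.
    area = 0.0
    for (r0, a0), (r1, a1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (weighted(a0) + weighted(a1)) / 2.0
    return area


if __name__ == "__main__":
    print(tokens_per_forward(256, 64))
    print(illustrative_aup([(1.0, 0.80), (2.0, 0.79), (4.0, 0.70)]))
```

Because both inputs (TPF and accuracy) are independent of the device the model runs on, a score built this way does not change when the same model is evaluated on different hardware.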
We have released a dLLM Leaderboard that compares different dLLMs; it is available on this blog.
