Description
When using speculative decoding / drafting (specifically with the Qwen 27B Optimized Speed model), inference occasionally crashes with:
ValueError: SparseDistribution probabilities must have positive mass
This happens for two reasons:
- Numerical instability or masking: Extreme logits or fully masked positions can leave every logit at -inf, and normalizing those yields NaN probabilities.
- Residual distribution logic: In residual_distribution, the difference of probabilities can sum to exactly 0.0, or to a non-finite value if any entry is NaN. If total evaluates to NaN, the check total <= 0 evaluates to False, so the target fallback logic is bypassed and a SparseDistribution is constructed with all NaN probabilities, which crashes the inference runner.
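A minimal, standard-library reproduction of the failure mode described above. The naive_softmax helper is hypothetical and only stands in for whatever normalization the runner performs; it shows that all -inf logits produce NaN probabilities and that a NaN total slips past a total <= 0 guard.

```python
import math

def naive_softmax(logits):
    # Hypothetical stand-in for the model's normalization step.
    m = max(logits)
    # With every logit at -inf, x - m is (-inf) - (-inf), which is NaN.
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A fully masked position: every logit is -inf.
probs = naive_softmax([float("-inf")] * 3)
total = sum(probs)

# Any ordered comparison with NaN is False, so this guard does not fire.
naive_guard_fires = total <= 0

# Checking finiteness first catches NaN and infinities as well as zero mass.
strict_guard_fires = (not math.isfinite(total)) or total <= 0

print(probs, naive_guard_fires, strict_guard_fires)
```

math.isfinite is used here as the scalar counterpart of the np.isfinite check mentioned in this report.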
Suggested Fix
- Fall back gracefully to a one-hot distribution on the first available token (greedy argmax) inside the SparseDistribution constructor if the probability mass is zero or non-finite.
- Add strict not np.isfinite(total) checks to the fallback paths in residual_distribution to prevent NaN values from bypassing target fallbacks.
I have implemented this fix and verified it with unit tests. I will submit a PR shortly.
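For illustration only, the fallback order could look roughly like the sketch below. This is not the code from the upcoming PR: it assumes distributions are plain dicts mapping token id to probability, that target is non-empty, and it uses math.isfinite in place of np.isfinite so the example needs only the standard library. The helper names _clip and normalize_or_none are hypothetical.

```python
import math

def _clip(d):
    # Mirror np.maximum(d, 0.0): negatives become 0.0, NaN propagates.
    return d if (d > 0 or math.isnan(d)) else 0.0

def normalize_or_none(weights):
    # Return a normalized copy, or None if the mass is zero or non-finite.
    total = sum(weights.values())
    if not math.isfinite(total) or total <= 0:
        return None
    return {t: w / total for t, w in weights.items()}

def residual_distribution(target, draft):
    # Sketch only: target and draft are dicts of token id -> probability.
    residual = {t: _clip(p - draft.get(t, 0.0)) for t, p in target.items()}
    # Try the residual first, then fall back to the target itself.
    for candidate in (residual, target):
        dist = normalize_or_none(candidate)
        if dist is not None:
            return dist
    # Last resort: one-hot on the first available token (assumes target is non-empty).
    first = next(iter(target))
    return {first: 1.0}

normal = residual_distribution({1: 0.5, 2: 0.5}, {1: 0.75, 2: 0.25})
zero_mass = residual_distribution({1: 0.5, 2: 0.5}, {1: 0.5, 2: 0.5})
all_nan = residual_distribution({7: float("nan"), 8: float("nan")}, {7: 0.5})
print(normal, zero_mass, all_nan)
```

The three calls exercise the normal path, the zero-mass fallback to the target, and the final one-hot fallback when the target itself is all NaN.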