fix: prevent NaN propagation in autoregressive decode loop #381
Open
JasonOA888 wants to merge 1 commit into google-research:master
When normalize_inputs=True (default), the compiled decode path applies global revin normalization before the decode function, which then applies its own patch-level revin. This double normalization can produce values with very small variance, causing sigma to approach zero in subsequent AR steps. Division by near-zero sigma amplifies numerical noise into NaN/Inf, which cascades through all remaining decode steps.

Fixes google-research#321

Changes:
- torch/util.py: Add descriptive comment to revin sigma guard
- timesfm_2p5_torch.py: Add nan_to_num guards after both prefill and AR renormalization to prevent NaN/Inf from cascading through the decode loop
- Both guards replace NaN/Inf with 0.0 (the neutral element for the additive residual path in the transformer)
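To illustrate why a single non-finite value poisons a whole patch and what the guard changes, here is a minimal, dependency-free sketch. It is not the code from this PR: `revin` and `nan_to_zero` are hypothetical stand-ins written with plain Python lists, the `eps` threshold and the fall-back to sigma = 1.0 are assumptions about how the sigma guard behaves, and `nan_to_zero` only mimics what `torch.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0)` would do on a tensor.

```python
import math


def revin(values, eps=1e-6):
    """Hypothetical stand-in for patch-level revin normalization.

    Normalizes to zero mean and unit standard deviation. When sigma is
    below eps (assumed threshold), it falls back to sigma = 1.0 so the
    division cannot blow up on near-constant patches.
    """
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    if sigma < eps:  # note: this comparison is False when sigma is NaN
        sigma = 1.0
    return [(v - mu) / sigma for v in values], mu, sigma


def nan_to_zero(values):
    """List-based analogue of a nan_to_num guard: NaN/Inf become 0.0."""
    return [v if math.isfinite(v) else 0.0 for v in values]


# One NaN in the patch makes mu and sigma NaN, so every output is NaN.
# The sigma guard does not help, because NaN < eps evaluates to False.
poisoned, _, _ = revin([1.0, float("nan"), 3.0])

# Replacing non-finite values first keeps the statistics finite.
guarded, mu, sigma = revin(nan_to_zero([1.0, float("nan"), float("inf")]))

# A constant patch has sigma == 0 and takes the sigma = 1.0 fall-back.
constant, _, _ = revin([2.0, 2.0, 2.0])
```

In this sketch the guard is applied before renormalization; the PR description places its guards after the prefill and AR renormalization steps, which serves the same purpose of stopping non-finite values from reaching the next decode step.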