
fix: prevent NaN propagation in autoregressive decode loop #381

Open

JasonOA888 wants to merge 1 commit into google-research:master from JasonOA888:fix/issue-321-nan-forecast

Conversation

@JasonOA888

Fixes #321

When normalize_inputs=True (default), the compiled decode path applies global revin normalization before decode(), which then applies its own patch-level revin. This double normalization can produce values with very small variance, causing sigma to approach zero in subsequent AR steps. Division by near-zero sigma amplifies numerical noise into NaN/Inf, which cascades through all remaining decode steps.
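The failure mode can be reproduced in miniature. The sketch below is a hypothetical stand-in for the decode loop (NumPy rather than the actual PyTorch model; `ar_decode` and its trivial "forward pass" are invented for illustration), showing how a near-constant, double-normalized context drives sigma to zero and lets one NaN poison every later AR step:

```python
import numpy as np

def ar_decode(history, steps, guard=False):
    """Hypothetical AR decode loop with revin-style renormalization."""
    context = list(history)
    preds = []
    for _ in range(steps):
        x = np.asarray(context, dtype=np.float64)
        mu, sigma = x.mean(), x.std()
        normed = (x - mu) / sigma  # sigma -> 0 yields 0/0 = NaN
        if guard:
            # The fix: neutralize NaN/Inf before they re-enter the context.
            normed = np.nan_to_num(normed, nan=0.0, posinf=0.0, neginf=0.0)
        pred = float(normed.mean() * sigma + mu)  # stand-in forward pass
        preds.append(pred)
        context = context[1:] + [pred]  # prediction feeds the next step
    return preds

flat = [1.0] * 8                             # double-normalized inputs can be ~constant
bad = ar_decode(flat, steps=4)               # NaN at every step
good = ar_decode(flat, steps=4, guard=True)  # finite at every step
```

Without the guard, the first `0/0` produces NaN, the NaN prediction re-enters the context, and every subsequent mean, sigma, and prediction is NaN as well.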

Changes:

  • timesfm_2p5_torch.py: Add nan_to_num guards after both the prefill and AR renormalization steps to prevent NaN/Inf from cascading through the decode loop
  • torch/util.py: Add a descriptive comment to the revin sigma guard
  • Both guards replace NaN/Inf with 0.0 (the neutral element for the additive residual path in the transformer)
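Both patterns are small. A hedged sketch of what such guards typically look like (the names, shapes, and epsilon value below are illustrative, not the actual TimesFM code):

```python
import numpy as np

EPS = 1e-6  # illustrative floor; the real guard's epsilon may differ

def revin_normalize(x):
    # Sigma guard: floor the scale so the division never explodes, even
    # when double normalization leaves a patch with near-zero variance.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = np.maximum(x.std(axis=-1, keepdims=True), EPS)
    return (x - mu) / sigma, mu, sigma

def sanitize(x):
    # Post-renormalization guard: 0.0 is the neutral element for the
    # additive residual path (x + f(x)), so a poisoned value contributes
    # nothing instead of cascading through the remaining decode steps.
    return np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0)

patch = np.full((1, 8), 3.0)  # constant patch: raw std == 0
normed, mu, sigma = revin_normalize(patch)
cleaned = sanitize(np.array([1.0, np.nan, np.inf, -np.inf]))
```

The two guards are complementary: the sigma floor prevents most NaN/Inf from being created, and `nan_to_num` stops any that slip through from propagating.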