Training: 78%|███████▊ | 1556/2000 [6:15:28<1:55:17, 15.58s/step]
2025-08-23 06:19:20,889 - trainers.dit_trainer - WARNING - Step 1557: detected nan/inf/spiky gradient (norm=2.1966569423675537); skipping optimizer step.
Training: 78%|███████▊ | 1557/2000 [6:15:43<1:55:00, 15.58s/step]
2025-08-23 06:19:36,454 - trainers.dit_trainer - WARNING - Step 1558: detected nan/inf/spiky gradient (norm=2.164307117462158); skipping optimizer step.
Training: 78%|███████▊ | 1558/2000 [6:15:59<1:54:43, 15.57s/step]
2025-08-23 06:19:52,016 - trainers.dit_trainer - WARNING - Step 1559: detected nan/inf/spiky gradient (norm=2.1794376373291016); skipping optimizer step.
Training: 78%|███████▊ | 1559/2000 [6:16:15<1:54:26, 15.57s/step]
2025-08-23 06:20:07,596 - trainers.dit_trainer - WARNING - Step 1560: detected nan/inf/spiky gradient (norm=2.179413080215454); skipping optimizer step.
2025-08-23 06:20:07,600 - trainers.dit_trainer - INFO - Step 1560 | Loss: 0.5275 | LR: 3.00e-04 | GradNorm: 2.29 | TPS: 16.88K | Data: 0.007s | Frozen: 0.097s | TrainableFwd: 0.701s | TrainableBwd: 1.170s
This is the relevant excerpt from the logs.
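To see how many steps were skipped and what norms triggered the skips, the WARNING lines can be pulled out of the log with a small script. This is only a sketch: the regex is inferred from the lines in this excerpt, and `parse_skipped_steps` is a hypothetical helper name, not part of `trainers.dit_trainer`.

```python
import re

# Pattern inferred from the WARNING lines in this excerpt (an assumption about
# the full log format). Accepts a decimal norm as well as "nan" or "inf".
SKIP_RE = re.compile(
    r"Step (\d+): detected nan/inf/spiky gradient \(norm=([0-9.eE+-]+|nan|inf)\)"
)


def parse_skipped_steps(log_text):
    """Return a list of (step, norm) pairs, one per skipped optimizer step."""
    return [(int(step), float(norm)) for step, norm in SKIP_RE.findall(log_text)]


# Two of the warnings from the excerpt, with the progress bar prefix left out.
sample = (
    "2025-08-23 06:19:20,889 - trainers.dit_trainer - WARNING - Step 1557: "
    "detected nan/inf/spiky gradient (norm=2.1966569423675537); "
    "skipping optimizer step.\n"
    "2025-08-23 06:19:36,454 - trainers.dit_trainer - WARNING - Step 1558: "
    "detected nan/inf/spiky gradient (norm=2.164307117462158); "
    "skipping optimizer step.\n"
)

for step, norm in parse_skipped_steps(sample):
    print(step, norm)
```

Because `re.findall` scans the whole string, this also works when the progress bar and the log message share a line. Every norm in the excerpt is finite, so plotting the parsed norms against the step number may help show whether the skips come in one run or are scattered through training.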