Commit b2ff855
Michael Dzamba
exp35: InferenceSettings.freeze_params (skip weight-grad backward kernels)
Adds an opt-in flag that calls requires_grad_(False) on every model
parameter during predict-time _lazy_init. For gradient-force inference
(forces computed as the gradient of energy with respect to positions),
autograd then skips the weight-grad path of every Linear / segment_mm
backward, saving CUDA time and peak memory. The win is conditional: it
helps when paired with moe_layer_type=fairchem_cpp, but can regress
under tf32 + pytorch MOLE due to cuBLAS fused-(dx, dW) kernel
selection. Off by default.

1 parent 081a318 · commit b2ff855
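The idea in the commit message can be sketched with plain PyTorch. This is an illustrative toy model, not the fairchem code: freezing the weights lets autograd skip the weight-gradient (dW) half of each Linear backward, while d(energy)/d(positions) is still available for gradient-based forces.

```python
import torch

# Toy energy model standing in for the real network (names/shapes are
# illustrative assumptions, not the fairchem architecture).
model = torch.nn.Sequential(
    torch.nn.Linear(3, 16),
    torch.nn.SiLU(),
    torch.nn.Linear(16, 1),
)

# The freeze_params behavior: mark every parameter as not requiring grad,
# so the backward pass skips the weight-grad kernels entirely.
for p in model.parameters():
    p.requires_grad_(False)

# Positions still require grad: gradient-force inference differentiates
# the energy with respect to atomic positions, not the weights.
positions = torch.randn(8, 3, requires_grad=True)
energy = model(positions).sum()

# dE/dx survives even though all weights are frozen.
forces = -torch.autograd.grad(energy, positions)[0]

assert forces.shape == positions.shape
assert all(p.grad is None for p in model.parameters())
```

Note that the saving is per-layer: each frozen Linear backward still computes the input gradient (dx) needed to keep propagating toward the positions, but drops the dW matmul. The regression the message mentions is plausible when cuBLAS would otherwise fuse dx and dW into one kernel, so dropping dW does not halve the cost.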
2 files changed
Lines changed: 15 additions & 0 deletions
[Diff body not captured in this extraction — first file: 11 lines inserted after original line 74 (new lines 75–85).]
[Diff body not captured in this extraction — second file: 4 lines inserted after original line 463 (new lines 464–467).]