Q&A: Phase 33.1 ElasticWeightConsolidator — Design Questions #697
Unanswered · asked by web3guru888 in Q&A · 0 replies
This thread is for technical questions about the ElasticWeightConsolidator component.
Open design questions
Diagonal vs. K-FAC Fisher: Diagonal Fisher is O(p) memory but loses parameter correlations. K-FAC captures second-order structure at O(p·k) cost. When is the accuracy improvement worth the overhead?
Online vs. offline Fisher: Online EWC (exponential moving average) is memory-efficient but may over-weight recent tasks. What decay rate γ best balances recency and retention?
Lambda scheduling: Should λ_ewc be constant across tasks, or should it increase as more tasks are consolidated? What schedule prevents under-regularization early and over-regularization late?
Interaction with replay: When combined with experience replay (Phase 33.3), does EWC provide additive benefit, or is there redundancy? Should λ be reduced when replay is active?
Scalability to Transformers: Fisher computation requires per-sample gradients. For models >1B parameters, what approximations (e.g., layer-wise Fisher, Fisher subspace) maintain effectiveness?
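For reference when discussing the first question, here is a minimal sketch of the diagonal Fisher estimate and the EWC quadratic penalty it feeds into. This is illustrative, not the Phase 33.1 implementation: the function names and the convention of passing per-sample gradients as a `(n_samples, n_params)` array are assumptions for this example.

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    """Diagonal Fisher estimate: the mean of squared per-sample gradients.

    per_sample_grads: array of shape (n_samples, n_params).
    Returns an O(p) vector; cross-parameter correlations (which K-FAC
    would capture) are discarded.
    """
    return np.mean(per_sample_grads ** 2, axis=0)

def ewc_penalty(theta, theta_star, fisher, lam):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    where theta_star is the parameter vector after the previous task."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
```

The penalty is added to the new task's loss, so parameters with high Fisher values (important for earlier tasks) are anchored more strongly.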
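On the online-vs-offline question, one common formulation of online EWC maintains a single running Fisher via an exponential moving average rather than storing one Fisher per task. A one-line sketch (the function name is hypothetical):

```python
def online_fisher_update(fisher_running, fisher_task, gamma):
    """EMA over per-task Fisher estimates (one formulation of online EWC).

    gamma near 1.0 favors retention of earlier tasks; gamma near 0.0
    over-weights the most recent task. Works elementwise on arrays
    or on plain floats.
    """
    return gamma * fisher_running + (1.0 - gamma) * fisher_task
```

This makes the memory cost independent of the number of tasks, at the price of the recency bias the question asks about: the effective weight of a task seen k tasks ago decays as gamma**k.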
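To make the lambda-scheduling question concrete, here are three candidate schedules one might compare empirically. These are hypothetical options for discussion, not a proposal for the component's default:

```python
def lambda_schedule(task_index, lam0=1.0, growth="sqrt"):
    """Candidate schedules for lambda_ewc after `task_index` tasks.

    "constant": same lambda throughout (risks under-regularizing late).
    "sqrt":     sublinear growth, a middle ground.
    "linear":   grows with task count (risks over-regularizing late).
    """
    if growth == "constant":
        return lam0
    if growth == "sqrt":
        return lam0 * task_index ** 0.5
    if growth == "linear":
        return lam0 * task_index
    raise ValueError(f"unknown schedule: {growth}")
```

A sweep over these three shapes on a small task sequence would give a cheap first answer to whether lambda should grow at all.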