Q&A: Phase 33.1 ElasticWeightConsolidator — Design Questions #697
Unanswered · asked by web3guru888 in Q&A · 0 replies
This thread is for technical questions about the ElasticWeightConsolidator component.
Open design questions
Diagonal vs. K-FAC Fisher: Diagonal Fisher is O(p) memory but loses parameter correlations. K-FAC captures second-order structure at O(p·k) cost. When is the accuracy improvement worth the overhead?
Online vs. offline Fisher: Online EWC (exponential moving average) is memory-efficient but may over-weight recent tasks. What decay rate γ best balances recency and retention?
Lambda scheduling: Should λ_ewc be constant across tasks, or should it increase as more tasks are consolidated? What schedule prevents under-regularization early and over-regularization late?
Interaction with replay: When combined with experience replay (Phase 33.3), does EWC provide additive benefit, or is there redundancy? Should λ be reduced when replay is active?
Scalability to Transformers: Fisher computation requires per-sample gradients. For models >1B parameters, what approximations (e.g., layer-wise Fisher, Fisher subspace) maintain effectiveness?
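For reference when discussing the first question, here is a minimal sketch of the diagonal Fisher estimate and the EWC quadratic penalty it feeds into. This is illustrative, not the Phase 33.1 implementation: the function names and the convention of passing per-sample gradients as a `(n_samples, n_params)` array are assumptions for this example.

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    """Diagonal Fisher estimate: the mean of squared per-sample gradients.

    per_sample_grads: array of shape (n_samples, n_params).
    Returns an O(p) vector; cross-parameter correlations (which K-FAC
    would capture) are discarded.
    """
    return np.mean(per_sample_grads ** 2, axis=0)

def ewc_penalty(theta, theta_star, fisher, lam):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    where theta_star is the parameter vector after the previous task."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
```

The penalty is added to the new task's loss, so parameters with high Fisher values (important for earlier tasks) are anchored more strongly.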
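On the online-vs-offline question, one common formulation of online EWC maintains a single running Fisher via an exponential moving average rather than storing one Fisher per task. A one-line sketch (the function name is hypothetical):

```python
def online_fisher_update(fisher_running, fisher_task, gamma):
    """EMA over per-task Fisher estimates (one formulation of online EWC).

    gamma near 1.0 favors retention of earlier tasks; gamma near 0.0
    over-weights the most recent task. Works elementwise on arrays
    or on plain floats.
    """
    return gamma * fisher_running + (1.0 - gamma) * fisher_task
```

This makes the memory cost independent of the number of tasks, at the price of the recency bias the question asks about: the effective weight of a task seen k tasks ago decays as gamma**k.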
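To make the lambda-scheduling question concrete, here are three candidate schedules one might compare empirically. These are hypothetical options for discussion, not a proposal for the component's default:

```python
def lambda_schedule(task_index, lam0=1.0, growth="sqrt"):
    """Candidate schedules for lambda_ewc after `task_index` tasks.

    "constant": same lambda throughout (risks under-regularizing late).
    "sqrt":     sublinear growth, a middle ground.
    "linear":   grows with task count (risks over-regularizing late).
    """
    if growth == "constant":
        return lam0
    if growth == "sqrt":
        return lam0 * task_index ** 0.5
    if growth == "linear":
        return lam0 * task_index
    raise ValueError(f"unknown schedule: {growth}")
```

A sweep over these three shapes on a small task sequence would give a cheap first answer to whether lambda should grow at all.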