I noticed that this line is commented out before the LoraConfig is applied to the transformer:
transformer.requires_grad_(False)
Doesn't leaving the base weights unfrozen go against the original intention of the detail expert, namely fine-tuning the semantic expert?
The total number of trainable parameters is about 1.2 billion.
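For reference, here is a minimal sketch of the setup as I understand it, assuming the Hugging Face peft API; the dummy model, rank, and target module names are illustrative stand-ins for the real transformer. With the freeze commented out, counting parameters shows the full base model training alongside the adapters:

```python
import torch
from peft import LoraConfig, get_peft_model

# Dummy stand-in; in the real script this is the diffusion transformer.
transformer = torch.nn.Sequential(torch.nn.Linear(64, 64))

# With this freeze commented out (as in the repo), every base weight keeps
# requires_grad=True, so the whole transformer trains alongside the adapters.
# transformer.requires_grad_(False)

# Rank and target module names are illustrative, not the repo's values.
lora_config = LoraConfig(r=16, lora_alpha=16, target_modules=["0"])
transformer = get_peft_model(transformer, lora_config)

# Verify what is actually being updated.
trainable = sum(p.numel() for p in transformer.parameters() if p.requires_grad)
total = sum(p.numel() for p in transformer.parameters())
print(f"trainable: {trainable} / {total}")
```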
By the way, I'm trying to train the detail expert but ran into an OOM error; I traced the problem to the computation of gan_g_loss. I was wondering if you could give me some suggestions. Looking forward to your reply.
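In case it helps, here is a sketch of the kind of mitigation I have been considering for the generator step; the discriminator, its input size, and the non-saturating loss form are my assumptions, not necessarily how gan_g_loss is defined in the repo:

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

# Dummy stand-ins so the sketch runs; in the real script these would be the
# detail expert's output and the GAN discriminator.
discriminator = torch.nn.Sequential(torch.nn.Linear(64, 1))
generator_out = torch.randn(4, 64, requires_grad=True)

# 1) Freeze the discriminator for the generator step: no gradients are
#    accumulated for its weights, only for the generator's output.
discriminator.requires_grad_(False)

# 2) Checkpoint the discriminator forward so its activations are recomputed
#    during backward instead of being held in memory.
fake_logits = checkpoint(discriminator, generator_out, use_reentrant=False)

# Non-saturating generator loss (one common form of a GAN generator loss).
gan_g_loss = F.softplus(-fake_logits).mean()
gan_g_loss.backward()
```

Would freezing the discriminator plus checkpointing its forward pass like this be a reasonable way to bring the memory down, or is there a recommended fix?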