Could you share the training loss to improve reproducibility? #83

Hi, thanks for sharing the datasets! I'm trying to train a Flan model using T5 and other backbone models. However, I'm not confident that I have reproduced your results well; specifically, I get much lower MMLU scores. Could you please share the training loss curve (or simply the loss at convergence)? Mine is below:
<img width="459" alt="image" src="https://github.com/google-research/FLAN/assets/8935605/9c5d4e16-35a6-4b8e-9010-1df3144d2e60">

I used similar settings (batch size = 80, max_seq_len = 2300).
My final training loss is around 0.6 after smoothing. What are the official values?
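
Since a smoothed loss value depends on how the smoothing is done, here is a minimal sketch of one common choice, a bias-corrected exponential moving average. The function name `ema_smooth` and the default `weight` of 0.9 are assumptions for illustration only; the curve in the plot above may have been smoothed differently.

```python
def ema_smooth(losses, weight=0.9):
    """Smooth a sequence of per-step losses with an exponential moving average.

    `weight` (assumed value, not from the FLAN codebase) controls how much
    of the previous running average is kept at each step. The division by
    (1 - weight ** step) corrects the bias toward zero in the early steps.
    """
    smoothed = []
    running = 0.0
    for step, loss in enumerate(losses, start=1):
        running = weight * running + (1.0 - weight) * loss
        smoothed.append(running / (1.0 - weight ** step))
    return smoothed


if __name__ == "__main__":
    # Hypothetical raw losses, only to show the call; not real training data.
    raw = [2.0, 1.5, 1.1, 0.9, 0.8, 0.7, 0.65, 0.6]
    print(ema_smooth(raw)[-1])
```

Reporting the raw final loss together with the smoothing weight makes two runs easier to compare, because a heavier weight lags further behind a loss that is still decreasing.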
