Hi,
Thank you very much for your excellent and inspiring work!
I have a question regarding the mask token design during training. I noticed that some structural tokens within the prompt region (e.g., , ) are also masked, and their logits are set to -inf during inference.
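For context on the mechanism I am asking about, here is a minimal sketch (my own illustration, not your code) of what I understand the inference-time constraint to be: logits of the banned structural tokens are set to -inf before the softmax, which guarantees they receive zero probability and can never be sampled. The function name and token ids below are hypothetical.

```python
import math

def masked_softmax(logits, banned_ids):
    """Softmax over logits with banned token positions forced to -inf,
    so those tokens get exactly zero probability."""
    masked = [(-math.inf if i in banned_ids else x) for i, x in enumerate(logits)]
    m = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in masked]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical example: positions 1 and 3 stand in for structural tokens.
probs = masked_softmax([1.2, 0.3, -0.5, 2.0], banned_ids={1, 3})
```

My question is essentially why this hard constraint is also reflected in the training-time masking, rather than being applied only at decoding time.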
I was wondering:
What is the main motivation behind masking these structural tokens?
Does this strategy contribute to improved model performance or training stability?
Is this design primarily intended to enforce strict generation constraints, or does it also provide benefits during representation learning?
I would greatly appreciate any clarification on the rationale behind this design choice.
Thank you very much for your time and support.
Best regards,
ziz-797