
Question about the mask token within the prompt area. #30

@ziz-797


Hi,

Thank you very much for your excellent and inspiring work!

I have a question regarding the mask token design during training. I noticed that some structural tokens within the prompt region (e.g., , ) are also masked, and their logits are set to -inf during inference.
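For context, the mechanism being asked about can be sketched as follows. This is a minimal, framework-free illustration (not the repository's actual implementation) of how setting selected logits to -inf before the softmax guarantees those token ids receive zero probability and can never be sampled; the function names and the toy vocabulary are made up for the example.

```python
import math

def mask_structural_tokens(logits, banned_token_ids):
    # Copy the logits and set banned (structural-token) positions to -inf.
    # After softmax, exp(-inf) = 0, so these tokens get zero probability
    # and can never be sampled during generation.
    masked = list(logits)
    for i in banned_token_ids:
        masked[i] = float("-inf")
    return masked

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of 5 tokens; ban ids 1 and 3 (the "structural" tokens).
logits = [0.0, 2.0, 1.0, 3.0, 0.5]
probs = softmax(mask_structural_tokens(logits, [1, 3]))
print(probs)  # probabilities at positions 1 and 3 are exactly 0.0
```

In practice the same effect is usually achieved by writing -inf into the logit tensor in place before sampling, which enforces the constraint without retraining anything.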

I was wondering:

What is the main motivation behind masking these structural tokens?

Does this strategy contribute to improved model performance or training stability?

Is this design primarily intended to enforce strict generation constraints, or does it also provide benefits during representation learning?

I would greatly appreciate any clarification on the rationale behind this design choice.

Thank you very much for your time and support.

Best regards,
ziz-797
