Forward reward always 0 #707

Open

Opened on Oct 24, 2025

I ran grpo_classification.py with my own dataset, but the forward reward is always 0. How can I fix this?

I am using Qwen3-VL-8B-Instruct as the base model.
