Creates Judge Example as a wrapper on Policy by Jack-Khuu · Pull Request #202 · meta-pytorch/torchforge

Jack-Khuu · 2025-09-19T23:27:01Z

Update (10/10): This is a much-simplified version of the previous iterations of this PR. It provides just a basic example of LLM Judges for GRPO.

Judges can be used as either "Verifiers" or "Graders". This PR adds a CorrectnessJudge example to the sandbox, showing how an LLM Judge can be used in GRPO (note that this PR does not integrate the judge into GRPO).

It takes as input a (prompt, response) pair generated by a model and returns whether the judge model thinks the response accurately answers the prompt. The results can then be used to make decisions during GRPO whitening.

python -m tests.sandbox.vllm.judge --config tests/sandbox/vllm/qwen3_4b.yaml
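The flow described above (format a prompt/response pair for the judge, read back a verdict, feed the result into whitening) can be sketched roughly as follows. This is only an illustrative sketch, not the code in this PR: `build_judge_prompt`, `parse_verdict`, `whiten`, `judge_group`, and `toy_judge` are hypothetical names, the judge is a stand-in function rather than a real model, and "whitening" is assumed here to mean normalizing rewards within a group to zero mean and unit variance.

```python
import math
from typing import Callable, List


def build_judge_prompt(prompt: str, response: str) -> str:
    """Format a (prompt, response) pair into one query for the judge (hypothetical template)."""
    return (
        "You are grading an answer.\n"
        f"Question: {prompt}\n"
        f"Answer: {response}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )


def parse_verdict(judge_output: str) -> float:
    """Map the judge's text to a reward; anything unrecognized counts as 0.0."""
    text = judge_output.strip().upper()
    # Check INCORRECT first, because the string "INCORRECT" contains "CORRECT".
    if "INCORRECT" in text:
        return 0.0
    if "CORRECT" in text:
        return 1.0
    return 0.0


def whiten(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group to zero mean and (roughly) unit variance."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def judge_group(
    prompt: str, responses: List[str], judge_fn: Callable[[str], str]
) -> List[float]:
    """Score every response to one prompt with the judge, then whiten the scores."""
    rewards = [
        parse_verdict(judge_fn(build_judge_prompt(prompt, r))) for r in responses
    ]
    return whiten(rewards)


def toy_judge(judge_prompt: str) -> str:
    """Stand-in for an LLM judge: only accepts 'Tokyo' as the answer."""
    return "CORRECT" if "Answer: Tokyo" in judge_prompt else "INCORRECT"


advantages = judge_group(
    "What is the capital of Japan?", ["Aardvark", "Durian", "Tokyo"], toy_judge
)
```

With the toy judge, the only response marked correct ends up with a positive whitened score and the other two with equal negative scores. In a real setup `judge_fn` would call the judge model instead.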

allenwang28

Prompt: What is the capital of Japan?
Responses: ['Aardvark', 'Durian', 'Tokyo']

Generation Results:
================================================================================
Sample 1
Evaluation: 3
--------------------------------------------------------------------------------
Sample 2
Evaluation: 3
--------------------------------------------------------------------------------
Sample 3
Evaluation: 3
--------------------------------------------------------------------------------
Sample 4
Evaluation: 3
--------------------------------------------------------------------------------

lol is this working correctly?

Jack-Khuu · 2025-09-23T18:28:23Z

> lol is this working correctly?

I wrote this prompt from the deep archives of my mind and I'm also shocked that the prompting worked.

Jack-Khuu · 2025-10-11T00:09:58Z

Update (10/10): This is a much-simplified version of the previous iterations of this PR. It provides just a basic example of LLM Judges for GRPO.

Leveraging structured decoding is out-of-scope future work; it requires additional investigation into how to configure it with CoT models (which need to think before answering).
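Until structured decoding is sorted out, one possible stopgap is to post-process the judge's free-form output: drop the reasoning and read the verdict from what remains. The sketch below is an assumption-laden illustration, not part of this PR. It assumes the CoT model wraps its reasoning in `<think>...</think>` tags and ends with the word CORRECT or INCORRECT; `extract_verdict` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed format: reasoning inside <think>...</think>, verdict word afterwards.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
VERDICT = re.compile(r"\b(CORRECT|INCORRECT)\b")


def extract_verdict(raw_output: str) -> Optional[bool]:
    """Return True/False for a parsed verdict, or None if no verdict is found."""
    visible = THINK_BLOCK.sub("", raw_output)
    # An unclosed <think> tag means generation was cut off mid-reasoning;
    # verdict-like words inside the reasoning should not be trusted.
    if "<think>" in visible:
        return None
    matches = VERDICT.findall(visible.upper())
    if not matches:
        return None
    # Use the last verdict word in case the model restates the options first.
    return matches[-1] == "CORRECT"
```

Returning `None` for unparseable output lets the caller decide whether to retry, skip the sample, or treat it as incorrect.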

Push basic GenerativeJudge example

249a6ab

Jack-Khuu requested review from Ritesh1905, allenwang28, felipemello1 and joecummings September 19, 2025 23:27

meta-cla bot added the CLA Signed label Sep 19, 2025

Ritesh1905 reviewed Sep 20, 2025

Comment thread apps/vllm/judge.py Outdated

Ritesh1905 reviewed Sep 20, 2025

Comment thread src/forge/actors/generative_judge.py Outdated

Ritesh1905 reviewed Sep 20, 2025

Comment thread apps/vllm/judge.py Outdated

allenwang28 reviewed Sep 20, 2025

Comment thread apps/vllm/judge.py Outdated

Comment thread apps/vllm/judge.py Outdated

Jack-Khuu added 2 commits September 23, 2025 13:25

Merge remote-tracking branch 'origin/main' into judge2

be26c39

[Debug] Individual LLM/Reward Actors

e88f58f

Jack-Khuu changed the title ~~Creates GenerativeJudge as an interface for LLM Judges~~ [WIP] Creates GenerativeJudge as an interface for LLM Judges Sep 24, 2025

Jack-Khuu commented Sep 24, 2025

Comment thread src/forge/actors/generative_judge.py Outdated

Jack-Khuu added 6 commits September 24, 2025 09:55

Merge remote-tracking branch 'origin/main' into judge2

634fe59

remove unused

336c997

debug

6a01bd7

Merge remote-tracking branch 'origin/main' into judge2

8c87d42

Refactor to subclass policy

f80ff68

Light cleanup-still testing

53607fd

Jack-Khuu changed the title ~~[WIP] Creates GenerativeJudge as an interface for LLM Judges~~ [WIP] Creates Judges as a wrapper on Policy Oct 4, 2025

Need to test math

00bbffa

allenwang28 reviewed Oct 6, 2025

Comment thread src/forge/actors/judge.py Outdated

Comment thread src/forge/actors/judge.py Outdated

joecummings reviewed Oct 6, 2025

Comment thread src/forge/actors/judge.py Outdated

Jack-Khuu requested review from ebsmothers and pbontrager October 6, 2025 21:07

Jack-Khuu added 3 commits October 8, 2025 10:04

Merge remote-tracking branch 'origin/main' into judge2

c5e7b07

Psh to switch machines

f3ae7da

Merge remote-tracking branch 'origin/main' into judge2

fa18b3e

Jack-Khuu added 3 commits October 10, 2025 16:50

Clean up and simplify Judge

b67a95d

Merge remote-tracking branch 'origin/main' into judge2

b5a9d70

Rebase typo

b695b3b

Jack-Khuu changed the title ~~[WIP] Creates Judges as a wrapper on Policy~~ Creates Judge Example as a wrapper on Policy Oct 11, 2025

Jack-Khuu requested a review from joecummings October 11, 2025 00:10

Merge branch 'main' into judge2

269b3f9

Jack-Khuu wants to merge 17 commits into main from judge2

4 participants
