
Adding OpenEnv Wordle GRPO sample #1063

Open
allela-roy wants to merge 1 commit into awslabs:main from allela-roy:main

Conversation

@allela-roy
Contributor

@allela-roy allela-roy commented Apr 14, 2026

Purpose

Adding a sample for OpenEnv Wordle GRPO Training on SageMaker HyperPod (EKS)

Changes

New sample

Test Plan

Environment:

  • AWS Service: SageMaker HyperPod (EKS)
  • Instance type: ml.g6e.12xlarge (4x NVIDIA L40S, 48 GB each)
  • Number of nodes: 2

Test commands:

# 1. Set up environment variables
source env_vars

# 2. Build and push container image
aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $REGISTRY
docker build -t ${REGISTRY}${IMAGE}:${TAG} .
docker push ${REGISTRY}${IMAGE}:${TAG}

# 3. Deploy Wordle environment
envsubst < kubernetes/openenv-wordle-env.yaml | kubectl apply -f -
kubectl wait --for=condition=Available deployment/openenv-wordle --timeout=300s

# 4. Verify Wordle env health
kubectl run test-env --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://openenv-wordle:7860/health

# 5. Launch single-GPU colocate training
envsubst < kubernetes/train-grpo-wordle.yaml | kubectl apply -f -
kubectl wait --for=condition=Ready pod/grpo-wordle --timeout=300s

Test Results

Single-node colocate mode verified with 6+ training steps, 0 pod restarts, all 4 reward signals active:

Step 1 (epoch 0.011): loss=1.696e-32, reward=0.9385, reward_correct=0.2688, greens=0.1875, yellows=0.175, repetition=0.3073, step_time=152.9s
Step 2 (epoch 0.022): loss=3.387e-32, reward=0.9281, reward_correct=0.25,   greens=0.1531, yellows=0.2125, repetition=0.3125, step_time=153.0s
Step 3 (epoch 0.032): loss=6.355e-33, reward=0.887,  reward_correct=0.2641, greens=0.1906, yellows=0.1562, repetition=0.276,  step_time=154.4s
Step 4 (epoch 0.043): loss=-2.26e-32, reward=0.9156, reward_correct=0.2594, greens=0.1625, yellows=0.1969, repetition=0.2969, step_time=152.4s
Step 5 (epoch 0.054): loss=1.503e-32, reward=0.9208, reward_correct=0.2688, greens=0.1719, yellows=0.2094, repetition=0.2708, step_time=157.3s
Step 6 (epoch 0.065): loss=-6.63e-32, reward=1.013,  reward_correct=0.3063, greens=0.2062, yellows=0.2063, repetition=0.2943, step_time=151.1s

Pod status throughout training:

NAME          READY   STATUS    RESTARTS   AGE
grpo-wordle   1/1     Running   0          25m

Directory Structure

3.test_cases/
└── <framework>/                # e.g. pytorch, megatron, jax
    └── <library>/              # e.g. picotron, FSDP, megatron-lm
        └── <model>/            # e.g. SmolLM-1.7B (may be omitted for single-model cases)
            ├── Dockerfile      # Container / environment setup
            ├── README.md       # Overview, prerequisites, usage
            ├── slurm/          # Slurm-specific launch scripts
            ├── kubernetes/     # Kubernetes manifests
            └── hyperpod-eks/   # HyperPod EKS instructions
  • Top-level files (Dockerfile, README.md, training scripts, configs) cover general setup.
  • Subdirectories (slurm/, kubernetes/, hyperpod-eks/) contain service-specific launch instructions.
  • Not all service subdirectories are required — include only the ones relevant to your test case.

Checklist

  • [x] I have read the contributing guidelines.
  • [x] I am working against the latest main branch.
  • [x] I have searched existing open and recently merged PRs to confirm this is not a duplicate.
  • [x] The contribution is self-contained with documentation and scripts.
  • [x] External dependencies are pinned to a specific version or tag (no latest).
  • [x] A README is included or updated with prerequisites, instructions, and known issues.
  • [x] New test cases follow the expected directory structure.

Collaborator

@KeitaW KeitaW left a comment


Review 1/5 — Structure & Repository Hygiene

Thanks for this contribution — the OpenEnv Wordle GRPO sample is a great addition to the TRL test cases. I have some findings across several categories; posting them as themed batches for easier navigation.

This batch covers image pinning and Dockerfile robustness.

@@ -0,0 +1,60 @@
FROM public.ecr.aws/hpc-cloud/nccl-tests:latest
Collaborator


Base image uses :latest tag

Per repo conventions, container image tags must use fixed versions, never latest. This makes builds non-reproducible — a new push to nccl-tests:latest could silently break the build. I'd suggest pinning to a specific digest or tag. You can find available tags with:

aws ecr-public describe-image-tags --repository-name hpc-cloud/nccl-tests --region us-east-1

Reference: review-checklist, "Fixed version tags."
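
For illustration, digest pinning in the Dockerfile would look like the following (the digest is a placeholder; substitute the real one reported by the registry):

```dockerfile
# Pin the base image by immutable digest so rebuilds are reproducible
# even if the tag is moved later. <digest> is a placeholder.
FROM public.ecr.aws/hpc-cloud/nccl-tests@sha256:<digest>
```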

Contributor Author


@KeitaW, don't we want to use latest, since it contains the recommended versions of the EFA and NCCL libs?

Contributor


@allela-roy it doesn't, does it? As of today (April 16th 2026), these are the latest versions on that docker image:

CUDA 12.8
EFA Installer 1.43.0
aws-ofi-nccl 1.16.3
libfabric 1.27.0

So either pin a specific tag and update the libraries in your Dockerfile yourself, or rely on latest and rebuild everything, even if that means downgrading a library.

Either way, this is a best practice, not a blocker. Choose your own approach, but make it reproducible so that it does not break in the near future.

- Schedulable
containers:
- name: wordle-env
image: registry.hf.space/burtenshaw-wordle:latest
Collaborator


Environment image uses :latest

registry.hf.space/burtenshaw-wordle:latest — if there's no versioned tag for this HuggingFace Space image, I'd suggest adding a comment noting that this is externally managed and may change without notice.

preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
Collaborator


Init container image uses :latest

I'd suggest pinning to a specific version:

Suggested change
matchExpressions:
image: curlimages/curl:8.12.1

export ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export REGISTRY=${ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com/
export IMAGE=openenv-wordle-grpo
export TAG=latest
Collaborator


Default TAG should not be latest

Combined with the :latest base image, this creates a double-unpinned build. I'd suggest defaulting to a version-based tag:

Suggested change
export TAG=latest
export TAG=v1.0

Comment on lines +37 to +47
src = p.read_text(); \
old = 'from transformers.utils.import_utils import _is_package_available'; \
new = 'from transformers.utils.import_utils import _is_package_available as _orig_is_pkg_avail\n\ndef _is_package_available(pkg_name, return_version=False):\n result = _orig_is_pkg_avail(pkg_name, return_version=return_version)\n if return_version:\n return result if isinstance(result, tuple) else (result, None)\n return result[0] if isinstance(result, tuple) else result'; \
p.write_text(src.replace(old, new))"

# Fix GRPOTrainer compatibility with transformers 5.x (warnings_issued attr)
# and vLLM 0.8.x (logprobs_mode kwarg not supported).
RUN python -c "\
import pathlib; \
p = pathlib.Path('/opt/miniconda3/lib/python3.12/site-packages/trl/trainer/grpo_trainer.py'); \
src = p.read_text(); \
Collaborator

@KeitaW KeitaW Apr 15, 2026


Inline TRL patches are fragile across versions

These python -c blocks patch TRL source files at exact paths using string replacement. If TRL updates these lines (even whitespace changes), the patches will silently fail to match and the str.replace() will be a no-op, leaving the bugs in place.

I'd suggest:

  1. Adding verification after each patch: python -c "assert 'patched_text' in open('/path/to/file').read()"
  2. Adding a comment noting which TRL issue/PR tracks the upstream fix so future maintainers know when the patches can be removed

or fix TRL directly upstream (nice PR opportunity).
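
As a sketch of option 1, the patch step could be wrapped in a small helper that fails the image build loudly when the target text is missing, instead of silently no-op'ing (paths and names here are illustrative, not the PR's actual code):

```python
import pathlib

def patch_file(path, old, new):
    """Replace `old` with `new` in the file at `path`.

    Raises RuntimeError if the expected text is absent, so that a TRL
    release that changed these lines fails the Docker build instead of
    leaving the bug in place.
    """
    p = pathlib.Path(path)
    src = p.read_text()
    if old not in src:
        raise RuntimeError(f"patch target not found in {path}; "
                           "TRL source may have changed")
    p.write_text(src.replace(old, new))
```

Invoked from a `RUN python ...` step, a mismatched TRL version then surfaces at build time rather than at training time.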

Collaborator

@KeitaW KeitaW left a comment


Nice PR, a few nits.

Collaborator

@KeitaW KeitaW left a comment


Review 2/5 — Deployment Pipeline

This batch covers the multi-GPU manifest GPU allocation issue, envsubst usage, missing copyright header, and credential handling.

Comment on lines +93 to +95
- --host
- "0.0.0.0"
- --port
Collaborator


Trainer container requests 1 GPU but claims to use GPUs 1-3

The manifest header (lines 6-8) describes "GPU 0: vLLM server, GPU 1-3: FSDP training" and line 121 uses accelerate launch --num_processes 1. But Kubernetes GPU device plugin allocates GPUs at the container level — requesting 1 GPU here means the trainer gets exactly 1 GPU, not 3.

To achieve the described 1+3 split, I'd suggest:

Suggested change
- --host
- "0.0.0.0"
- --port
requests:
nvidia.com/gpu: 3
memory: "60Gi"

And updating accelerate launch --num_processes 1 on line 121 to --num_processes 3.

Alternatively, if the intent is really 1+1 (total 2 GPUs), the README section "4. Multi-GPU Training" and the manifest header comments should be updated to match.

This is the most significant functional issue I noticed.

Comment on lines +36 to +37
sagemaker.amazonaws.com/job-max-retry-count: "3"
spec:
Collaborator


$HF_TOKEN exposed as plain-text env var

After envsubst, this embeds the actual token in the manifest YAML. Anyone with kubectl get pod -o yaml access can read it. I'd suggest using a Kubernetes Secret instead:

Suggested change
sagemaker.amazonaws.com/job-max-retry-count: "3"
spec:
- name: HF_TOKEN
valueFrom:
secretKeyRef:
name: hf-token
key: token

This applies to all manifests in this PR (train-grpo-wordle.yaml, train-grpo-wordle-multigpu.yaml, inference-wordle.yaml).

@@ -0,0 +1,60 @@
FROM public.ecr.aws/hpc-cloud/nccl-tests:latest
Collaborator


Missing copyright header

The Dockerfile is missing the standard repo copyright header. All other files in this PR include it. I'd suggest adding at the top:

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

(The training script uses Apache 2.0 from the original HuggingFace source, which is fine for copied code — but the Dockerfile is original.)

Collaborator

@KeitaW KeitaW left a comment


Review 3/5 — Infrastructure & NCCL Configuration

Shared memory mounting pattern.

Comment on lines +73 to +75
memory: "180Gi"
env:
- name: PYTORCH_CUDA_ALLOC_CONF
Collaborator


/dev/shm mounted as hostPath instead of emptyDir with Memory medium

Using hostPath: /dev/shm shares the host's shared memory with the pod, which means multiple pods on the same node can interfere with each other. The preferred Kubernetes pattern is an isolated emptyDir:

Suggested change
memory: "180Gi"
env:
- name: PYTORCH_CUDA_ALLOC_CONF
- name: shmem
emptyDir:
medium: Memory
sizeLimit: 16Gi

This gives the pod its own isolated shared memory allocation. The same applies to the other manifests in this PR (train-grpo-wordle-multigpu.yaml, inference-wordle.yaml).
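
For reference, a fuller sketch of the pattern, showing the volume together with its matching volumeMount (container name is illustrative; both sections sit under the pod spec):

```yaml
# Isolated /dev/shm: RAM-backed emptyDir instead of the host's shared memory
volumes:
  - name: shmem
    emptyDir:
      medium: Memory
      sizeLimit: 16Gi
containers:
  - name: trainer          # illustrative container name
    volumeMounts:
      - name: shmem
        mountPath: /dev/shm
```

Note that the `sizeLimit` counts against the container's memory limit, so it should be sized alongside the existing memory requests.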

Collaborator

@KeitaW KeitaW left a comment


Review 4/5 — Documentation Consistency & Training Code Quality

Dangling references, missing newline, and dependency pinning.

# Check checkpoints
kubectl exec fsx-storage-manager -- ls -la /fsx/checkpoints/wordle-grpo/
```

Collaborator


README references fsx-storage-manager pod that isn't created by the PR

kubectl exec fsx-storage-manager -- ls -la /fsx/checkpoints/wordle-grpo/

There's no manifest or instructions for creating an fsx-storage-manager pod. A user following the README would get Error from server (NotFound). I'd suggest replacing with a command that uses the training pod itself:

Suggested change
kubectl exec grpo-wordle -- ls -la /fsx/checkpoints/wordle-grpo/


## YOUR GOAL

Solve the Wordle in as few guesses as possible by strategically using feedback to eliminate impossible words and narrow down the solution space efficiently. No newline at end of file
Collaborator


Missing newline at end of file

Per the repo's .editorconfig (insert_final_newline = true), all files should end with a newline character.

Comment on lines +7 to +11
accelerate>=0.27.0
peft>=0.9.0
deepspeed>=0.14.0
wandb
trl[vllm]
Collaborator


Dependencies use open-ended lower bounds without upper pins

Several dependencies use >= without upper bounds, which can pull incompatible major versions over time. I'd suggest adding upper bounds:

Suggested change
accelerate>=0.27.0
peft>=0.9.0
deepspeed>=0.14.0
wandb
trl[vllm]
transformers>=4.48.0,<5.0
datasets>=2.17.0,<3.0
accelerate>=0.27.0,<2.0
peft>=0.9.0,<1.0
deepspeed>=0.14.0,<1.0

Also note: trl[vllm] is installed here (pulling in a vllm version), then TRL is upgraded to 0.26.2 with --no-deps in the Dockerfile. The vllm version is therefore determined by whatever trl[vllm] pulled initially, not by TRL 0.26.2's constraints. This may be intentional but is worth documenting.
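
One way to catch this kind of resolution drift early is a small version assertion run at the end of the image build (package names and bounds below are illustrative, not the PR's actual constraints):

```python
# Fail the image build if a resolved package version falls outside the
# range the inline patches were written for. Bounds are illustrative.
import importlib.metadata as md

def check(pkg, lo, hi):
    """Assert that (major, minor) of the installed `pkg` is in [lo, hi)."""
    ver = tuple(int(x) for x in md.version(pkg).split(".")[:2])
    assert lo <= ver < hi, f"{pkg} {md.version(pkg)} outside [{lo}, {hi})"
```

Run via `RUN python check_versions.py` (a hypothetical script name) as a final Dockerfile step, this turns a silent vllm/trl mismatch into a build failure.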


# Flash Attention (requires torch, must install separately with --no-build-isolation)
RUN pip install flash-attn>=2.5.0 --no-build-isolation

Collaborator


flash-attn has no upper bound pin

Flash Attention builds from source and is sensitive to CUDA/PyTorch version combinations. An unpinned upper bound could pull a version incompatible with torch 2.6.0. I'd suggest:

Suggested change
RUN pip install flash-attn>=2.5.0,<3.0 --no-build-isolation

Collaborator

@KeitaW KeitaW left a comment


Review 5/5 — Things That Look Great

  • Excellent README structure. The pipeline architecture diagram, prerequisite list, step-by-step instructions with expected outputs, configuration reference table, and reward function documentation are all well-done. This is one of the more polished READMEs I've seen in a test case contribution.
  • OpenEnv is a compelling choice for demonstrating agentic RL. The client-server architecture (environment as a Kubernetes service, training as a separate pod) showcases real Kubernetes-native ML workflows. The "What is OpenEnv?" section does a great job explaining why this matters.
  • Two deployment modes (single-GPU colocate and multi-GPU server) give users flexibility. Colocate is great for quick iteration; server mode demonstrates a production-like split.
  • HyperPod-aware manifests. The tolerations, node selectors, health-check affinity, and auto-resume annotations follow HyperPod best practices consistently across all manifests.
  • The training script is well-structured. Clean separation between rollout logic (rollout_once), reward functions, and the main entrypoint. The argparse interface makes it easy to experiment.
  • Four distinct reward signals (correctness, greens, yellows, repetition) provide rich training signal and make the training dynamics interpretable from logs.
  • Copyright headers present on all shell scripts, YAML manifests, and the env_vars example.
  • The Wordle environment as a Deployment with readiness/liveness probes ensures the environment is healthy before training starts — much better than a bare pod.

Cross-cutting note: envsubst variable whitelist

The README instructs users to run envsubst < kubernetes/*.yaml | kubectl apply -f - without an explicit variable whitelist. Without a whitelist, envsubst substitutes every $VAR in the YAML. I'd suggest using explicit whitelists in the README commands, e.g.:

envsubst '$NAMESPACE $REGISTRY $IMAGE $TAG $HF_TOKEN $MODEL_NAME $VLLM_MODE $NUM_GPU_PER_NODE $NUM_GENERATIONS $GRADIENT_ACCUMULATION_STEPS $LEARNING_RATE $INSTANCE_TYPE $FSX_PVC' < kubernetes/train-grpo-wordle.yaml | kubectl apply -f -

Contributor

@paragao paragao left a comment


Thanks for the contribution — the README quality and overall structure are solid. A few items to address before we can merge:

1. HF_TOKEN plain-text exposure (Security — Blocking)

All Kubernetes manifests (train-grpo-wordle.yaml, train-grpo-wordle-multigpu.yaml, inference-wordle.yaml) pass HF_TOKEN as a plain-text value: field after envsubst processing. Anyone with kubectl get pod -o yaml access can read the token.

Please replace the plain-text value with a Kubernetes Secret reference, for example:

- name: HF_TOKEN
  valueFrom:
    secretKeyRef:
      name: hf-token
      key: token

And add instructions to the README for creating the secret:

kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN

2. Multi-GPU manifest: GPU allocation mismatch (Blocking)

The comments in train-grpo-wordle-multigpu.yaml describe:

GPU 0: Dedicated vLLM server
GPU 1-3: FSDP-sharded GRPO training

But both the vllm-server and trainer containers request only 1 GPU each (nvidia.com/gpu: 1), and the training command uses accelerate launch --num_processes 1.

This means only 2 GPUs are used total, not 4 as described. Could you clarify the intended configuration?

  • If the intent is 1 GPU for vLLM + 3 GPUs for FSDP: the trainer container should request nvidia.com/gpu: 3 and use --num_processes 3.
  • If the intent is 1 GPU for vLLM + 1 GPU for training: the comments and documentation should be updated to match.

3. Add upper bounds to dependency versions in requirements.txt

The current dependencies use open-ended >= bounds (e.g., transformers>=4.48.0). Given that the Dockerfile already patches TRL for transformers 5.x compatibility issues, unbounded dependencies are a real reproducibility risk.

Please add upper bounds, for example:

transformers>=4.48.0,<6.0
datasets>=2.17.0,<4.0
accelerate>=0.27.0,<2.0
peft>=0.9.0,<1.0
deepspeed>=0.14.0,<1.0

4. Missing copyright header in Dockerfile

The Dockerfile is the only file in this PR missing the standard copyright header. Please add the following two lines at the very top of the file (before the FROM line):

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

All other files in the PR already have this header.

5. Double-check env_vars.example for sensitive data

Please review the env_vars.example file and confirm it does not contain any real sensitive data such as AWS account IDs, FSx filesystem IDs, credentials, or tokens. The current placeholders look correct (e.g., <your-huggingface-token>, <your-hyperpod-cluster-name>), but please verify before merging.

6. fsx-storage-manager pod reference in README (Non-blocking)

The README references a pod called fsx-storage-manager in two places:

Checking checkpoints:

kubectl exec fsx-storage-manager -- ls -la /fsx/checkpoints/wordle-grpo/

Cleanup section:

kubectl delete pod grpo-wordle grpo-wordle-multigpu inference-wordle fsx-storage-manager 2>/dev/null

However, no manifest in this PR creates a fsx-storage-manager pod. Users following the README will get a "pod not found" error on these commands.

Could you clarify the intent here?

  • If fsx-storage-manager is a pod users are expected to have in their cluster already, please add a note explaining that.
  • If it was included by mistake, please remove it from the README commands and use the grpo-wordle pod instead (or another existing pod that has the FSx volume mounted).
  • If there should be a manifest for it, please add one.
