🍎 Fruits Catcher GRPO Training - Command Line Arguments

🚀 Quick Start

Default Training

python main.py

Custom Training Examples

🎯 Quick Test Training (1 epoch)

python main.py --total-epochs 1 --batch-size 4

⚡ Fast Training with Compilation

python main.py --compile --total-epochs 1000 --lr-rate 2e-4

🎮 Custom Game Configuration

python main.py \
  --screen-width 25 \
  --screen-height 20 \
  --max-fruits 5 \
  --win-score 50 \
  --fail-score -50

🧠 Large Model Training

python main.py \
  --hidden-size 4096 \
  --batch-size 64 \
  --total-epochs 3000 \
  --lr-rate 5e-5 \
  --max-steps 150 \
  --patience 800

🛑 Early Stopping Control

# Quick testing: stop if there is no improvement for 50 epochs
python main.py --total-epochs 1000 --patience 50

# Conservative training with longer patience
python main.py --total-epochs 5000 --patience 500

# Aggressive early stopping for quick experiments
python main.py --total-epochs 2000 --patience 100

CPU Training

python main.py --device cpu --batch-size 8 --total-epochs 500

📂 Custom Model Name

python main.py --model-name my_custom_model --total-epochs 1500

📋 All Available Arguments

🎮 Game Configuration

--screen-width - Game screen width (default: 20)
--screen-height - Game screen height (default: 15)
--sprite-width - AI sprite width (default: 3)
--sprite-height - AI sprite height (default: 1)
--max-fruits - Maximum fruits on screen (default: 3)
--min-fruits - Minimum fruits on screen (default: 1)
--min-interval-steps - Minimum steps between fruit spawns (default: 4)
--view-height-multiplier - View height scaling factor (default: 50.0)
--view-width-multiplier - View width scaling factor (default: 50.0)
--refresh-timer - Game refresh timer in ms (default: 150)
--fail-score - Score threshold for game failure (default: -30)
--win-score - Score threshold for game victory (default: 30)
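The game flags above map naturally onto a small configuration object. This is a minimal sketch that assumes a dataclass named GameConfig. The name is hypothetical, and main.py may store these values differently. The validate() checks are assumed sanity rules implied by the flag meanings; the training script is not known to enforce them.

```python
from dataclasses import dataclass


@dataclass
class GameConfig:
    """Hypothetical container mirroring the game flags and their documented defaults."""

    screen_width: int = 20
    screen_height: int = 15
    sprite_width: int = 3
    sprite_height: int = 1
    max_fruits: int = 3
    min_fruits: int = 1
    min_interval_steps: int = 4
    view_height_multiplier: float = 50.0
    view_width_multiplier: float = 50.0
    refresh_timer: int = 150  # milliseconds
    fail_score: int = -30
    win_score: int = 30

    def validate(self) -> None:
        """Assumed sanity checks; main.py may enforce different (or no) rules."""
        if self.sprite_width > self.screen_width:
            raise ValueError("sprite_width must not exceed screen_width")
        if self.sprite_height > self.screen_height:
            raise ValueError("sprite_height must not exceed screen_height")
        if not 0 <= self.min_fruits <= self.max_fruits:
            raise ValueError("need 0 <= min_fruits <= max_fruits")
        if not self.fail_score < self.win_score:
            raise ValueError("need fail_score < win_score")
```

For example, GameConfig(screen_width=25, screen_height=20, max_fruits=5, win_score=50, fail_score=-50).validate() corresponds to the Custom Game Configuration command and passes these checks.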

🧠 Training Configuration

--hidden-size - Neural network hidden layer size (default: 2048)
--batch-size - Training batch size (default: 32)
--total-epochs - Total training epochs (default: 2000)
--max-steps - Maximum steps per episode (default: 100)
--lr-rate - Learning rate (default: 1e-4)
--patience - Early stopping patience in epochs (default: 500)
--compile - Enable torch.compile for faster training
--no-compile - Disable torch.compile (default)
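argparse.BooleanOptionalAction (Python 3.9+) provides a paired --compile/--no-compile switch with compilation off by default. The parser below is a hypothetical sketch of how the training flags and their documented defaults could be declared. build_training_parser is an illustrative name, and the real main.py may be written differently.

```python
import argparse


def build_training_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of the training flags; the real main.py may differ."""
    parser = argparse.ArgumentParser(description="Fruits Catcher GRPO training (sketch)")
    # argparse turns "--hidden-size" into args.hidden_size, "--lr-rate" into args.lr_rate, etc.
    parser.add_argument("--hidden-size", type=int, default=2048)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--total-epochs", type=int, default=2000)
    parser.add_argument("--max-steps", type=int, default=100)
    parser.add_argument("--lr-rate", type=float, default=1e-4)
    parser.add_argument("--patience", type=int, default=500)
    # BooleanOptionalAction registers both --compile and --no-compile;
    # compilation stays off unless --compile is passed, and the last flag given wins.
    parser.add_argument("--compile", action=argparse.BooleanOptionalAction, default=False)
    return parser
```

Parsing the Fast Training example, build_training_parser().parse_args(["--compile", "--total-epochs", "1000", "--lr-rate", "2e-4"]), gives compile=True, total_epochs=1000, and lr_rate=0.0002. All other values keep their defaults.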

💾 Output Configuration

--model-name - Model save name (default: grpo_fruits_catcher)
--device - Training device: auto, cpu, cuda, cuda:0, cuda:1 (default: auto)
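The --device values imply a small resolution step before training starts. This minimal sketch assumes that auto prefers CUDA when it is available and otherwise falls back to the CPU; the guide does not say what auto actually does. resolve_device is a hypothetical name. It takes CUDA availability as arguments so it runs without PyTorch. In a real script those values would come from torch.cuda.is_available() and torch.cuda.device_count().

```python
def resolve_device(requested: str, cuda_available: bool, cuda_device_count: int = 0) -> str:
    """Hypothetical mapping from a --device value to a concrete device string."""
    if requested == "auto":
        # Assumption: "auto" means "use the GPU if there is one".
        return "cuda" if cuda_available else "cpu"
    if requested == "cpu":
        return "cpu"
    if requested == "cuda" or requested.startswith("cuda:"):
        if not cuda_available:
            raise RuntimeError(f"{requested!r} requested but CUDA is not available")
        if requested.startswith("cuda:"):
            index = int(requested.split(":", 1)[1])
            if index >= cuda_device_count:
                raise RuntimeError(
                    f"{requested!r} requested but only {cuda_device_count} GPU(s) found"
                )
        return requested
    raise ValueError(f"unsupported device: {requested!r}")
```

Failing loudly on an unavailable GPU, instead of silently dropping to the CPU, is a design choice in this sketch. It keeps an explicit --device cuda:1 from quietly turning into a much slower CPU run.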

💡 Training Tips

🎯 For Quick Testing

Use --total-epochs between 1 and 10 for quick validation
Use --batch-size between 2 and 4 for faster iterations

🏆 For Best Performance

Use --compile for faster training (requires PyTorch 2.0+; the first iterations are slower while the model compiles)
Use --hidden-size 1024 or higher for complex games
Use --batch-size 32 or higher if you have enough GPU memory

🎮 For Custom Games

Increase --win-score and lower --fail-score (make it more negative) for longer episodes; --max-steps still caps every episode
Increase --max-fruits for more challenging gameplay
Raise --max-steps if episodes hit the step limit before reaching either score threshold
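The interaction between the score thresholds and --max-steps can be sketched as a single termination check. This is a hypothetical rule: episode_outcome is an illustrative name, and the exact comparisons (>= and <= rather than strict inequalities) are an assumption about how main.py ends an episode.

```python
from typing import Optional


def episode_outcome(
    score: int, step: int, win_score: int = 30, fail_score: int = -30, max_steps: int = 100
) -> Optional[str]:
    """Return why an episode ends ("win", "fail", "max_steps"), or None if it continues.

    Hypothetical rule: the game ends once the score reaches --win-score or drops
    to --fail-score, and --max-steps truncates the episode either way.
    """
    if score >= win_score:
        return "win"
    if score <= fail_score:
        return "fail"
    if step >= max_steps:
        return "max_steps"
    return None
```

Under this rule, widening the thresholds (for example --win-score 50 --fail-score -50) only lengthens episodes if --max-steps leaves room for them. Otherwise most episodes end as "max_steps" before either threshold is reached.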

🛑 Early Stopping Guide

The --patience parameter sets how many consecutive epochs without improvement training tolerates before it stops early:

--patience 100: Stops after 100 epochs without improvement (quick experiments)
--patience 300: Good for medium-length training sessions
--patience 500: Default value, a good balance between efficiency and thoroughness
--patience 1000: Very patient, suitable for complex models/games

When to adjust patience:

Short patience (50-100): Testing, debugging, quick experiments
Medium patience (200-400): Normal training, most use cases
Long patience (500+): Complex games, large models, research
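A common way to implement this patience rule is a counter that resets whenever the monitored metric improves. The sketch below assumes a metric where higher is better, such as mean episode reward. The class name EarlyStopping and the choice of metric are assumptions and are not taken from main.py.

```python
class EarlyStopping:
    """Patience-based early stopping (a sketch, not necessarily main.py's logic).

    Assumes a higher metric is better and that any strict increase over the
    best value seen so far counts as an improvement.
    """

    def __init__(self, patience: int = 500):
        self.patience = patience
        self.best = float("-inf")
        self.epochs_without_improvement = 0

    def step(self, metric: float) -> bool:
        """Record one epoch's metric; return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

In a training loop you would call step() once per epoch and break when it returns True. Training then ends at whichever comes first: --total-epochs or exhausted patience. Because the first epoch always counts as an improvement here, a --patience value at or above --total-epochs means early stopping never triggers.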

📊 Example Training Configurations

Beginner (Fast Training)

python main.py --total-epochs 500 --batch-size 8 --hidden-size 512 --patience 100

Intermediate (Balanced)

python main.py --total-epochs 1500 --batch-size 16 --hidden-size 1024 --compile --patience 300

Advanced (High Performance)

python main.py --total-epochs 3000 --batch-size 32 --hidden-size 2048 --compile --lr-rate 5e-5 --patience 500

Research (Long Training)

python main.py --total-epochs 5000 --batch-size 64 --hidden-size 4096 --max-steps 200 --compile --patience 1000
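If you switch between these four configurations often, you can store them as data and turn them into argument lists. This is a hypothetical helper: PRESETS and build_command are illustrative names, and each generated command reproduces the corresponding line above.

```python
import shlex

# The four example configurations from this guide, as flag -> value maps.
# A value of None marks a boolean switch such as --compile.
PRESETS = {
    "beginner": {"--total-epochs": 500, "--batch-size": 8, "--hidden-size": 512, "--patience": 100},
    "intermediate": {"--total-epochs": 1500, "--batch-size": 16, "--hidden-size": 1024,
                     "--compile": None, "--patience": 300},
    "advanced": {"--total-epochs": 3000, "--batch-size": 32, "--hidden-size": 2048,
                 "--compile": None, "--lr-rate": "5e-5", "--patience": 500},
    "research": {"--total-epochs": 5000, "--batch-size": 64, "--hidden-size": 4096,
                 "--max-steps": 200, "--compile": None, "--patience": 1000},
}


def build_command(preset: str) -> list:
    """Build the argv list for one preset (hypothetical helper)."""
    argv = ["python", "main.py"]
    for flag, value in PRESETS[preset].items():
        argv.append(flag)
        if value is not None:
            argv.append(str(value))
    return argv


if __name__ == "__main__":
    for name in PRESETS:
        print(f"{name}: {shlex.join(build_command(name))}")
```

You can pass the list directly to subprocess.run(build_command("beginner"), check=True) to launch a run from Python.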
