
misha-chertushkin/rl-made-easy-code


Reinforcement Learning for Large Language Models

Making the Hard Math Easy — Companion Repository

Book: https://github.com/PacktPublishing/Reinforcement-Learning-for-LLMs
Authors: Arun Shankar & Michael Chertushkin

Click any badge to open a notebook in Google Colab — no installation needed.


Part 1 — Foundations

| # | Chapter | Colab |
|---|---------|-------|
| 1 | Essential Math Toolkit | Open In Colab |
| 2 | Why LLMs Need RL: The Alignment Gap | Open In Colab |
| 3 | RL Fundamentals: The Complete Picture | Open In Colab |
| 4 | Setting Up Your Free Environment | Open In Colab |

Part 2 — Core Methods

| # | Chapter | Colab |
|---|---------|-------|
| 5 | Supervised Fine-Tuning & The Cold Start | Open In Colab |
| 6 | RLHF: The Three-Step Dance | Open In Colab |
| 7 | Direct Preference Optimization | Open In Colab |
| 8 | Online DPO & Iterative Alignment | Open In Colab |
| 9 | Reward Modeling & The Critic | Open In Colab |
| 10 | Modern RL Algorithms: GRPO, RLOO, KTO | Open In Colab |

Part 3 — Advanced Techniques

| # | Chapter | Colab |
|---|---------|-------|
| 11 | Verifiers & Outcome Rewards | Open In Colab |
| 12a | Reasoning with GRPO — Concepts | Open In Colab |
| 12b | Reasoning with GRPO — The DeepSeek Recipe | Open In Colab |
| 13 | Test-Time Compute: Scaling at Inference | Open In Colab |
| 14 | Self-Play & Constitutional AI | Open In Colab |
| 15 | Multi-Objective RL & Agentic Systems | Open In Colab |
| 16 | Domain-Specific RL: Code, Math, Tools | Open In Colab |

Part 4 — Production & Recipes

| # | Chapter | Colab |
|---|---------|-------|
| 17 | Recipe: Chatbot | Open In Colab |
| 18 | Recipe: Reasoner | Open In Colab |
| 19 | Recipe: Agent | Open In Colab |

Repository Structure

rl-made-easy-code/
├── notebooks/
│   ├── part1_foundations/     # Chapters 1–4
│   ├── part2_core/            # Chapters 5–10
│   ├── part3_advanced/        # Chapters 11–16
│   └── part4_recipes/         # Chapters 17–19
├── utils/
│   ├── data.py                # Dataset loaders
│   ├── eval.py                # Win rate, KL divergence helpers
│   └── viz.py                 # Training curve plots
├── data/samples/              # Toy datasets — notebooks run offline
│   ├── preferences.jsonl
│   └── prompts.jsonl
└── requirements.txt           # Installed automatically by each notebook
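The tree lists `utils/eval.py` as holding win-rate and KL-divergence helpers. Their actual names and signatures are not shown in this README, so what follows is only a hypothetical sketch of what such helpers typically compute. `win_rate`, `kl_divergence` and `sampled_kl_estimate` are illustrative names, not the repository's API, and counting a tie as half a win is an assumed convention.

```python
import math
from typing import Sequence


def win_rate(judgments: Sequence[str]) -> float:
    """Fraction of pairwise comparisons won by the candidate model.

    Each judgment is "win", "loss" or "tie"; a tie counts as half a win
    (an assumed convention, not taken from the repository).
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    score = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(score[j] for j in judgments) / len(judgments)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Exact KL(p || q) = sum_i p_i * log(p_i / q_i), in nats, for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is defined as 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def sampled_kl_estimate(logp_policy: Sequence[float], logp_ref: Sequence[float]) -> float:
    """Monte Carlo estimate of KL(policy || ref).

    Inputs are log-probabilities, under each model, of tokens sampled
    from the policy; the estimate is the mean log-ratio.
    """
    if len(logp_policy) != len(logp_ref) or not logp_policy:
        raise ValueError("need equal-length, non-empty log-prob lists")
    diffs = [a - b for a, b in zip(logp_policy, logp_ref)]
    return sum(diffs) / len(diffs)


print(win_rate(["win", "loss", "tie", "win"]))
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))
print(sampled_kl_estimate([-1.0, -2.0], [-1.5, -2.5]))
```

The sampled estimator is the form usually applied during RLHF-style training, because the full distribution over generated sequences cannot be enumerated. Unlike the exact value, a finite-sample estimate can come out negative.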

Hardware

All notebooks target the free Colab T4 GPU. To select it, choose Runtime → Change runtime type → T4 GPU.

| Part | Estimated runtime on T4 |
|------|-------------------------|
| Part 1 — Foundations | < 5 min (CPU) |
| Part 2 — Core methods | 10–30 min |
| Part 3 — Advanced | 20–60 min |
| Part 4 — Recipes | 30–90 min |
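Part 1 runs on CPU, but the runtimes above for Parts 2 to 4 assume the T4. A quick way to confirm the runtime switch took effect is sketched below. It assumes PyTorch, which comes preinstalled on Colab; this README does not say which framework the notebooks use, and `describe_accelerator` is a hypothetical helper, not part of `utils/`.

```python
def describe_accelerator() -> str:
    """Return the first CUDA device's name, "cpu", or a note if PyTorch is missing."""
    try:
        import torch  # preinstalled on Colab; may be absent elsewhere
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        # On a Colab T4 runtime the reported name is expected to contain "T4".
        return torch.cuda.get_device_name(0)
    return "cpu"


print(describe_accelerator())
```

If this prints `cpu` on Colab, the runtime type was not changed, or the session needs to be reconnected after changing it.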

License

Code: Apache 2.0 · Text & figures: © Arun Shankar & Michael Chertushkin, All Rights Reserved

About

This repository provides the code samples for the book "RL for LLM Made Easy" by Arun Shankar and Michael Chertushkin.
