Reinforcement Learning for Large Language Models

Making the Hard Math Easy — Companion Repository

Book: https://github.com/PacktPublishing/Reinforcement-Learning-for-LLMs
Authors: Arun Shankar & Michael Chertushkin

Click any badge to open a notebook in Google Colab — no installation needed.

Part 1 — Foundations

#	Chapter	Colab
1	Essential Math Toolkit
2	Why LLMs Need RL: The Alignment Gap
3	RL Fundamentals: The Complete Picture
4	Setting Up Your Free Environment

Part 2 — Core Methods

#	Chapter	Colab
5	Supervised Fine-Tuning & The Cold Start
6	RLHF: The Three-Step Dance
7	Direct Preference Optimization
8	Online DPO & Iterative Alignment
9	Reward Modeling & The Critic
10	Modern RL Algorithms: GRPO, RLOO, KTO

Part 3 — Advanced Techniques

#	Chapter	Colab
11	Verifiers & Outcome Rewards
12a	Reasoning with GRPO — Concepts
12b	Reasoning with GRPO — The DeepSeek Recipe
13	Test-Time Compute: Scaling at Inference
14	Self-Play & Constitutional AI
15	Multi-Objective RL & Agentic Systems
16	Domain-Specific RL: Code, Math, Tools

Part 4 — Production & Recipes

#	Chapter	Colab
17	Recipe: Chatbot
18	Recipe: Reasoner
19	Recipe: Agent

Repository Structure

rl-made-easy-code/
├── notebooks/
│   ├── part1_foundations/     # Chapters 1–4
│   ├── part2_core/            # Chapters 5–10
│   ├── part3_advanced/        # Chapters 11–16
│   └── part4_recipes/         # Chapters 17–19
├── utils/
│   ├── data.py                # Dataset loaders
│   ├── eval.py                # Win rate, KL divergence helpers
│   └── viz.py                 # Training curve plots
├── data/samples/              # Toy datasets — notebooks run offline
│   ├── preferences.jsonl
│   └── prompts.jsonl
└── requirements.txt           # Installed automatically by each notebook
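
The tree above describes `utils/data.py` as dataset loaders and `utils/eval.py` as win rate and KL divergence helpers. The repository's actual implementations are not shown here, so the following is only a minimal standard-library sketch of what such helpers might look like. The function names `load_jsonl`, `win_rate`, and `kl_divergence`, the outcome labels, and the choice to count ties as half a win are all assumptions made for illustration.

```python
import json
import math
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def win_rate(outcomes):
    """Fraction of pairwise comparisons won.

    `outcomes` is a sequence of "win", "tie", or "loss" labels
    (hypothetical labels); a tie is counted as half a win.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    score = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(score[label] for label in outcomes) / len(outcomes)


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


if __name__ == "__main__":
    print(win_rate(["win", "win", "tie", "loss"]))  # 0.625
    print(kl_divergence([0.5, 0.5], [0.5, 0.5]))    # 0.0
```

With the repository checked out, `load_jsonl("data/samples/preferences.jsonl")` would return the toy preference records as a list of dictionaries; the keys inside each record depend on the file and are not assumed here.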

Hardware

All notebooks target the free Colab T4 GPU. To enable it, select Runtime → Change runtime type → T4 GPU.
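
After switching the runtime, it can help to confirm that a GPU is actually attached before starting a long notebook. The snippet below is a small sketch, not part of the repository: `detect_gpu_name` is a hypothetical helper that asks the `nvidia-smi` command-line tool for the GPU name and returns `None` when the tool is missing or reports an error, as on a CPU-only runtime.

```python
import shutil
import subprocess


def detect_gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return names[0] if names else None


if __name__ == "__main__":
    name = detect_gpu_name()
    if name is None:
        print("No GPU detected; check Runtime → Change runtime type.")
    else:
        print(f"GPU detected: {name}")
```

On a correctly configured Colab runtime the reported name should mention the T4; the exact string depends on the driver.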

Part	Estimated runtime on T4
Part 1 — Foundations	< 5 min (CPU)
Part 2 — Core methods	10–30 min
Part 3 — Advanced	20–60 min
Part 4 — Recipes	30–90 min
