rlBigWorld — PALR: Plasticity-Aware Learning Rates for Continual RL

PALR combats catastrophic plasticity loss in continual reinforcement learning by measuring per-layer plasticity (dead-neuron fraction + effective rank) and responding with targeted learning-rate scaling and surgical neuron perturbation. This repository contains experiments across three continual-RL domains:

Domain	Algorithm	Environment
Continual CartPole	DQN + PALR	4 physics phases
ContinualWorld-10/20	SAC + PALR	10–20 MetaWorld MT10 tasks
JellyBeanWorld	DQN + PALR	Visual agent, reward-sign flips
Habitat Fetch Rearrangement	DD-PPO + PALR	4-phase robot curriculum

Repository Layout

rlBigWorld/
├── palr_plasticity_aware_lr/   # CW10/CW20, JBW, CartPole experiments
│   ├── src/
│   │   ├── palr_agent.py           # PALR-DQN agent
│   │   ├── palr_sac_agent.py       # PALR-SAC agent (CW experiments)
│   │   ├── sac_base.py             # Base SAC (actor + twin critics + auto-α)
│   │   ├── dqn_base.py             # Base DQN
│   │   ├── cw_env.py               # ContinualWorld wrapper (10/20 tasks)
│   │   ├── jbw_env.py              # JellyBeanWorld wrapper
│   │   ├── cw_baselines.py         # SAC baselines (L2Reg, S&P, PeriodicReset)
│   │   ├── baselines.py            # DQN baselines
│   │   ├── plasticity_metrics.py   # dead_neuron_fraction, effective_rank, etc.
│   │   ├── replay_buffer.py        # Continuous replay buffer
│   │   ├── run_cw_experiments.py   # Main CW10/CW20 launcher
│   │   ├── run_jbw_experiments.py  # JBW launcher
│   │   ├── run_experiments.py      # CartPole launcher
│   │   ├── merge_cw_checkpoints.py # Aggregate per-seed results → JSON
│   │   └── plot_*.py               # Plotting scripts
│   └── setup_jbw.sh                # JBW install + smoke test
│
├── palr_habitat/               # Vision-based Fetch Rearrangement (DD-PPO)
│   ├── src/
│   │   ├── palr_trainer.py         # DD-PPO training loop (torchrun, multi-GPU)
│   │   ├── palr_fetch_policy.py    # ResNet-18 + GRU actor-critic
│   │   ├── palr_resnet_encoder.py  # PALR hooks on conv layers
│   │   ├── fetch_curriculum.py     # 4-phase task curriculum manager
│   │   └── plasticity_metrics_cnn.py  # dead_filter_fraction for conv layers
│   └── configs/
│       ├── ddppo_palr_fetch.yaml       # Full PALR config
│       ├── ddppo_baseline_fetch.yaml   # Baseline (no PALR)
│       ├── ddppo_palr_lr_only_fetch.yaml    # Ablation: LR scaling only
│       ├── ddppo_palr_perturb_only_fetch.yaml # Ablation: perturbation only
│       └── rearrange_base.yaml         # Sensors, task registration
│
├── palr_container.def          # Singularity/Apptainer build definition
├── palr_container.sif          # Pre-built image (6.8 GB)
├── run_cw20.sh                 # Production CW20 launcher (Singularity)
├── run.sh                      # Simple local launcher (.venv + GPU paths)
├── run_interactive.sh          # Habitat single-seed training
├── run_interactive_ablation.sh # Habitat ablation sweep
└── run_on_HPC.sh               # SLURM + Singularity HPC launcher

Container

The Singularity image packages two conda environments to work around a hard dependency conflict: habitat-sim 0.3.2 requires Python 3.9, while mujoco ≥ 3.0.0 (required by MetaWorld 3.0.0) ships Python 3.10+ wheels only.

Conda env	Python	Key packages	Used for
`palr_plasticity`	3.10 (default)	PyTorch 2.3.1+cu121, mujoco 3.7.0, metaworld 3.0.0, gymnasium, tensorflow-cpu 2.15.0, jbw	CW10/CW20, JBW, CartPole
`palr_habitat`	3.9	PyTorch 2.3.1+cu121, habitat-sim 0.3.2 (EGL+Bullet), habitat-lab/baselines 0.3.220241205	Fetch Rearrangement

Build

# Rootless (preferred — no sudo required):
apptainer build palr_container.sif palr_container.def

# With fakeroot (Singularity ≥ 3.5):
singularity build --fakeroot palr_container.sif palr_container.def

Build takes ~30–60 minutes. No GPU required on the build host.

Runtime — switching environments

The container's default PATH points to palr_plasticity (Python 3.10). To use the habitat env, invoke binaries by full path:

# Default (palr_plasticity):
singularity exec --nv palr_container.sif python script.py

# Habitat env (palr_habitat):
singularity exec --nv palr_container.sif \
    /opt/miniforge3/envs/palr_habitat/bin/python script.py

Quick Start

ContinualWorld-20 (recommended entry point)

# Full sweep — 5 agents × 3 seeds, auto-resumes from checkpoints:
bash run_cw20.sh

# Single agent/seed for debugging:
bash run_cw20.sh --single

# Merge results only (after all runs complete):
bash run_cw20.sh --merge

See the CW20 in Singularity section for full details.

Running directly (without the launcher)

# Inside the container, or via run.sh for local .venv:
cd palr_plasticity_aware_lr/src

# CW20 full sweep:
python run_cw_experiments.py --cw20 --episodes_per_task 200 --seeds 3

# Single agent (e.g. PALR, seed 0):
python run_cw_experiments.py --cw20 --agent_idx 4 --seeds 1 --seed_offset 0

# JBW:
python run_jbw_experiments.py --episodes 400 --seeds 3

# CartPole:
python run_experiments.py

Habitat Fetch Rearrangement

# Multi-GPU training (3 GPUs):
bash run_interactive.sh

# Or manually:
singularity exec --nv \
    --bind /home/hzhang/work/rlBigWorld:/workspace \
    palr_container.sif \
    bash -c "cd /workspace && \
        /opt/miniforge3/envs/palr_habitat/bin/torchrun \
            --nproc_per_node=1 \
            palr_habitat/src/palr_trainer.py \
            --config palr_habitat/configs/ddppo_palr_fetch.yaml \
            --seed 0 --outdir results/palr_seed0"

Running CW20 Experiments in Singularity

run_cw20.sh is the production launcher. It orchestrates all agents and seeds inside palr_container.sif, handles parallelism, logs each run, and auto-resumes from existing checkpoints.

Prerequisites

# The .sif must exist before running:
ls palr_container.sif          # 6.8 GB pre-built image
# If missing, build it:
apptainer build palr_container.sif palr_container.def

The script auto-detects apptainer vs singularity and uses whichever is available.

Modes

# 1. Full sweep — all agents × all seeds (default: 5 agents × 3 seeds = 15 runs):
bash run_cw20.sh

# 2. Single run — one specific (seed, agent) pair:
SEED=2 AGENT=4 bash run_cw20.sh --single

# 3. Merge only — aggregate existing checkpoints without running anything new:
bash run_cw20.sh --merge

Environment Variable Reference

All knobs can be overridden by prefixing the command, e.g. EPISODES=400 SEEDS="0 1" bash run_cw20.sh.

Variable	Default	Description
`SIF`	`<proj_dir>/palr_container.sif`	Path to the Singularity image
`PROJ_DIR`	directory of `run_cw20.sh`	Project root (auto-detected)
`EPISODES`	`200`	Episodes per task (×20 tasks = 4 000 total per agent)
`BATCH_SIZE`	`2048`	SAC replay-buffer batch size
`SEEDS`	`"0 1 2"`	Space-separated seed indices to run
`AGENTS`	`"0 3 4 5 6"`	Space-separated agent indices to run
`PARALLEL`	`5`	How many runs to launch concurrently before waiting
`CUDA_VISIBLE_DEVICES`	`0`	GPU to use (passed inside the container)

Agent Index Reference

Index	Agent	Description
`0`	SAC-FixedLR	Vanilla SAC, no adaptation — primary baseline
`3`	SAC-L2Reg	L2 weight regularisation on the critic loss
`4`	PALR-SAC (ours)	Per-layer LR scaling + targeted neuron perturbation
`5`	PALR-NoPerturb	Ablation: LR scaling only
`6`	PALR-NoScale	Ablation: perturbation only

Indices 1 (ShrinkAndPerturb) and 2 (PeriodicReset) exist in the code but are excluded from the default AGENTS sweep.

What Runs Inside the Container

Each launch_one seed agent call executes:

apptainer exec --nv \
    --bind <PROJ_DIR>:/workspace \
    palr_container.sif \
    bash -c "
        cd /workspace
        export CUDA_VISIBLE_DEVICES=0
        export MUJOCO_GL=egl
        export PYOPENGL_PLATFORM=egl
        export EGL_PLATFORM=surfaceless
        export TF_FORCE_GPU_ALLOW_GROWTH=true
        export OMP_NUM_THREADS=4
        unset DISPLAY
        python palr_plasticity_aware_lr/src/run_cw_experiments.py \
            --cw20 \
            --episodes_per_task 200 \
            --seeds 1 \
            --batch_size 2048 \
            --seed_offset <seed> \
            --agent_idx <agent> \
            --ckpt_suffix _cw20_seed<seed>_agent<agent>
    "

--nv injects the host NVIDIA driver and EGL ICD into the container. The project root is bind-mounted at /workspace so results written there persist on the host.

Parallelism and Scheduling

The launcher batches runs in groups of PARALLEL (default 5 — one batch per seed, all agents in parallel). After each batch it polls pgrep -f run_cw_experiments.py every 60 seconds and waits for all processes to exit before starting the next batch.

seed=0: agents [0, 3, 4, 5, 6] launched in parallel ──► wait ──►
seed=1: agents [0, 3, 4, 5, 6] launched in parallel ──► wait ──►
seed=2: agents [0, 3, 4, 5, 6] launched in parallel ──► wait ──►
merge checkpoints

To run seeds in parallel across multiple GPUs, launch the script multiple times with different SEEDS and CUDA_VISIBLE_DEVICES:

SEEDS="0"   CUDA_VISIBLE_DEVICES=0 bash run_cw20.sh &
SEEDS="1"   CUDA_VISIBLE_DEVICES=1 bash run_cw20.sh &
SEEDS="2"   CUDA_VISIBLE_DEVICES=2 bash run_cw20.sh &
wait
bash run_cw20.sh --merge

Resume Behaviour

run_cw_experiments.py auto-resumes from any existing checkpoint:

palr_plasticity_aware_lr/results/cw_checkpoint_cw20_seed<S>_agent<A>.json

If a checkpoint exists for a given (seed, agent) pair, that run is skipped entirely. This means you can safely re-run bash run_cw20.sh after a crash or interruption — only missing runs will execute.

Logs

Each run writes to its own log file:

palr_plasticity_aware_lr/results/logs/cw20_seed<S>_agent<A>.log

To tail a running experiment:

tail -f palr_plasticity_aware_lr/results/logs/cw20_seed0_agent4.log

Post-Run: Merge and Plot

After all runs complete, the launcher auto-merges checkpoints. You can also do this manually:

# Merge all cw20 checkpoints → cw20_raw_results.json:
bash run_cw20.sh --merge

# Or directly inside the container:
apptainer exec --nv --bind $(pwd):/workspace palr_container.sif \
    bash -c "cd /workspace && python \
        palr_plasticity_aware_lr/src/merge_cw_checkpoints.py --suffix cw20"

# Generate plots:
apptainer exec --nv --bind $(pwd):/workspace palr_container.sif \
    bash -c "cd /workspace/palr_plasticity_aware_lr/src && \
        python plot_cw_results.py"

Outputs:

palr_plasticity_aware_lr/results/cw20_raw_results.json   # merged data
palr_plasticity_aware_lr/plots/                          # figures

Algorithms

PALR — Plasticity-Aware Learning Rates

Every measure_freq training steps, PALR diagnoses each network layer on a held-out diagnostic batch and adapts in two ways:

1. Per-layer learning-rate scaling

For each layer i, two plasticity metrics are computed:

dead_i — fraction of ReLU units with zero activation (range [0, 1])
rank_i — effective rank of the activation matrix (proxy for feature diversity)

A plasticity deficit is formed and used to scale that layer's gradient:

scale_i = 1 + (max_lr_scale − 1) · clip(deficit_i / (1 + rank_beta), 0, 1)

Layers with high plasticity loss receive a proportionally larger learning rate.

2. Targeted neuron perturbation

When dead_i > perturb_threshold, PALR re-initialises only the dead neurons from the He initialisation distribution. Healthy neurons are untouched, unlike shrink-and-perturb which perturbs the entire weight matrix.

Best Hyperparameters (CW10, PALR-SAC)

Parameter	Value
`beta` (LR scale sensitivity)	`0.0`
`sigma` (perturbation noise std)	`0.5`
`perturb_threshold`	`0.05`
`rank_beta`	`1.5`

Agent Index Reference

`--agent_idx`	Agent	Description
0	SAC-FixedLR	Vanilla SAC, no adaptation
1	SAC-L2Reg	L2 weight regularisation on critic loss
2	SAC-ShrinkAndPerturb	Periodic weight shrink + Gaussian noise
3	SAC-PeriodicReset	Full network reset every K episodes
4	PALR (ours)	LR scaling + targeted perturbation
5	PALR-NoPerturb	Ablation: LR scaling only
6	PALR-NoScale	Ablation: perturbation only

Environments

ContinualWorld-10 / CW20

Wraps MetaWorld MT10: 10 tasks (reach, push, pick-place, door-open, drawer-close, …) executed sequentially. Task switches are silent — the agent receives no signal, only a change in reward function.

State: 39-dimensional
Actions: 4-dimensional continuous
CW20: each of the 10 tasks appears twice (shuffled order)

JellyBeanWorld (JBW)

A 2D grid world with partial observability (11×11 visual patch, 4 discrete actions). Reward sign flips every phase_episodes episodes without notification — the agent must discover that what was previously good is now bad.

Habitat Fetch Rearrangement

4-phase curriculum (50M steps each):

Phase	Task	Description
1	RearrangePick (apple)	Pick up apple from random position
2	RearrangePick (bowl)	Pick up bowl
3	RearrangeOpen (fridge)	Open fridge door
4	RearrangePlace (sink)	Place object in sink

Observations: RGB + Depth (128×128), joint angles (7), is_holding (1), object/goal GPS (6). Uses EGL headless rendering.

Metrics

Metric	Measures
Recovery speed	Episodes to reach performance threshold after a task switch — primary continual learning metric
Final performance	Mean reward in the final episode window of each task
Dead neuron fraction	Fraction of ReLU units with zero activation; lower → more plastic
Effective rank	Entropy-based rank of activation matrix; higher → more diverse features

Results Structure

palr_plasticity_aware_lr/
├── results/
│   ├── cw_checkpoint_cw20_seed0_agent4.json   # Per-seed, per-agent checkpoints
│   ├── cw_checkpoint_cw20_seed1_agent4.json
│   └── ...
├── cw20_raw_results.json                       # Merged results (after --merge)
└── RESULTS_SUMMARY.txt                         # Human-readable benchmark summary

palr_habitat/
└── results/
    └── palr_seed0/
        ├── checkpoints/                        # Model checkpoints
        ├── tb/                                 # TensorBoard logs
        └── curriculum_events.json              # Task switch timestamps

After all runs complete, merge and plot:

cd palr_plasticity_aware_lr/src
python merge_cw_checkpoints.py --suffix cw20
python plot_cw_results.py

Local Development (without container)

# Activate the .venv (Python 3.10, CUDA 13.0):
./run.sh python palr_plasticity_aware_lr/src/run_cw_experiments.py [args]

# Or manually:
source .venv/bin/activate
export MUJOCO_GL=egl
export TF_FORCE_GPU_ALLOW_GROWTH=true
cd palr_plasticity_aware_lr/src && python run_cw_experiments.py

The .venv does not include habitat-sim (Python 3.9 only). Use the container for Habitat experiments.

HPC / SLURM

# Submit all seeds in parallel via SLURM:
bash run_on_HPC.sh

# The script builds habitat_v3.sif if not present, then submits
# seeds 0/1/2 in parallel with --sweep flag.

Dependencies Summary

Package	`palr_plasticity` (py3.10)	`palr_habitat` (py3.9)
PyTorch	2.3.1+cu121	2.3.1+cu121
habitat-sim	—	0.3.2 (EGL + Bullet)
habitat-lab	—	0.3.220241205
mujoco	3.7.0	—
metaworld	3.0.0	—
jelly-bean-world	1.0	—
tensorflow-cpu	2.15.0	—
gymnasium	1.3.0	—
gym	0.26.2	0.22.0
wandb	0.15.12	0.15.12
numpy	<2	<2

troubleshooting

If you encounter ImportError: libcudart.so.12: cannot open shared object file, ensure that the NVIDIA driver on the host is compatible with CUDA 12.1 (driver version ≥ 535.54.03). Update your driver if necessary.
For EGL errors, ensure that the container has access to the host's GPU and that the correct environment variables (MUJOCO_GL=egl, PYOPENGL_PLATFORM=egl, EGL_PLATFORM=surfaceless) are set.
If you see [Error]:[Metadata] SceneDatasetAttributesManager.cpp(305)::validateMap : navmesh_instances Key : v3_sc4_staging_17 Value : navmeshes/v3_sc4_staging_17.navmesh not found on disk as absolute path or relative to data/replica_cad, This error is benign and does not affect your training, as Habitat validates the entire dataset config at startup, even for scenes you're not using. Your config (rearrange_base.yaml) runs on v3_sc1_staging_00, so v3_sc4_staging_17 is never actually loaded. The rearrangement/fetch task doesn't rely on navmesh at runtime. Navmeshes matter for PointNav/ObjectNav (geodesic distance, SPL metrics). For robot arm rearrangement, Habitat uses physics-based collision and kinematic control — navmesh is only needed if you compute shortest-path metrics, which your trainer doesn't do.

Name		Name	Last commit message	Last commit date
Latest commit History 110 Commits
maniskill_vit		maniskill_vit
palr_habitat		palr_habitat
palr_plasticity_aware_lr		palr_plasticity_aware_lr
.gitignore		.gitignore
README.md		README.md
habitat_v3.def		habitat_v3.def
maniskill_vit_HPC.sh		maniskill_vit_HPC.sh
maniskill_vit_HPC_1.sh		maniskill_vit_HPC_1.sh
maniskill_vit_HPC_2.sh		maniskill_vit_HPC_2.sh
maniskill_vit_HPC_3.sh		maniskill_vit_HPC_3.sh
maniskill_vit_HPC_4.sh		maniskill_vit_HPC_4.sh
maniskill_vit_HPC_5.sh		maniskill_vit_HPC_5.sh
maniskill_vit_HPC_6.sh		maniskill_vit_HPC_6.sh
maniskill_vit_HPC_array.sh		maniskill_vit_HPC_array.sh
maniskill_vit_pbs.sh		maniskill_vit_pbs.sh
maniskill_vit_resume_pbs.sh		maniskill_vit_resume_pbs.sh
nvidia_egl_vendor.json		nvidia_egl_vendor.json
palr_container.def		palr_container.def
run.sh		run.sh
run_cw20.sh		run_cw20.sh
run_interactive.sh		run_interactive.sh
run_interactive_ablation.sh		run_interactive_ablation.sh
run_maniskill.sh		run_maniskill.sh
run_maniskill_pbs.sh		run_maniskill_pbs.sh
run_on_HPC.sh		run_on_HPC.sh

Folders and files

Latest commit

History

Repository files navigation

rlBigWorld — PALR: Plasticity-Aware Learning Rates for Continual RL

Repository Layout

Container

Build

Runtime — switching environments

Quick Start

ContinualWorld-20 (recommended entry point)

Running directly (without the launcher)

Habitat Fetch Rearrangement

Running CW20 Experiments in Singularity

Prerequisites

Modes

Environment Variable Reference

Agent Index Reference

What Runs Inside the Container

Parallelism and Scheduling

Resume Behaviour

Logs

Post-Run: Merge and Plot

Algorithms

PALR — Plasticity-Aware Learning Rates

Best Hyperparameters (CW10, PALR-SAC)

Agent Index Reference

Environments

ContinualWorld-10 / CW20

JellyBeanWorld (JBW)

Habitat Fetch Rearrangement

Metrics

Results Structure

Local Development (without container)

HPC / SLURM

Dependencies Summary

troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages