A PyTorch implementation of Denoising Diffusion Probabilistic Models (DDPM) for high-quality image generation on CelebA and CelebA-HQ. The project reproduces the core architecture from “Denoising Diffusion Probabilistic Models” (Ho et al., 2020) and extends it with:
- support for both DDPM and DDIM sampling,
- UNet backbones with and without self-attention, and
- controlled experiments on the trade-off between sample quality, sampling speed, and model capacity.
Beyond being a faithful DDPM reimplementation, this repository is structured as a small research playground to study how architectural choices and sampling strategies impact diffusion model behaviour.
- Features
- Project Structure
- Main Results
- Individual Best Picks
- Low GPU CelebA 64x64
- Installation
- Usage
- Architecture
- Docker Support
- Citation
- DDPM Implementation: Full implementation of the Denoising Diffusion Probabilistic Model
- DDIM Support: Fast deterministic sampling with Denoising Diffusion Implicit Models
- U-Net Architecture: Custom U-Net backbone with attention mechanisms for high-resolution image generation
- Flexible Training: Support for multiple training configurations including:
- Exponential Moving Average (EMA) for stable training
- Automatic mixed precision (AMP) training
- Gradient accumulation for large batch training
- Learning rate warmup and scheduling
- CelebA Dataset: Built-in support for CelebA dataset with automatic download and preprocessing
- Checkpoint Management: Comprehensive checkpoint saving and resuming capabilities
- Visualization Tools: Utilities for generating sample grids and denoising strips
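To illustrate the EMA feature above: the sampled weights come from a slowly-updated shadow copy of the model, not the raw training weights. A minimal sketch of the idea, assuming the interface below (the repository's `src/training_loops/ema.py` may differ):

```python
import copy
import torch

class EMASketch:
    """Minimal exponential moving average of model parameters.

    Illustrative only; the repository's EMA class may have a
    different constructor and update signature.
    """
    def __init__(self, model: torch.nn.Module, decay: float = 0.9999):
        self.decay = decay
        # The shadow copy holds the smoothed weights used for sampling.
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            # s <- decay * s + (1 - decay) * p
            s.mul_(self.decay).add_(p, alpha=1 - self.decay)
```

With `decay=0.9999`, the shadow weights average over roughly the last 10,000 optimizer steps, which smooths out the noisy late-training updates that otherwise degrade sample quality.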
CelebA-HQ256 — Best Samples (95e)
Left: DDPM (110e) • Right: DDIM (350 steps, 110e)
Denoising strip CelebHQ — DDPM (T → 0)
(999, 600, 300, 200, 100, 80, 40, 10, 5, 0)
CelebA-HQ256 — Individual Best Picks
Selected single images (click to open full size)
Top: DDPM • Bottom: DDIM
Low GPU — Best Samples (50e)
Left: DDPM (50e) • Right: DDIM (50 steps, 50e)
Denoising strip Celeb64x64 — DDPM (T → 0)
(999, 600, 300, 200, 100, 80, 40, 10, 5, 0)
This project includes a set of controlled experiments designed to reveal how architectural and sampling choices affect diffusion model performance.
We trained:
- A vanilla UNet (no attention) on CelebA 64×64
- A UNet with self-attention on CelebA-HQ 256×256
Findings:
Attention greatly improves global coherence (eye alignment, symmetry, background consistency), especially at higher resolutions. Models without attention converge faster but produce less structured faces.
Both sampling schemes were implemented:
- DDPM → stochastic, visibly better samples in our CelebA-HQ256 runs
- DDIM → deterministic, roughly 5× faster sampling for a modest drop in visual quality
Finding:
DDPM produces sharper and more realistic faces at the cost of much higher compute. DDIM offers a practical speed–quality trade-off: slightly worse samples, but dramatically cheaper and faster sampling.
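The roughly 5× speed-up comes from running the reverse process on a strided subset of timesteps instead of all T of them, with a deterministic (η = 0) update rule. A minimal sketch, where `eps_model` and `alpha_bar` are assumed names, not the repository's actual API:

```python
import torch

@torch.no_grad()
def ddim_sample_sketch(eps_model, alpha_bar, x_T, steps: int):
    """Deterministic DDIM sampling over a strided timestep subset.

    eps_model(x, t) is assumed to predict the noise; alpha_bar is the
    cumulative product of (1 - beta). Illustrative sketch only.
    """
    T = alpha_bar.shape[0]
    # `steps` evenly spaced timesteps from T-1 down to 0.
    ts = torch.linspace(T - 1, 0, steps).long()
    x = x_T
    for i in range(len(ts) - 1):
        t, t_prev = ts[i], ts[i + 1]
        a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
        eps = eps_model(x, t)
        # Predict x_0, then jump deterministically to t_prev (eta = 0).
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    return x
```

Because each jump re-derives x_0 from the current noise estimate, the model is queried only `steps` times (e.g. 50) instead of T = 1000, which is where the wall-clock savings come from.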
Comparing 64×64 vs. 256×256 reveals how model capacity interacts with data complexity:
- Low-res (CelebA 64×64): attention brings marginal gains
- High-res (CelebA-HQ 256×256): attention becomes essential for semantic consistency
This mirrors observations in modern diffusion papers and highlights the role of global context.
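The global context comes from letting every spatial position attend to every other one. A single-head sketch of the kind of block applied at the 16×16 and 8×8 resolutions (the repository's `attention.py` may differ in head count and details):

```python
import torch

class SpatialSelfAttentionSketch(torch.nn.Module):
    """Single-head self-attention over the spatial positions of a feature map.

    Illustrative sketch of an attention block for a diffusion U-Net.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.norm = torch.nn.GroupNorm(8, channels)
        self.qkv = torch.nn.Conv2d(channels, channels * 3, 1)
        self.proj = torch.nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        B, C, H, W = x.shape
        q, k, v = self.qkv(self.norm(x)).chunk(3, dim=1)
        q = q.reshape(B, C, H * W).transpose(1, 2)      # (B, HW, C)
        k = k.reshape(B, C, H * W)                      # (B, C, HW)
        v = v.reshape(B, C, H * W).transpose(1, 2)      # (B, HW, C)
        attn = torch.softmax(q @ k / C ** 0.5, dim=-1)  # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(B, C, H, W)
        return x + self.proj(out)                       # residual connection
```

The HW × HW attention matrix is why these blocks are only affordable at coarse resolutions: at 16×16 it is 256×256 per head, while at 256×256 it would be 65,536×65,536.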
CelebA256 — More Analysis
DDIM comparison at different epochs
| DDIM 50 steps (45e) | DDIM 50 steps (55e) |
|---|---|
| ![]() | ![]() |
Denoising strips
Low GPU — More Analysis
Inference — DDPM (30 vs 50 epochs)
| DDPM — 30 epochs | DDPM — 50 epochs |
|---|---|
| ![]() | ![]() |
Inference — DDPM vs DDIM (50 epochs)
| DDPM (50e) | DDIM 50 steps (50e) |
|---|---|
| ![]() | ![]() |
Denoising strips
DDPM — 1000 steps — denoising from T → 0 (showing t = 999, 600, 300, 200, 100, 80, 40, 10, 5, 0):
DDIM — 50 steps — denoising from T → 0:
.
├── src/
│ ├── model/
│ │ ├── difussion_class.py # Main Diffusion class with DDPM/DDIM logic
│ │ ├── difussion_utils.py # Beta schedules and utility functions
│ │ ├── unet_backbone.py # U-Net denoiser architecture
│ │ └── attention.py # Multi-head attention blocks
│ ├── data/
│ │ ├── load_data_from_torch.py # CelebA data loading utilities
│ │ ├── load_data_local.py # Local image dataset loader
│ │ └── subset_celebra.py # Dataset subset utilities
│ ├── training_loops/
│ │ ├── main_train_loop.py # Main training loop
│ │ ├── train_one_epoch.py # Single epoch training
│ │ ├── ema.py # Exponential Moving Average
│ │ ├── chekpoints.py # Checkpoint save/load utilities
│ │ ├── grad_scaler.py # Gradient scaler for mixed precision
│ │ └── training_utils.py # Training helper functions
│ └── testing/
│ ├── ddpm_inference.py # DDPM sampling functions
│ └── ddpim_inference.py # DDIM sampling functions
├── models/ # Trained model checkpoints
├── Samples_low_gpu/ # Generated samples from low-GPU training
├── Samples_attn_net/ # Generated samples with attention network
├── Inference Samples_low_gpu/ # Post training DDPM and DDIM samples
└── notebooks_showcase/ # Jupyter notebooks for demonstration
- Python 3.8+
- CUDA-capable GPU (recommended) or CPU
- PyTorch 1.12+ with CUDA support (optional, for GPU acceleration)
- Clone the repository:

```bash
git clone <repository-url>
cd "Difussion Model"
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

Alternatively, you can use Docker to run the project in a containerized environment:
```bash
# Build the Docker image
docker build -t ddpm-model .

# Run the container
docker run --gpus all -it ddpm-model
```

Train a DDPM model on the CelebA dataset:
```python
import torch
from src.model.difussion_class import Diffusion
from src.model.unet_backbone import build_unet_64x64
from src.data.load_data_from_torch import get_celeba_loaders
from src.training_loops.main_train_loop import train_ddpm
from src.training_loops.ema import EMA
from src.training_loops.chekpoints import make_checkpoint_utils

# Setup
device = "cuda" if torch.cuda.is_available() else "cpu"
img_size = 64
batch_size = 128

# Load data
train_loader, val_loader, test_loader = get_celeba_loaders(
    root="./data",
    img_size=img_size,
    batch_size=batch_size)

# Initialize model and diffusion
model = build_unet_64x64(
    in_channels=3,
    base_channels=128,
    channel_mults=(1, 2, 2, 2),
    num_res_blocks=2,
    attn_resolutions={16, 8},
    dropout=0.1).to(device)

diffusion = Diffusion(
    T=1000,
    schedule="linear",
    beta_min=1e-4,
    beta_max=2e-2,
    img_size=img_size).to(device)

# Setup optimizer, EMA, and checkpoint helpers
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-6)
ema = EMA(model.parameters(), decay=0.9999)
save_ckpt, load_ckpt = make_checkpoint_utils()  # see src/training_loops/chekpoints.py for the exact signature

# Training
train_ddpm(
    model=model,
    diffusion=diffusion,
    train_loader=train_loader,
    optimizer=optimizer,
    ema=ema,
    device=device,
    epochs=50,
    base_lr=2e-4,
    warmup_steps=1000,
    sample_every=5,
    sample_n=36,
    img_size=img_size,
    ckpt_dir="checkpoints",
    run_name="ddpm_celeba64",
    ckpt_utils=(save_ckpt, load_ckpt))
```

Generate samples using a trained model:
```python
from src.testing.ddpm_inference import ddpm_infer_sample

# Load checkpoint
checkpoint = torch.load("models/celeba64_ddpm_lowgpu_55.pt")
model.load_state_dict(checkpoint["model"])

# Generate samples
ddpm_infer_sample(
    model=model,
    diffusion=diffusion,
    n=36,
    img_size=64,
    device=device,
    out_path="samples.png")
```

For faster, deterministic sampling:
```python
from src.testing.ddpim_inference import ddim_infer_sample

ddim_infer_sample(
    model=model,
    diffusion=diffusion,
    n=36,
    img_size=64,
    steps=50,
    eta=0.0,
    device=device,
    out_path="samples_ddim.png")
```

The model implements the forward and reverse diffusion processes:
- Forward Process: Gradually adds Gaussian noise to images over T timesteps
- Reverse Process: Learns to denoise images step by step to generate new samples
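A useful property of the forward process is its closed form: x_t can be sampled directly from x_0 in one shot, without iterating through all t steps, using the cumulative product ᾱ_t = ∏ₛ (1 − βₛ). A sketch under the linear schedule from the training example (this is not the repository's `Diffusion` API, just an illustration):

```python
import torch

def q_sample_sketch(x0, t, alpha_bar, noise=None):
    """Closed-form forward diffusion:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    Illustrative sketch; t is a (B,) tensor of integer timesteps.
    """
    if noise is None:
        noise = torch.randn_like(x0)
    # Broadcast the per-sample coefficient over the image dimensions.
    a = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    return a.sqrt() * x0 + (1 - a).sqrt() * noise

# Linear beta schedule matching the training example (1e-4 to 2e-2).
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
```

At t = 0 the image is almost untouched (ᾱ ≈ 1), while at t = T − 1 the signal coefficient is tiny and x_t is essentially pure noise, which is what makes the denoising strips above start from static.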
The denoising network uses a U-Net architecture with:
- Encoder-Decoder Structure: Captures multi-scale features
- Residual Blocks: Facilitates gradient flow and feature learning
- Attention Mechanisms: Applied at 16x16 and 8x8 resolutions for long-range dependencies
- Time Embeddings: Sinusoidal positional embeddings for timestep conditioning
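The timestep conditioning in the last bullet is the Transformer-style sinusoidal embedding. A minimal sketch (function and argument names are assumptions, not the repository's API):

```python
import math
import torch

def timestep_embedding_sketch(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal timestep embedding, as in Transformers and DDPM.

    t: (B,) integer timesteps; returns a (B, dim) float embedding.
    """
    half = dim // 2
    # Frequencies geometrically spaced from 1 down to ~1/10000.
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t.float()[:, None] * freqs[None, :]  # (B, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```

The embedding is typically passed through a small MLP and added inside each residual block, so every layer knows which noise level it is denoising.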
The model is trained using the simplified objective:
L_simple = E[||ε - ε_θ(x_t, t)||²]
Where:
- `ε` is the true noise added to the image
- `ε_θ(x_t, t)` is the noise predicted by the model
- `x_t` is the noisy image at timestep `t`
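One optimization step of this objective fits in a few lines: draw a random timestep per sample, noise the batch with the closed-form forward process, and regress the model's output against the injected noise. A sketch with the schedule inlined (the repository wraps this logic inside its `Diffusion` class and training loop):

```python
import torch

def training_step_sketch(model, x0, alpha_bar, optimizer):
    """One step of L_simple = E[||eps - eps_theta(x_t, t)||^2].

    Illustrative sketch; model(x_t, t) is assumed to predict the noise.
    """
    B = x0.shape[0]
    T = alpha_bar.shape[0]
    t = torch.randint(0, T, (B,), device=x0.device)   # uniform timestep per sample
    eps = torch.randn_like(x0)                        # the true noise
    a = alpha_bar[t].view(B, *([1] * (x0.dim() - 1)))
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps        # forward process, closed form
    loss = torch.nn.functional.mse_loss(model(x_t, t), eps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Sampling `t` uniformly each step is what lets a single network learn to denoise at every noise level simultaneously.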
The project includes a Dockerfile for easy containerization. The Docker image includes:
- Python 3.10
- PyTorch with CUDA support
- All required dependencies
- Pre-configured environment
```bash
# Build image
docker build -t ddpm-model .

# Run with GPU support
docker run --gpus all -v $(pwd):/workspace -it ddpm-model

# Run without GPU
docker run -v $(pwd):/workspace -it ddpm-model
```

If you use this code in your research, please cite the original DDPM paper:
```bibtex
@article{ho2020denoising,
  title={Denoising Diffusion Probabilistic Models},
  author={Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}

@inproceedings{song2021denoising,
  title={Denoising Diffusion Implicit Models},
  author={Song, Jiaming and Meng, Chenlin and Ermon, Stefano},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021},
  url={https://openreview.net/forum?id=St1giarCHYp}
}
```

This project is open source under the MIT License; we encourage citing Ho et al. (2020) if you use this code.
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
Created as part of deep learning research and model development.
Note: Training diffusion models can be computationally intensive. For best results, use a GPU with at least 8GB of VRAM. The project includes low-GPU configurations for training on limited hardware. For reference, the CelebA-HQ model was trained on an NVIDIA A100 (40 GB VRAM).







