A professional, production-ready implementation for fine-tuning Stable Diffusion 3.5 models using LoRA (Low-Rank Adaptation) adapters. This script provides comprehensive support for both transformer and text encoder LoRA training with advanced features for memory efficiency and distributed training.
🎯 Perfect for: Custom image generation, style transfer, domain adaptation, and specialized visual content creation with minimal computational overhead.
- 🚀 SD3.5 Support: Full compatibility with Stable Diffusion 3.5 Medium architecture
- 🔧 LoRA Training: Efficient fine-tuning using Low-Rank Adaptation for both transformer and text encoders
- ⚡ Mixed Precision: FP16/BF16 training support with automatic gradient scaling
- 💾 Memory Efficient: Gradient checkpointing and optimized memory usage
- 🔄 Distributed Training: Multi-GPU support via Accelerate framework
- 📊 Advanced Sampling: Custom timestep sampling with configurable weighting schemes
- ✅ Validation: Built-in validation pipeline with image generation during training
- 📈 Comprehensive Logging: TensorBoard and Weights & Biases integration
- 🛡️ Robust Error Handling: Professional error handling and recovery mechanisms
- 🔄 Resume Training: Checkpoint saving and resuming capabilities
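For readers new to LoRA, the core idea behind the adapters this script trains can be sketched in a few lines of NumPy: instead of updating a frozen weight matrix `W` directly, LoRA learns a low-rank pair `(A, B)` whose product, scaled by `alpha / rank`, is added to the layer's output. This is an illustrative sketch of the math, not code from this repository:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank, alpha = 64, 64, 8, 16      # illustrative sizes
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, small init
B = np.zeros((d_out, rank))                   # trainable, zero init

x = rng.standard_normal(d_in)

# LoRA forward pass: base output plus a scaled low-rank correction
y_adapter = W @ x + (alpha / rank) * (B @ (A @ x))

# Merging the adapter into the base weight gives the identical result,
# which is why trained LoRAs can be fused for zero-overhead inference
W_merged = W + (alpha / rank) * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_adapter, y_merged)
```

Because `B` starts at zero, the adapted model initially behaves exactly like the pretrained one; only the small `A`/`B` matrices receive gradients, which is where the memory savings come from.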
```bash
# PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Core ML libraries (quote version specifiers so the shell doesn't treat ">" as redirection)
pip install "accelerate>=0.25.0" "transformers>=4.35.0" "diffusers>=0.25.0"
pip install "peft>=0.7.0" "datasets>=2.15.0"

# Image processing and utilities
pip install "pillow>=9.0.0" "tqdm>=4.64.0"

# For experiment tracking and logging
pip install wandb tensorboard

# For advanced optimizers
pip install "bitsandbytes>=0.41.0"  # 8-bit AdamW
pip install "prodigyopt>=1.0"       # Prodigy optimizer

# For development
pip install black flake8 pytest
```

- Minimum: 12GB VRAM GPU (RTX 3060 12GB, RTX 4070, etc.)
- Recommended: 16GB+ VRAM GPU (RTX 4080, RTX 4090, A100, etc.)
- For distributed training: Multiple GPUs with NVLink recommended
- Stable Diffusion 3.5 Medium model weights
- Can be loaded from HuggingFace Hub or local path
```bash
# Clone the repository
git clone https://github.com/seochan99/stable-diffusion-3.5-text2image-lora.git
cd stable-diffusion-3.5-text2image-lora

# Run setup script (installs dependencies and configures accelerate)
bash scripts/setup.sh
```

We provide an example dataset structure in examples/dataset/. You can:
Option A: Use the example structure

```bash
# Add your images to examples/dataset/images/
# Update examples/dataset/metadata.jsonl with your captions
```

Option B: Create your own dataset

```
your_dataset/
├── metadata.jsonl
└── images/
    ├── image1.jpg
    ├── image2.jpg
    └── ...
```

Where metadata.jsonl contains:

```jsonl
{"image": "images/image1.jpg", "caption": "A beautiful landscape"}
{"image": "images/image2.jpg", "caption": "A portrait of a person"}
```

Easy way (recommended for beginners):
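Before launching a long training run, it can help to sanity-check that every metadata.jsonl entry parses and points at an existing image. A minimal validation sketch (the `check_metadata` helper is hypothetical, not part of this repository):

```python
import json
from pathlib import Path


def check_metadata(dataset_dir: str) -> list[str]:
    """Return a list of problems found in dataset_dir/metadata.jsonl."""
    root = Path(dataset_dir)
    problems = []
    lines = (root / "metadata.jsonl").read_text().splitlines()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # allow blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {lineno}: invalid JSON")
            continue
        if "image" not in entry or "caption" not in entry:
            problems.append(f"line {lineno}: missing 'image' or 'caption' key")
        elif not (root / entry["image"]).is_file():
            problems.append(f"line {lineno}: image not found: {entry['image']}")
    return problems
```

Running `check_metadata("./examples/dataset")` and fixing anything it reports is cheaper than discovering a bad entry mid-epoch.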
```bash
# Basic training with good defaults
bash scripts/train_basic.sh

# Advanced training with all features
bash scripts/train_advanced.sh
```

Manual way (for customization):
```bash
accelerate launch train_text_to_image_lora_sd35.py \
  --pretrained_model_name_or_path "stabilityai/stable-diffusion-3.5-medium" \
  --train_data_dir "./examples/dataset" \
  --output_dir "./outputs/sd35-lora" \
  --resolution 1024 \
  --train_batch_size 2 \
  --num_train_epochs 10 \
  --rank 64 \
  --learning_rate 1e-4 \
  --mixed_precision fp16 \
  --validation_prompt "a beautiful landscape" \
  --validation_epochs 5
```

Once training is complete, generate images with your custom LoRA:
Easy way (recommended):
```bash
# Generate images with default settings
bash scripts/inference.sh

# Customize with environment variables
PROMPT="a futuristic cityscape at sunset" \
NUM_IMAGES=8 \
bash scripts/inference.sh
```

Manual way (for full control):
```bash
python inference.py \
  --lora_path "./outputs/sd35-lora-basic" \
  --prompt "your amazing prompt here" \
  --num_images 4 \
  --height 1024 \
  --width 1024 \
  --num_inference_steps 28 \
  --guidance_scale 7.0 \
  --seed 42
```

Customize training and inference with environment variables:
```bash
# Training customization
MODEL_NAME="stabilityai/stable-diffusion-3.5-medium" \
DATASET_DIR="./your_custom_dataset" \
BATCH_SIZE=4 \
EPOCHS=20 \
bash scripts/train_basic.sh

# Inference customization
PROMPT="your custom prompt" \
NUM_IMAGES=8 \
STEPS=50 \
RESOLUTION=1024 \
bash scripts/inference.sh
```

Want to fine-tune SD3.5 on free or paid Colab GPUs? Use the bundled notebook:
- Notebook: SD35_LoRA_Colab.ipynb
- Step-by-step guide: docs/COLAB_FINETUNING.md
The notebook performs environment setup, Hugging Face auth, dataset validation, LoRA training, and inference validation. Follow the guide to configure dataset paths, adjust hyperparameters for T4/A100 runtimes, and resume or export runs.
| Parameter | Description | Default | Recommended |
|---|---|---|---|
| `--resolution` | Training image resolution | 1024 | 1024 for SD3.5 |
| `--train_batch_size` | Batch size per device | 4 | 2-4 depending on VRAM |
| `--learning_rate` | Learning rate | 1e-4 | 1e-4 to 5e-4 |
| `--num_train_epochs` | Number of epochs | 1 | 10-50 |
| `--mixed_precision` | Precision mode | None | fp16 or bf16 |
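Why fp16 training needs the "automatic gradient scaling" mentioned above: fp16 has a very narrow exponent range, so tiny gradient values underflow to zero, while multiplying the loss (and hence the gradients) by a large constant before the cast keeps them representable. A minimal NumPy demonstration of the underflow and the fix (the scale factor 2^16 is illustrative):

```python
import numpy as np

small_grad = np.float32(1e-8)          # a typical tiny gradient value
assert np.float16(small_grad) == 0.0   # fp16 underflows: the gradient is lost

# Gradient scaling: multiply before casting down, accumulate in fp32,
# then divide the scale back out to recover the original magnitude
scale = np.float32(2.0 ** 16)
scaled = np.float16(small_grad * scale)   # now representable in fp16
recovered = np.float32(scaled) / scale    # ~1e-8 again, up to fp16 precision
assert scaled != 0.0
```

bf16 keeps fp32's exponent range at reduced precision, which is why bf16 training generally skips loss scaling entirely.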
| Parameter | Description | Default | Recommended |
|---|---|---|---|
| `--rank` | LoRA rank | 4 | 64-128 |
| `--lora_alpha` | LoRA alpha scaling | None | rank * 2 |
| `--train_text_encoder` | Train text encoders | False | True for better results |
| `--text_encoder_lr` | Text encoder learning rate | 5e-5 | 1e-5 to 5e-5 |
| Parameter | Description | Default | Notes |
|---|---|---|---|
| `--gradient_checkpointing` | Enable gradient checkpointing | False | Reduces memory by ~50% |
| `--weighting_scheme` | Loss weighting scheme | "logit_normal" | Options: sigma_sqrt, mode, cosmap |
| `--validation_prompt` | Prompt for validation images | None | Required for validation |
| `--validation_epochs` | Epochs between validations | 50 | Set to 1-5 for frequent validation |
| `--checkpointing_steps` | Steps between checkpoints | 500 | Adjust based on training length |
| `--precondition_outputs` | Enable output preconditioning | 1 | As per the SD3 paper |
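The default "logit_normal" weighting follows the SD3 paper's logit-normal timestep sampling: draw a value from a normal distribution and squash it through a sigmoid, which concentrates training on mid-schedule timesteps where the denoising task is hardest. A standalone sketch of the idea (function and parameter names are illustrative, not the script's exact API):

```python
import numpy as np


def sample_logit_normal_timesteps(batch_size, logit_mean=0.0, logit_std=1.0, seed=None):
    """Sample timesteps in (0, 1) with density concentrated mid-schedule."""
    rng = np.random.default_rng(seed)
    u = rng.normal(loc=logit_mean, scale=logit_std, size=batch_size)
    return 1.0 / (1.0 + np.exp(-u))  # sigmoid maps the real line into (0, 1)


t = sample_logit_normal_timesteps(1000, seed=0)
assert t.min() > 0.0 and t.max() < 1.0
```

Shifting `logit_mean` biases sampling toward earlier or later timesteps; increasing `logit_std` flattens the distribution toward uniform-like coverage.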
| Parameter | Description | Default | Notes |
|---|---|---|---|
| `--lora_path` | Path to trained LoRA weights | Required | Directory with .safetensors |
| `--lora_scale` | LoRA effect strength | 1.0 | 0.0-2.0 range typical |
| `--num_images` | Number of images to generate | 1 | Batch generation |
| `--num_inference_steps` | Denoising steps | 28 | More steps generally improve quality |
| `--guidance_scale` | Prompt adherence strength | 7.0 | Higher = more faithful to the prompt |
| `--seed` | Random seed | None | For reproducible results |
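`--lora_scale` multiplies the LoRA correction before it is added to the base weights, so 0.0 reproduces the base model, 1.0 applies the adapter as trained, and values above 1.0 exaggerate its effect. Conceptually (a NumPy sketch of the scaling, not the pipeline's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))      # base (frozen) weight
delta = rng.standard_normal((8, 8))  # learned LoRA update, i.e. (alpha / rank) * B @ A


def effective_weight(lora_scale):
    """Interpolate between the base model (0.0) and the full adapter (1.0)."""
    return W + lora_scale * delta


assert np.allclose(effective_weight(0.0), W)          # scale 0 -> base model
assert np.allclose(effective_weight(1.0), W + delta)  # scale 1 -> full LoRA
```

Sweeping the scale (e.g. 0.5, 0.8, 1.0, 1.2) on a fixed seed is a quick way to find the strength that preserves your style without overpowering the base model.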
```bash
tensorboard --logdir ./outputs/sd35-lora/logs
```

```bash
# Login first
wandb login

# Then add to training command
--report_to wandb --run_name "my-experiment"
```

```bash
# Option 1: Reduce batch size and enable gradient checkpointing
--train_batch_size 1 --gradient_checkpointing --gradient_accumulation_steps 4

# Option 2: Use CPU offloading for models
--mixed_precision fp16 --gradient_checkpointing

# Option 3: Reduce resolution temporarily
--resolution 512
```

```bash
# Switch to bfloat16 (recommended for modern GPUs)
--mixed_precision bf16

# Or use full precision (slower but stable)
--mixed_precision no
```

```bash
# Enable all optimizations
--gradient_checkpointing \
--dataloader_num_workers 4 \
--mixed_precision bf16

# Use 8-bit optimizer for memory efficiency
--use_8bit_adam
```

```bash
# Use lower learning rate for text encoders
--train_text_encoder --text_encoder_lr 1e-5

# Or disable if not needed
# (remove --train_text_encoder flag)
```

- Gradient Checkpointing: Reduces memory by ~50% with minimal speed impact
- Mixed Precision: Use `bf16` for RTX 30/40 series, `fp16` for older GPUs
- Batch Size: Start with 1-2 and increase based on available VRAM
- Resolution: Train at 512px first, then fine-tune at 1024px
- DataLoader Workers: Set to number of CPU cores / 4
- Gradient Accumulation: Use instead of large batch sizes
- 8-bit Optimizers: Reduce memory with minimal accuracy loss
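The batch size and gradient accumulation tips interact: each optimizer step sees an effective batch of per-device batch size × accumulation steps × number of GPUs, so small per-device batches with accumulation match the optimization behavior of larger ones. A quick arithmetic check (values illustrative):

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus=1):
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus


# --train_batch_size 1 with --gradient_accumulation_steps 4 on 2 GPUs
# matches --train_batch_size 4 with no accumulation on 2 GPUs
assert effective_batch_size(1, 4, num_gpus=2) == effective_batch_size(4, 1, num_gpus=2) == 8
```

Accumulation trades a little wall-clock time for a large VRAM saving, since only one micro-batch of activations is resident at a time.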
Contributions are welcome! We appreciate all forms of contributions including bug reports, feature requests, documentation improvements, and code contributions.
- Fork the repository and create your feature branch
- Make your changes with clear, descriptive commits
- Add tests for any new functionality
- Update documentation as needed
- Submit a Pull Request with a clear description
```bash
# Clone the repository
git clone https://github.com/seochan99/stable-diffusion-3.5-text2image-lora.git
cd stable-diffusion-3.5-text2image-lora

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e .
pip install -r requirements-dev.txt  # If available

# Install pre-commit hooks (optional)
pre-commit install
```

- Follow PEP 8 style guidelines
- Add docstrings to all functions and classes
- Write meaningful commit messages
- Test your changes thoroughly
- Update README if adding new features
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Stability AI for Stable Diffusion 3.5
- Hugging Face for the Diffusers library
- Microsoft for the LoRA technique
- The open-source community for continuous improvements
For questions, issues, or collaboration opportunities:
- Email: gmlcks00513@gmail.com
- GitHub Issues: Create an issue
- Discussions: GitHub Discussions
⭐ Star this repository if it helped you! ⭐