Create a new Conda environment:

```bash
conda create -n vtoff python=3.11
conda activate vtoff
```

Then, clone the repository, switch to the commit used for the paper, and install the required packages:
```bash
git clone https://github.com/rizavelioglu/tryoffdiff.git
cd tryoffdiff
git switch c385ea2
pip install -e .
```

Download the original VITON-HD dataset and extract it to `./data/vitonhd`:
```bash
python tryoffdiff/dataset.py download-vitonhd  # For a different location: output-dir="<other-folder>"
```

As mentioned in the paper, the original dataset contains duplicates, and some training samples are leaked into the test set. Clean these with the following command:
```bash
python tryoffdiff/dataset.py clean-vitonhd  # Default: `data-dir="./data/vitonhd"`
```

For faster training, pre-extract the image features and save them to disk instead of extracting them during training:
```bash
python tryoffdiff/dataset.py vae-encode-vitonhd \
    --data-dir "./data/vitonhd/" \
    --model-name "sd14" \
    --batch-size 16

python tryoffdiff/dataset.py siglip-encode-vitonhd \
    --data-dir "./data/vitonhd/" \
    --batch-size 64
```

- Option 1 (GPU-poor) - Train with a single GPU:
Execute the following:

```bash
python tryoffdiff/modeling/train.py tryoffdiff \
    --save-dir "./models/" \
    --data-dir "./data/vitonhd-enc-sd14/" \
    --model-class-name "TryOffDiff" \
    --mixed-precision "no" \
    --learning-rate 0.0001 \
    --train-batch-size 16 \
    --num-epochs 1200 \
    --save-model-epochs 100 \
    --checkpoint-every-n-epochs 100
```

- Option 2 - Train with 4 GPUs on a single node (as done in the paper):
First, configure `accelerate` accordingly:

```bash
accelerate config
```

We did not use any of the optional tools such as dynamo, DeepSpeed, or FullyShardedDataParallel.
Then, start training:

```bash
accelerate launch --multi_gpu --num_processes=4 tryoffdiff/modeling/train.py tryoffdiff \
    --save-dir "./models/" \
    --data-dir "./data/vitonhd-enc-sd14/" \
    --model-class-name "TryOffDiff" \
    --mixed-precision "no" \
    --learning-rate 0.0001 \
    --train-batch-size 16 \
    --num-epochs 1201 \
    --save-model-epochs 100 \
    --checkpoint-every-n-epochs 100
```

Note: See `config.py` (`TrainingConfig`) for all possible arguments, e.g. set `resume_from_checkpoint` to resume training from a specific checkpoint.
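Since TryOffDiff builds on Stable Diffusion, each training step follows the standard latent-diffusion recipe: noise a pre-extracted VAE latent, then regress the model's noise prediction with MSE. The numpy sketch below is a schematic of that objective only; the function names are ours, not the repo's trainer:

```python
import numpy as np

def add_noise(x0: np.ndarray, noise: np.ndarray, alpha_bar_t: float) -> np.ndarray:
    """DDPM-style forward process: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

def noise_mse(pred_noise: np.ndarray, noise: np.ndarray) -> float:
    """Training objective: mean squared error between predicted and true noise."""
    return float(np.mean((pred_noise - noise) ** 2))

# Sanity check: an oracle that recovers eps exactly from x_t drives the loss to 0.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))  # stands in for a cached VAE latent
eps = rng.standard_normal(x0.shape)
a_bar = 0.7
x_t = add_noise(x0, eps, a_bar)
oracle_eps = (x_t - np.sqrt(a_bar) * x0) / np.sqrt(1.0 - a_bar)
print(noise_mse(oracle_eps, eps))  # ~0.0
```

In the actual trainer the oracle is replaced by the U-Net, and `alpha_bar_t` comes from the noise scheduler at a randomly sampled timestep.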
Each model has its own command. View all available options:

```bash
python tryoffdiff/modeling/predict.py --help
```

Example: Run inference with TryOffDiff:

```bash
python tryoffdiff/modeling/predict.py tryoffdiff \
    --model-dir "/model_20241007_154516/" \
    --model-filename "model_epoch_1200.pth" \
    --batch-size 8 \
    --num-inference-steps 50 \
    --seed 42 \
    --guidance-scale 2.0
```
which saves predictions to `<model-dir>/preds/` as `.png` files.

Note: See `config.py` (`InferenceConfig`) for all possible arguments, e.g. use the `--all` flag to run inference on the entire test set.
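The `--guidance-scale` flag sets the classifier-free guidance strength, assuming the usual convention of combining an unconditional and a conditional noise estimate at each sampling step:

```python
import numpy as np

def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, scale: float) -> np.ndarray:
    """Classifier-free guidance: extrapolate from the unconditional toward the conditional estimate."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

e_u = np.array([0.0, 1.0])
e_c = np.array([1.0, 3.0])
print(cfg_combine(e_u, e_c, 1.0))  # scale 1.0 is the plain conditional prediction
print(cfg_combine(e_u, e_c, 2.0))  # [2. 5.]
```

A scale above 1.0 (like the 2.0 used here) pushes samples to agree more strongly with the conditioning image, at some cost in diversity.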
Note: The paper uses the PNDM noise scheduler. For HuggingFace Spaces we use the EulerDiscrete scheduler.
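The scheduler only changes how each denoising step turns the model output into the next latent. As intuition for the EulerDiscrete variant, a single Euler step in the sigma parameterization looks roughly like this (a textbook sketch, not diffusers' implementation):

```python
import numpy as np

def euler_step(x_t: np.ndarray, denoised: np.ndarray, sigma_t: float, sigma_next: float) -> np.ndarray:
    """One Euler step along the derivative d = (x_t - denoised) / sigma_t."""
    d = (x_t - denoised) / sigma_t
    return x_t + (sigma_next - sigma_t) * d

x_t = np.array([2.0, -1.0])
denoised = np.array([1.0, 0.0])
# The final step (sigma_next = 0) lands exactly on the denoised estimate:
print(euler_step(x_t, denoised, sigma_t=1.0, sigma_next=0.0))
```

PNDM instead applies a higher-order (pseudo-multistep) update, which is why the two schedulers can produce slightly different outputs for the same checkpoint.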
Evaluate the predictions using:

```bash
python tryoffdiff/modeling/eval.py \
    --gt-dir "./data/vitonhd/test/cloth/" \
    --pred-dir "<prediction-dir>" \
    --batch-size 32 \
    --num-workers 4
```

which prints the results to the console. Specifically, we use the following libraries for the implementations of the metrics presented in the paper:

- `pyiqa`: SSIM, MS-SSIM, CW-SSIM, and LPIPS
- `clean-fid`: FID, CLIP-FID, and KID
- `DISTS-pytorch`: DISTS
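Conceptually, evaluation pairs each ground-truth garment with the prediction of the same filename and averages per-pair scores over the test set. A sketch of that pairing loop, with PSNR as a stand-in metric and an injected `load` function (both illustrative; not `eval.py`'s internals):

```python
import numpy as np
from pathlib import Path
from typing import Callable

def psnr(gt: np.ndarray, pred: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio; a simple stand-in for the paper's metrics."""
    mse = np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val**2 / mse))

def evaluate(gt_dir: Path, pred_dir: Path, load: Callable[[Path], np.ndarray]) -> float:
    """Average a per-pair metric over ground-truth/prediction files matched by name."""
    scores = [
        psnr(load(path), load(pred_dir / path.name))
        for path in sorted(gt_dir.glob("*.png"))
        if (pred_dir / path.name).exists()
    ]
    return float(np.mean(scores))
```

Set-level metrics such as FID and KID do not decompose per pair like this; `clean-fid` computes them over the two image folders as a whole.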
In addition, we offer a simple GUI for visualizing predictions alongside their evaluation metrics. This tool displays the ground-truth and predicted images side by side while providing metrics for the entire test set:

```bash
python tryoffdiff/modeling/eval_vis.py \
    --gt-dir "./data/vitonhd/test/cloth/" \
    --pred-dir "<prediction-dir>"
```