Hi,
I noticed that the code seems to be designed for images at resolutions of 256x256 and above, and the sampling code in particular appears to be tailored for 256x256 images.
How did you adapt the code to work with the CIFAR-10 dataset?
Additionally, did you use the same VAE for CIFAR-10 as the one used for FFHQ and other datasets? It seems that applying the VAE would reduce the CIFAR-10 resolution from 32x32 to 4x4, which might not align with the DiT settings in your implementation.
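To make the concern concrete, here is a rough sketch of the arithmetic. It assumes the VAE downsamples each side by 8x (which is what going from 32x32 to 4x4 implies) and a DiT patch size of 2. Both values are my assumptions rather than something I confirmed in your configs, and `latent_grid` is just a hypothetical helper for illustration.

```python
def latent_grid(image_size: int, vae_factor: int = 8, patch_size: int = 2):
    """Return (latent side length, number of patch tokens) for a square image.

    vae_factor and patch_size are assumed values, not taken from the repo.
    """
    if image_size % vae_factor != 0:
        raise ValueError("image_size must be divisible by vae_factor")
    latent = image_size // vae_factor  # side length after VAE encoding
    if latent % patch_size != 0:
        raise ValueError("latent size must be divisible by patch_size")
    tokens = (latent // patch_size) ** 2  # patches along one side, squared
    return latent, tokens


for size in (256, 32):
    latent, tokens = latent_grid(size)
    print(f"{size}x{size} image -> {latent}x{latent} latent -> {tokens} tokens")
```

If those assumptions hold, a 32x32 CIFAR-10 image ends up as only 4 tokens, compared with 256 tokens for a 256x256 image, which is why I am asking about the setup.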
Thanks in advance for clarifying!