Thanks for your wonderful work! I am wondering how I can perform inversion on HunyuanVideo I2V, given the `token_replace` mechanism. When I encode the input video, the VAE output is not equal to `latents = torch.cat([image_latents, latents[:, :, 1:, :, :]], dim=2)`.