How to perform HunyuanVideo I2V inversion?

Thanks for your wonderful work! How can I perform inversion on HunyuanVideo I2V? The difficulty is the `token_replace` mechanism: when I encode the input video, the VAE output is not equal to `latents = torch.cat([image_latents, latents[:, :, 1:, :, :]], dim=2)`.
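
To make the mismatch concrete, here is a minimal sketch of what I mean. It uses random tensors as stand-ins for the VAE outputs, so it only illustrates the shapes and the first-frame replacement, not the real model. The helper name `token_replace_latents`, the `(batch, channels, frames, height, width)` layout, and the tensor sizes are my own assumptions for illustration.

```python
import torch


def token_replace_latents(image_latents, latents):
    """Hypothetical helper: use the image latent as frame 0, keep the other frames.

    Both tensors are assumed to be (batch, channels, frames, height, width).
    """
    return torch.cat([image_latents[:, :, :1], latents[:, :, 1:]], dim=2)


torch.manual_seed(0)
# Stand-in for the latents obtained by encoding the whole input video.
video_latents = torch.randn(1, 16, 9, 8, 8)
# Stand-in for the latents obtained by encoding only the first frame.
image_latents = torch.randn(1, 16, 1, 8, 8)

mixed = token_replace_latents(image_latents, video_latents)

# Frames 1..N-1 are untouched; only frame 0 can differ from the video encoding.
first_frame_gap = (mixed[:, :, :1] - video_latents[:, :, :1]).abs().max().item()
print(mixed.shape, first_frame_gap)
```

In this sketch the two tensors agree on every frame except the first, which is where the inverted latents and the `token_replace` latents would diverge.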