Thank you for sharing your work on TangoFlux. I have a couple of questions:
- As far as I know, the WavCaps and AudioCaps datasets used for training have lower sampling rates than 44.1 kHz. Did you upsample these datasets for training, or is the high-frequency range in the generated audio intentionally left empty?
- Is there a specific reason for choosing Stable Audio Open's VAE for compression?

I appreciate your contributions and look forward to your clarification.