Implementation Status and planned TODOs #4

@Rayhane-mamah

This umbrella issue tracks my current progress and discusses the priority of planned TODOs. It has been closed since all objectives have been met.

Goal

  • Achieve a high-quality, human-like text-to-speech synthesizer based on DeepMind's paper
  • Provide a pre-trained Tacotron-2 model (training; still checking this)

Model

Feature Prediction Model (Done)

  • Convolutional-RNN encoder block
  • Autoregressive decoder
  • Location Sensitive Attention (+ smoothing option)
  • Dynamic stop token prediction
  • LSTM + Zoneout
  • Reduction factor (not used in the T2 paper)
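
For readers unfamiliar with the "LSTM + Zoneout" item, here is a minimal NumPy sketch of the zoneout update on a single state vector. It is an illustration only, not the code from this repository: the function name `zoneout`, its arguments, and the example rate are assumptions. During training each unit keeps its previous value with probability `rate`; at inference the expected value of that random choice is used instead.

```python
import numpy as np


def zoneout(h_prev, h_new, rate, training, rng=None):
    """Apply zoneout to one recurrent state update (hypothetical helper).

    h_prev: state from the previous time step.
    h_new:  candidate state computed by the cell at this step.
    rate:   probability that a unit keeps its previous value.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    h_prev = np.asarray(h_prev, dtype=float)
    h_new = np.asarray(h_new, dtype=float)
    if training:
        if rng is None:
            rng = np.random.default_rng()
        # Each unit independently "zones out" (keeps its old value).
        keep_prev = rng.random(h_prev.shape) < rate
        return np.where(keep_prev, h_prev, h_new)
    # Inference: use the expectation of the random mask.
    return rate * h_prev + (1.0 - rate) * h_new
```

In an LSTM the same update would typically be applied separately to the cell state and the hidden state, possibly with different rates.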

Wavenet vocoder conditioned on Mel-Spectrogram (Done)

  • 1D dilated convolution
  • Local conditioning
  • Global conditioning
  • Upsampling network (by transposed convolutions)
  • Mixture of logistic distributions
  • Gaussian distribution for waveform modeling
  • Exponential Moving Average (train + synthesis)
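
As a rough illustration of the "1D dilated convolution" item, the sketch below implements a single-channel causal dilated convolution in NumPy and computes the receptive field of a stack of such layers. The function names, the kernel size, and the dilation schedule in the comments are assumptions chosen for the example, not the hyperparameters used in this repository.

```python
import numpy as np


def causal_dilated_conv1d(x, weights, dilation):
    """Single-channel causal dilated convolution (illustrative helper).

    Output sample t depends only on x[t], x[t - dilation],
    x[t - 2 * dilation], ... and never on future samples.
    The last weight multiplies the current sample.
    """
    x = np.asarray(x, dtype=float)
    k = len(weights)
    pad = (k - 1) * dilation
    # Left-pad with zeros so the output has the same length as the input.
    padded = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x)
    for i, w in enumerate(weights):
        # Tap i looks (k - 1 - i) * dilation steps into the past.
        start = i * dilation
        y += w * padded[start:start + len(x)]
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With a kernel size of 2 and dilations doubling from 1 to 512, `receptive_field(2, [2 ** i for i in range(10)])` evaluates to 1024, which shows why doubling dilations lets a small stack cover a long stretch of waveform.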

Scripts

  • Feature prediction model: training
  • Feature prediction model: natural synthesis
  • Feature prediction model: ground-truth aligned synthesis
  • Wavenet vocoder model: training (ground truth Mel-Spectrograms)
  • Wavenet vocoder model: training (ground truth aligned Mel-Spectrograms)
  • Wavenet vocoder model: waveform synthesis
  • Global model: synthesis (from text to waveforms)

Extra (optional):

  • Griffin-Lim (as an alternative vocoder)
  • Reduction factor (speed up training, reduce model complexity + better alignment)
  • Curriculum learning for RNN natural synthesis (paper)
  • Post processing network for Linear Spectrogram mapping
  • Wavenet with Gaussian distribution (reference)
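
To make the "Reduction factor" item concrete: with a reduction factor r, the decoder predicts r mel frames per step, so a target of T frames needs only about T / r decoder steps. The sketch below shows one way the frame grouping could be done in NumPy. The helper names `group_frames` and `ungroup_frames` and the zero padding value are assumptions for illustration, not this repository's code.

```python
import numpy as np


def group_frames(mel, r, pad_value=0.0):
    """Reshape (T, n_mels) targets into (ceil(T / r), r * n_mels).

    Each output row holds the r consecutive frames that the decoder
    would predict in a single step.
    """
    mel = np.asarray(mel, dtype=float)
    n_frames, n_mels = mel.shape
    remainder = n_frames % r
    if remainder:
        # Pad the end so the frame count is a multiple of r.
        padding = np.full((r - remainder, n_mels), pad_value)
        mel = np.concatenate([mel, padding], axis=0)
    return mel.reshape(-1, r * n_mels)


def ungroup_frames(grouped, r):
    """Inverse of group_frames: back to one mel frame per row."""
    grouped = np.asarray(grouped)
    return grouped.reshape(-1, grouped.shape[1] // r)
```

For example, 5 frames of 3 mel bins with r = 2 become 3 decoder steps of 6 values each, and ungrouping yields 6 frames whose last row is padding.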

Notes:

All models in this repository will initially be implemented in TensorFlow. If you want a WaveNet vocoder implemented in PyTorch, you can refer to this repository, which shows very promising results.
