E3 TTS

The E3 TTS model uses end-to-end diffusion based text-to-speech generation to create highly realistic speech without any intermediate representations.