Todo
- Save checkpoints that include optimizer state, and support reloading them to resume training
- Make model more configurable (add sequence mixer parameters to TrainConfig)
- Model removes Gaussian noise but not outlier artifacts (as of ~7k steps). What to do? Try a spectrogram-based loss?
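
One possible shape for the spectrogram-based loss in the last item, as a rough NumPy sketch. It assumes the data are 1-D signals, which these notes do not state. The function names, `frame_len`, `hop`, and `spectral_weight` are all made up for illustration. The idea: an isolated outlier spike is broadband, so it shows up across all frequency bins of the frames that contain it, while a per-sample MSE averages it away. For actual training the same computation would have to be written with the training framework's differentiable ops.

```python
import numpy as np


def stft_magnitude(signal, frame_len=256, hop=64):
    """Magnitude spectrogram via a Hann-windowed short-time FFT."""
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim != 1 or signal.size < frame_len:
        raise ValueError("expected a 1-D signal of at least frame_len samples")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Stack overlapping frames into shape (n_frames, frame_len).
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames * window, axis=-1))


def spectral_l1_loss(pred, target, frame_len=256, hop=64):
    """Mean absolute difference between the two magnitude spectrograms."""
    pred_mag = stft_magnitude(pred, frame_len, hop)
    target_mag = stft_magnitude(target, frame_len, hop)
    return float(np.mean(np.abs(pred_mag - target_mag)))


def combined_loss(pred, target, spectral_weight=0.5):
    """Time-domain MSE plus a weighted spectral term (the weight is a guess)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = float(np.mean((pred - target) ** 2))
    return mse + spectral_weight * spectral_l1_loss(pred, target)


if __name__ == "__main__":
    t = np.arange(2048) / 2048.0
    clean = np.sin(2 * np.pi * 50 * t)
    spiky = clean.copy()
    spiky[1000] += 5.0  # a single outlier sample
    print("MSE only:     ", float(np.mean((spiky - clean) ** 2)))
    print("spectral L1:  ", spectral_l1_loss(spiky, clean))
    print("combined loss:", combined_loss(spiky, clean))
```

Whether this helps with the outlier artifacts is untested; the weight and the frame/hop sizes would need tuning against the actual data.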