Latent Score-Based Generative Model
Based on @vahdatScorebasedGenerativeModeling2021
Score-based diffusion models act directly on the data space, which is not necessarily the most efficient representation for generative modeling. Mapping the data into a latent space (using a Variational Autoencoder) has several advantages over the default approach:
- Speed: With a well-chosen embedding, the SGM operates in a smaller, smoother space, converging more easily and ideally requiring fewer sampling steps.
- Flexibility: An SGM moves continuously through its input space, so it cannot be applied directly to discrete (binary, categorical, graph-like) distributions. The encoder and decoder let us switch between the discrete sample space and a continuous diffusion space.
- Expressivity: Utilizing a latent space has been shown to increase expressivity in generative models.
Such a model is called a Latent Score-Based Generative Model (LSGM).
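To make the architecture concrete, here is a minimal sketch of the three components an LSGM combines. The module names, layer sizes, and fully-connected layout are illustrative assumptions, not the networks used in the paper:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps data x to the mean and log-variance of q(z_0 | x)."""
    def __init__(self, x_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim, 128), nn.SiLU(), nn.Linear(128, 2 * z_dim)
        )

    def forward(self, x):
        mu, log_var = self.net(x).chunk(2, dim=-1)
        return mu, log_var

class Decoder(nn.Module):
    """Maps a latent z_0 back to data space, parameterizing p(x | z_0)."""
    def __init__(self, z_dim, x_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128), nn.SiLU(), nn.Linear(128, x_dim)
        )

    def forward(self, z):
        return self.net(z)

class ScoreNet(nn.Module):
    """Time-conditioned score model defining the SGM prior over latents."""
    def __init__(self, z_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + 1, 128), nn.SiLU(), nn.Linear(128, z_dim)
        )

    def forward(self, z_t, t):
        # t is a [batch] vector of diffusion times in [0, 1]
        return self.net(torch.cat([z_t, t[:, None]], dim=-1))
```

The encoder and decoder are a standard VAE pair; the only structural change from a plain SGM is that `ScoreNet` diffuses latents rather than raw data.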
Training
Training an LSGM involves finding not only the optimal parameters for the reverse diffusion, but also those of the encoder and decoder. This is done by minimizing a variational upper bound on the negative log-likelihood (the negative Evidence Lower Bound):
$$
\mathcal{L}(\mathbf{x}) = \underbrace{\mathbb{E}_{q_\phi(\mathbf{z}_0 \mid \mathbf{x})}\big[-\log p_\psi(\mathbf{x} \mid \mathbf{z}_0)\big]}_{\text{reconstruction}} + \underbrace{\mathbb{E}_{q_\phi(\mathbf{z}_0 \mid \mathbf{x})}\big[\log q_\phi(\mathbf{z}_0 \mid \mathbf{x})\big]}_{\text{negative encoder entropy}} + \underbrace{\mathbb{E}_{q_\phi(\mathbf{z}_0 \mid \mathbf{x})}\big[-\log p_\theta(\mathbf{z}_0)\big]}_{\text{cross entropy}}
$$
The first term trains the Variational Autoencoder's reconstruction; the remaining two terms together form the Kullback-Leibler divergence between the encoder distribution $q_\phi(\mathbf{z}_0 \mid \mathbf{x})$ and the SGM prior $p_\theta(\mathbf{z}_0)$. The cross-entropy term is the one that trains the score model, and it can be estimated with denoising score matching.
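As a hedged sketch of how the three terms might be computed in practice, assuming the toy modules above, a unit-variance Gaussian decoder, a VP-SDE perturbation kernel, and a simplified (unweighted) score-matching loss — not the paper's exact training code:

```python
import torch

def lsgm_loss(x, encoder, decoder, score_net):
    """Negative ELBO for one batch, assuming the toy modules sketched above.

    Term 1: reconstruction    E_q[-log p(x | z_0)]   (unit-variance Gaussian)
    Term 2: negative entropy  E_q[ log q(z_0 | x)]   (up to a constant)
    Term 3: cross entropy     E_q[-log p(z_0)], estimated up to a constant
            by denoising score matching under a VP-SDE perturbation kernel.
    """
    mu, log_var = encoder(x)
    std = torch.exp(0.5 * log_var)
    z0 = mu + std * torch.randn_like(std)  # reparameterized sample from q(z_0 | x)

    # Term 1: Gaussian reconstruction loss, up to an additive constant.
    recon = 0.5 * ((decoder(z0) - x) ** 2).sum(-1)

    # Term 2: -H[q(z_0 | x)] for a diagonal Gaussian, up to a constant.
    neg_entropy = -0.5 * log_var.sum(-1)

    # Term 3: diffuse z_0 with a VP-SDE kernel z_t = alpha_t * z_0 + sigma_t * eps.
    t = torch.rand(x.shape[0], device=x.device)
    alpha_t = torch.exp(-0.5 * t)[:, None]      # toy schedule with beta(t) = 1
    sigma_t = torch.sqrt(1.0 - alpha_t ** 2)
    eps = torch.randn_like(z0)
    z_t = alpha_t * z0 + sigma_t * eps

    # The score of q(z_t | z_0) is -eps / sigma_t; this uses simplified
    # (unweighted) denoising score matching, whereas the exact ELBO requires
    # a g(t)^2-dependent weighting of this term.
    score = score_net(z_t, t)
    cross_entropy = 0.5 * ((sigma_t * score + eps) ** 2).sum(-1)

    return (recon + neg_entropy + cross_entropy).mean()
```

In the paper the VAE and the latent SGM are trained jointly (with pre-training stages and variance-reduction tricks for sampling $t$); the sketch above only shows the structure of the objective.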