Entanglement of Latent Features
Neural networks tend to have polysemantic neurons, meaning there is no one-to-one correspondence between neurons and latent features; a single neuron may activate for multiple features. A possible explanation is the superposition hypothesis (@bereskaMechanisticInterpretabilityAI2024), according to which an \(n\)-dimensional space can encode only \(n\) orthogonal directions (and features), but \(\propto \exp(n)\) almost orthogonal ones. Networks can therefore learn much more compressed representations by relaxing the orthogonality constraint on features. However, in some cases we want to explicitly enforce orthogonality to disentangle latent features, so that manipulating a single neuron (or a subset of neurons) changes a single attribute in the generated sample.
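A quick way to see why "almost orthogonal" buys so much capacity is to sample many random unit vectors in a high-dimensional space and inspect their pairwise cosine similarities: as the dimension grows they concentrate near zero. The snippet below is only an illustration of this concentration effect; the dimension and number of vectors are arbitrary choices, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim, n_vecs = 512, 2048  # far more vectors than dimensions

# Random directions on the unit sphere: normalize Gaussian samples.
vecs = rng.standard_normal((n_vecs, n_dim))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

# Pairwise cosine similarities, excluding the diagonal (self-similarity).
cos = vecs @ vecs.T
off_diag = cos[~np.eye(n_vecs, dtype=bool)]

# In high dimensions these cluster around 0, i.e. the vectors are
# almost orthogonal even though n_vecs >> n_dim.
print(f"mean |cos| = {np.abs(off_diag).mean():.3f}, "
      f"max |cos| = {np.abs(off_diag).max():.3f}")
```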
Disentanglement
High-dimensional data sampled from reality typically contains entangled features, meaning that a single input feature (e.g. a pixel) can vary with multiple latent features (e.g. cats and dogs). Disentanglement tries to find a latent representation of the input data in which different latent features are represented by separate neurons. For examples, see Variational Autoencoder#Disentanglement methods, Emotional Speech Synthesis (ESS)#Disentanglement methods, and Disentanglement in Diffusion Models.
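As a concrete example of such a method, the beta-VAE objective upweights the KL term of a standard VAE, which pushes the encoder toward a more factorized (and often more disentangled) latent code. Below is a minimal sketch of that loss in PyTorch, assuming a diagonal Gaussian encoder with outputs `mu` and `log_var` (names chosen here for illustration).

```python
import torch
import torch.nn.functional as F


def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Reconstruction + beta-weighted KL divergence (beta-VAE objective).

    beta > 1 strengthens the pressure toward a factorized posterior,
    which tends to encourage disentangled latent dimensions.
    """
    # Reconstruction term (per-sample MSE here; BCE is also common).
    recon = F.mse_loss(x_recon, x, reduction="sum") / x.size(0)

    # KL( q(z|x) || N(0, I) ) for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1)
    kl = kl.mean()

    return recon + beta * kl
```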
Entanglement Metrics
There are multiple metrics to assess the entanglement of latent variables.
^48dc8c
- Higgins et al.: beta-VAE#Disentanglement metric
- Kim & Mnih: Factor VAE#Disentanglement metric
- MIG (Mutual Information Gap): beta-TCVAE#Disentanglement metric (see the sketch after this list)
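For orientation, MIG can be estimated from samples by discretizing each latent dimension, computing its mutual information with every ground-truth factor, and averaging the entropy-normalized gap between the two most informative latents per factor. The sketch below assumes a latent matrix `z` and a discrete factor matrix `factors`; the names, bin count, and equal-width discretization are illustrative choices, not prescribed by the paper.

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def mig(z, factors, n_bins=20):
    """Rough Mutual Information Gap estimate.

    z       : (n_samples, n_latents) continuous latent codes
    factors : (n_samples, n_factors) discrete ground-truth factors
    """
    # Discretize each latent dimension into equal-width bins.
    z_binned = np.stack(
        [np.digitize(z[:, j],
                     np.histogram_bin_edges(z[:, j], bins=n_bins)[1:-1])
         for j in range(z.shape[1])],
        axis=1,
    )

    gaps = []
    for k in range(factors.shape[1]):
        v = factors[:, k]
        h_v = mutual_info_score(v, v)  # entropy H(v_k) in nats
        mi = np.array([mutual_info_score(v, z_binned[:, j])
                       for j in range(z_binned.shape[1])])
        top2 = np.sort(mi)[-2:]                 # two most informative latents
        gaps.append((top2[1] - top2[0]) / h_v)  # normalized gap for factor k
    return float(np.mean(gaps))
```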