Autoencoder (AE)
The autoencoder is a special form of the [[Encoder-Decoder Model]]. Its task is to encode an input into a latent representation, usually of much lower dimensionality and therefore far more compressed, and then to decode that latent representation back into the input. The AE is trained unsupervised, since input and target output are identical. Examples of AE models are the U-Net and the Variational Autoencoder.
Architecture
The architecture is simple: the input \(x\) is passed through layers of decreasing dimensionality (encoder), through the bottleneck layer \(z\), and then decoded through layers of increasing dimensionality back up to the dimension of the input (decoder). By training the network to minimize the difference between input and output, the AE must learn a structural decomposition of the input space, so that it can reconstruct the input from very little information.
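The encode-bottleneck-decode structure can be sketched as follows. This is a minimal, hypothetical example (not from the text): a purely linear autoencoder in NumPy, trained with plain gradient descent on random data; the dimensions, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_latent = 8, 2            # input and bottleneck dimensionality (assumed)
X = rng.normal(size=(256, d_in)) # toy dataset: input and target are identical

# Encoder E_phi and decoder D_theta, each a single linear map for simplicity
W_enc = rng.normal(scale=0.1, size=(d_in, d_latent))
W_dec = rng.normal(scale=0.1, size=(d_latent, d_in))

def forward(X):
    z = X @ W_enc                # latent representation z = E_phi(x)
    x_hat = z @ W_dec            # reconstruction x_hat = D_theta(z)
    return z, x_hat

lr = 0.01
losses = []
for _ in range(500):
    z, x_hat = forward(X)
    err = x_hat - X              # reconstruction error
    losses.append(float((err ** 2).mean()))
    # Gradients of the mean squared reconstruction loss w.r.t. both maps
    grad_dec = z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(losses[0], losses[-1])     # reconstruction loss shrinks during training
```

Because the bottleneck has fewer dimensions than the input, the network cannot simply copy \(x\); a linear version like this one ends up learning a subspace closely related to PCA, while nonlinear encoders and decoders can learn richer decompositions.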
Reconstruction Loss
The model is trained to minimize the expected distance between \(x\) and its reconstruction \(\hat{x}=D_{\theta}(E_{\Phi}(x))\):