Manifold Hypothesis

According to the manifold hypothesis, real-world data tends to concentrate on lower-dimensional manifolds embedded in the high-dimensional data space.

For example, if you have a \(32\times32\) black-and-white image, your sample space is \(x\in\mathbb{R}^{32\times32}=\mathbb{R}^{1024}\). But, given some real-world data, the probability density over most of this space is \(p(x)\approx0\). That is, most random pixel combinations will never come up in our data, and only very few regions of \(\mathbb{R}^{1024}\) actually have some positive probability of occurring. Those regions are the modes of our data distribution \(p(x)\). According to the manifold hypothesis, those modes can be projected onto a continuous lower-dimensional manifold.
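As a toy illustration of this, consider a hypothetical one-parameter family of "images": 1024-dimensional vectors tracing out a curve (a 1-D manifold) in the ambient space. The construction below (a Gaussian bump whose position varies) is purely illustrative and not from any real dataset, but it shows the core point: a uniformly random point in the ambient space essentially never lands near the manifold, while neighbouring manifold points are close to each other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "images": 1024-dim vectors (a flattened 32x32 grid) that
# lie on a 1-parameter manifold -- a Gaussian bump whose position t varies.
def bump_image(t, n=1024, width=20.0):
    idx = np.arange(n)
    return np.exp(-((idx - t * n) ** 2) / (2 * width**2))

# 200 points sampled along the manifold.
data = np.stack([bump_image(t) for t in np.linspace(0.1, 0.9, 200)])

# A uniformly random "image" from the ambient cube [0, 1]^1024 ...
random_img = rng.uniform(0.0, 1.0, size=1024)

# ... is far from every point on the manifold, while adjacent
# manifold points sit close together.
dist_random = np.min(np.linalg.norm(data - random_img, axis=1))
dist_manifold = np.linalg.norm(data[0] - data[1])
print(dist_random, dist_manifold)
```

Running this, the random image's distance to the nearest manifold point is many times larger than the spacing between adjacent manifold points: almost all of the ambient volume is "empty" with respect to the data.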

Compared to the sparse high-dimensional data space, these lower-dimensional projections allow for a very dense representation of the sample space, which has been suggested to explain the effectiveness of machine-learning techniques such as the variational autoencoder (VAE).
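The dimensionality argument can be sketched with the linear analogue of such a model, PCA (a VAE learns a nonlinear version of the same idea). In this assumed setup, data is generated from 2 latent coordinates and embedded linearly into a 50-dimensional ambient space; the top 2 principal directions then capture essentially all of the variance, i.e. two numbers per sample give a dense representation:

```python
import numpy as np

rng = np.random.default_rng(1)

# 2 latent coordinates per sample -- the manifold's intrinsic dimensions.
latent = rng.normal(size=(500, 2))

# Embed them linearly into a 50-dimensional ambient space.
# (Purely illustrative; a VAE would learn a nonlinear embedding.)
embedding = rng.normal(size=(2, 50))
X = latent @ embedding

# PCA via SVD: fraction of variance explained by the top 2 directions.
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
print(explained)  # ~1.0: two components suffice
```

Real data manifolds are curved rather than linear, which is why nonlinear models like the VAE are needed in practice, but the variance calculation above is the simplest version of the "dense low-dimensional representation" claim.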