
Autoregressive Models

The idea of autoregression is to use past values to predict future ones. The formalization differs between time series analysis and deep learning approaches.

Signal Processing & Time Series Analysis

See https://en.wikipedia.org/wiki/Autoregressive_model.

In the field of signal processing, autoregressive models describe time-varying processes that are (weakly) stationary. A random variable \(X\) at time \(t\) is modelled as a weighted linear combination of its previous realizations plus some noise:

\[ X_{t}=\sum\limits_{i=1}^{p}\varphi_{i}X_{t-i}+\epsilon_{t} \]
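As a quick illustration, here is a minimal NumPy sketch that simulates an AR(2) process according to the formula above (the coefficients, noise scale, and sequence length are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example coefficients for an AR(2) process (chosen to keep it stationary)
phi = np.array([0.6, -0.3])          # phi_1, phi_2
p = len(phi)
T = 500                              # number of time steps to simulate

x = np.zeros(T)
eps = rng.normal(scale=1.0, size=T)  # epsilon_t ~ N(0, 1)

for t in range(p, T):
    # X_t = sum_{i=1}^{p} phi_i * X_{t-i} + epsilon_t
    x[t] = phi @ x[t - p:t][::-1] + eps[t]
```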

ARMA Model

One common application of this formulation is in ARMA models, which combine the autoregressive part with a moving average for improved prediction:

Autoregressive Moving Average (ARMA)
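For reference, the standard ARMA(\(p,q\)) model extends the autoregressive formulation above with a moving-average term over the past \(q\) noise realizations:

\[ X_{t}=\sum\limits_{i=1}^{p}\varphi_{i}X_{t-i}+\epsilon_{t}+\sum\limits_{j=1}^{q}\theta_{j}\epsilon_{t-j} \]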

Deep Learning

See https://deepgenerativemodels.github.io/notes/autoregressive/

![[Pasted image 20240809134938.png]]

In deep learning, autoregression is formulated as the factorization of a joint distribution into a chain of conditional distributions:

\[ q_{\theta}(x_{1}, \ldots, x_{T})=q_{\theta}(x_{1})\prod\limits_{t=2}^{T}q_{\theta}(x_{t}|x_{1}, \ldots, x_{t-1}) \]

This factorization is typically used in sequence modeling (e.g. RNNs). If we consider a model over binary variables where each conditional is represented as a table, the \(n\)-th variable has \(n-1\) conditioning variables. That means there are \(2^{n-1}\) possible conditioning configurations, for each of which we need to specify a conditional probability. The complexity of this naive tabular representation is thus \(\mathcal{O}(2^{n})\).
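To make the counting explicit: the \(i\)-th conditional needs one parameter per configuration of its \(i-1\) binary parents, so the tabular representation requires

\[ \sum\limits_{i=1}^{n} 2^{\,i-1} = 2^{n}-1 \in \mathcal{O}(2^{n}) \]

parameters in total.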

Fully-Visible Sigmoid Belief Network

![[Pasted image 20240812105508.png]]

In the FVSBN, the conditional for the \(i\)-th variable is a parameterized linear combination of the previous \(i-1\) variables, passed through a sigmoid:

\[ f_{i}(x_{<i})=\sigma(\alpha^{(i)}_{0}+\alpha^{(i)}_{1}x_{1}+\ldots+\alpha^{(i)}_{i-1}x_{i-1}) \]

In essence, the \(i\)-th conditional is a sigmoid of a weighted sum of the previous observations. This brings the complexity down to \(\mathcal{O}(n^{2})\) (up to \(n\) weights for each of the \(n\) variables).
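A minimal NumPy sketch of these conditionals and the resulting log-likelihood (the parameters here are random placeholders; in practice the \(\alpha^{(i)}\) would be learned by maximum likelihood):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fvsbn_log_likelihood(x, alpha):
    """Log-likelihood of one binary vector x under an FVSBN.

    x     : binary vector of length n
    alpha : list of weight vectors; alpha[i] has length i + 1
            (bias alpha_0 plus one weight per previous variable)
    """
    n = len(x)
    log_p = 0.0
    for i in range(n):
        # f_i(x_{<i}) = sigma(alpha_0 + alpha_1 x_1 + ... + alpha_{i-1} x_{i-1})
        p_i = sigmoid(alpha[i][0] + alpha[i][1:] @ x[:i])
        log_p += np.log(p_i if x[i] == 1 else 1.0 - p_i)
    return log_p

# Toy usage with random (untrained) parameters
rng = np.random.default_rng(0)
n = 5
alpha = [rng.normal(size=i + 1) for i in range(n)]
x = rng.integers(0, 2, size=n)
print(fvsbn_log_likelihood(x, alpha))
```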

Neural Autoregressive Density Estimator

![[Pasted image 20240812105528.png]]

We can use the idea of FVSBNs, but parameterize each conditional using a [[Multi-Layer Perceptron]]. Giving each conditional its own hidden layer of size \(d\) increases the complexity to \(\mathcal{O}(n^{2}d)\) (up to \(n\cdot d\) input weights for each of the \(n\) hidden layers). But if we constrain the input weights to be shared across all conditionals:

\[ \begin{align} \mathbf{h}_i &= \sigma(W_{., < i}\, \mathbf{x}_{< i} + \mathbf{c}) \\ f_i(x_1, x_2, \ldots, x_{i-1}) &= \sigma(\boldsymbol{\alpha}^{(i)}\mathbf{h}_i + b_i) \end{align} \]

where \(\theta=\{W\in \mathbb{R}^{d\times n}, \mathbf{c} \in \mathbb{R}^d, \{\boldsymbol{\alpha}^{(i)}\in \mathbb{R}^d\}^n_{i=1}, \{b_i \in \mathbb{R}\}^n_{i=1}\}\) is the set of parameters. Since the input weights \(W\) are shared for all conditionals, the complexity is reduced to \(\mathcal{O}(nd)\).
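A minimal NumPy sketch of this forward pass with shared input weights (the parameter values are random placeholders just to show the shapes; `n`, `d`, and the sampling of `x` are arbitrary choices for the example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nade_conditionals(x, W, c, alpha, b):
    """Return f_i = p(x_i = 1 | x_{<i}) for all i, with shared input weights W.

    x     : binary vector of length n
    W     : (d, n) shared input weight matrix
    c     : (d,) hidden bias
    alpha : (n, d) per-conditional output weights
    b     : (n,) per-conditional output biases
    """
    n = len(x)
    f = np.zeros(n)
    for i in range(n):
        # h_i = sigma(W[:, :i] x_{<i} + c)  -- only the first i columns of W are used
        h_i = sigmoid(W[:, :i] @ x[:i] + c)
        # f_i = sigma(alpha^{(i)} h_i + b_i)
        f[i] = sigmoid(alpha[i] @ h_i + b[i])
    return f

# Toy usage with random (untrained) parameters
rng = np.random.default_rng(0)
n, d = 6, 4
W = rng.normal(size=(d, n))
c = rng.normal(size=d)
alpha = rng.normal(size=(n, d))
b = rng.normal(size=n)
x = rng.integers(0, 2, size=n)
print(nade_conditionals(x, W, c, alpha, b))
```

Note that the pre-activations can also be updated incrementally (adding one column of \(W\) per step before the nonlinearity), so a full forward pass costs \(\mathcal{O}(nd)\) computation rather than \(\mathcal{O}(n^{2}d)\).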

A further extension computes means and variances for each conditional and models the \(i\)-th variable as a mixture of Gaussians ([[Gaussian Mixture Model]]). This allows modeling real-valued data, hence the name RNADE (Real-valued NADE).
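Schematically, the RNADE conditional for the \(i\)-th variable is a mixture of \(K\) Gaussians whose mixing weights, means, and variances are computed from the hidden state \(\mathbf{h}_i\):

\[ p(x_{i}\mid x_{<i})=\sum\limits_{k=1}^{K}\pi_{i,k}\,\mathcal{N}\!\left(x_{i};\,\mu_{i,k},\,\sigma_{i,k}^{2}\right) \]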