The cepstrum is the inverse Fourier transform (IFT) of the logarithm of a signal's magnitude spectrum. It is a tool to investigate the periodic properties of a signal's spectrum and is typically applied in the analysis of human speech. The cepstrum is a graph over quefrency (an anagram of frequency), which represents the time scale of periodicities in the frequency domain rather than in the original time domain.
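The definition translates directly into code. The following is a minimal pure-Python sketch. It assumes a naive O(N²) DFT, the natural logarithm, and a small magnitude floor (`floor`, an assumption of this sketch) that keeps the logarithm finite. The helper names `dft`, `idft` and `real_cepstrum` are hypothetical and do not come from any library; in practice an FFT would be used.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, O(N^2); an FFT would be used in practice."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform, O(N^2)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(x, floor=1e-12):
    """Inverse DFT of the natural-log magnitude spectrum.

    `floor` (an assumption of this sketch) keeps log() finite for zero bins.
    """
    log_magnitude = [math.log(max(abs(v), floor)) for v in dft(x)]
    # For a real signal the log-magnitude is real and even, so the cepstrum is real.
    return [c.real for c in idft(log_magnitude)]


# Usage: a short decaying signal, zero-padded to N = 64 samples.
c = real_cepstrum([1.0, 0.5] + [0.0] * 62)
print(c[:4])
```

Index `n` of the result is the quefrency in samples. Negative quefrencies wrap around to the end of the list, so here `c[63]` equals `c[1]` (≈ 0.25).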

Peaks in the cepstrum correspond to periodicities in the spectrum, indicating the presence of harmonics or echoes. For example, the harmonics of a voiced speech signal are spaced $F_0$ apart in the spectrum. The cepstrum therefore shows a peak at the quefrency $1/F_0$, from which the fundamental frequency can be read off.
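To see such a peak appear, consider the simplest periodic structure in a spectrum: an echo. The pure-Python sketch below assumes an impulse followed by an echo of relative amplitude 0.5 after 8 samples (illustrative values). The spectrum $1 + 0.5e^{-j8\omega}$ therefore ripples with a period of $2\pi/8$ along the frequency axis. The helper names are hypothetical, and the DFT is a naive O(N²) loop.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(x, floor=1e-12):
    log_magnitude = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [c.real for c in idft(log_magnitude)]


n_samples, delay, alpha = 128, 8, 0.5
x = [0.0] * n_samples
x[0] = 1.0          # direct sound
x[delay] = alpha    # echo after `delay` samples

c = real_cepstrum(x)
# Search the first half of the quefrency axis, skipping the lowest bins.
peak = max(range(2, n_samples // 2), key=lambda q: c[q])
print(peak, c[peak])
```

The peak sits at quefrency 8, with a height close to $\alpha/2 = 0.25$. Smaller rahmonics with alternating sign follow at multiples of the delay.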

![[Drawing 2024-05-23 11.23.22.excalidraw#^group=gYUfAULHdFX6f69z7aV6_]]

The cepstrum can also be defined by applying the Fourier transform twice instead of using the IFT. This works because the log-magnitude spectrum of a real signal is real and even, so the forward and inverse transforms differ only by a scaling factor. This makes the cepstrum the spectrum of the spectrum. A key property of the cepstrum is that a convolution of two time signals corresponds to an addition of their cepstra. The Fourier transform turns the convolution into a product, and the logarithm turns the product into a sum:

\[ \begin{align} && y(t) &= x(t)*h(t) \\ &\Rightarrow & Y(s)&=X(s)\,H(s)\\ &\Rightarrow & \log Y(s)&=\log X(s)+\log H(s) \\ &\Rightarrow & \tilde{y}(n) &= \tilde{x}(n) + \tilde{h}(n) \end{align} \]
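The following is a quick numerical check of this additivity. It is a pure-Python sketch with hypothetical helper names and arbitrarily chosen test signals. The DFT turns a circular convolution into a product. Both signals are zero-padded so that the full linear convolution fits into the $N = 64$ samples, so circular and linear convolution coincide here. The check uses the real cepstrum, i.e. $\log|Y| = \log|X| + \log|H|$.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(x, floor=1e-12):
    log_magnitude = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [c.real for c in idft(log_magnitude)]


def circular_convolve(x, h):
    """y[t] = sum_m x[m] h[(t - m) mod N]."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


n = 64
x = [1.0, 0.5] + [0.0] * (n - 2)              # arbitrary test signal
h = [1.0, 0.0, 0.0, -0.4] + [0.0] * (n - 4)   # arbitrary "echo" filter
y = circular_convolve(x, h)                   # equals the linear convolution here

cx, ch, cy = real_cepstrum(x), real_cepstrum(h), real_cepstrum(y)
max_error = max(abs(cy[q] - (cx[q] + ch[q])) for q in range(n))
print(max_error)
```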

The cepstrum has counterparts to several spectral concepts, but their interpretation differs:

| Cepstrum | Spectrum |
| --- | --- |
| Liftering | Filtering |
| Quefrency | Frequency |
| Rahmonics | Harmonics |
| Saphe | Phase |

Complex Cepstrum

![[Pasted image 20240603151758.png|500]]

The cepstrum as defined above uses only the magnitude of the spectrum and is called the real cepstrum. It discards the phase and is therefore only a part (the even part) of the full, complex cepstrum. If we consider the complex Fourier transform $\mathcal{F}$, we can write it in polar form:

\[ \log(\mathcal{F})=\log\left(|\mathcal{F}|\cdot e^{j\varphi}\right)=\log|\mathcal{F}|+\log\left(e^{j\varphi}\right)=\log|\mathcal{F}|+j\varphi \]

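Therefore, the phase information is carried along as the imaginary part of the log-spectrum and, like the log-magnitude, simply adds up under convolution. Because nothing is discarded, the complex cepstrum is invertible, provided the phase $\varphi$ is unwrapped into a continuous function. It is a homomorphic transformation that maps convolution to addition, and it can be reverted by transforming back and exponentiating.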
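The following is a minimal pure-Python sketch of the complex cepstrum and its inversion. It assumes a naive DFT and a simple bin-to-bin phase unwrapping. It also assumes an input without a linear-phase (delay) component, such as the minimum-phase test signal used here; general implementations additionally remove that linear phase. The helper names are hypothetical. Exponentiating the spectrum of the complex cepstrum and transforming back recovers the signal, which illustrates that the process is invertible.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def unwrap(phases):
    """Remove 2*pi jumps between neighbouring frequency bins."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))  # wrap step into [-pi, pi]
        out.append(out[-1] + step)
    return out


def complex_cepstrum(x):
    spectrum = dft(x)
    # cmath.phase gives the principal value; unwrap makes it continuous.
    phase = unwrap([cmath.phase(v) for v in spectrum])
    # log(|X| e^{j phi}) = log|X| + j phi
    log_spectrum = [complex(math.log(abs(v)), phi) for v, phi in zip(spectrum, phase)]
    # Real for real input without linear phase (log-magnitude even, phase odd).
    return [c.real for c in idft(log_spectrum)]


def inverse_complex_cepstrum(c_hat):
    log_spectrum = dft(c_hat)
    return [v.real for v in idft([cmath.exp(s) for s in log_spectrum])]


# Zeros of 1 - 0.3 z^-1 + 0.2 z^-2 lie inside the unit circle (minimum phase).
x = [1.0, -0.3, 0.2] + [0.0] * 61
c_hat = complex_cepstrum(x)
x_rec = inverse_complex_cepstrum(c_hat)
print(max(abs(a - b) for a, b in zip(x, x_rec)))
```

For this minimum-phase input the complex cepstrum vanishes at negative quefrencies (the end of the list), a property the Minimum Phase section returns to.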

Quefrency

While the cepstrum is a graph over time, it does not represent the same time scale as the original time-domain signal. Rather, it represents the period length of periodicities in the spectrum. For example, consider a voice signal with a fundamental frequency of $200\,\text{Hz}$, sampled at $44100\,\text{Hz}$. Its cepstrum will have a peak at the quefrency $1/200\,\text{Hz} = 5\,\text{ms}$, i.e. at $44100\,\text{Hz} / 200\,\text{Hz} = 220.5$ samples.
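The conversion between fundamental frequency and quefrency is a single division in either direction. Below is a minimal sketch; the function names are hypothetical. Because the cepstrum is only sampled at integer quefrencies, a period of 220.5 samples shows up as a peak spread over bins 220 and 221. The pitch search range of 60 to 400 Hz is an illustrative assumption, not a value from this note.

```python
import math


def f0_to_quefrency(f0_hz, fs_hz):
    """Quefrency (in samples) at which a fundamental frequency f0 appears."""
    return fs_hz / f0_hz


def quefrency_to_f0(quefrency_samples, fs_hz):
    """Fundamental frequency corresponding to a cepstral peak position."""
    return fs_hz / quefrency_samples


def pitch_search_bins(f0_min_hz, f0_max_hz, fs_hz):
    """Integer quefrency bins covering the F0 range (high F0 = short period)."""
    first = math.floor(fs_hz / f0_max_hz)
    last = math.ceil(fs_hz / f0_min_hz)
    return first, last


print(f0_to_quefrency(200, 44100))         # 220.5
print(quefrency_to_f0(220, 44100))         # peak at bin 220: approx. 200.45 Hz
print(pitch_search_bins(60, 400, 44100))   # (110, 735)
```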

Speech Processing

The cepstrum was originally used to deal with echoes and seismic activity, but it found much wider use in speech processing, in particular in the form of Mel-frequency cepstral coefficients (MFCC).

The cepstrum is mostly sensitive to the fundamental frequency and its harmonics. It is much less sensitive to amplitude and frequency modulations that are not related to the harmonics (e.g., due to noise or recording equipment). This makes it very useful for analyzing the fundamental frequency (F0) and pitch of the voice. It also helps to separate the contributions of source and filter in the [[Source-Filter Model]], because their log-spectra add up.

The source (or excitation) corresponds to the harmonics created by the [[Glottis]]. The harmonics form a fine ripple that repeats quickly along the frequency axis, so they are represented by high cepstral coefficients (high quefrencies). The filter ([[Source-Filter Model#Tube Model of the Vocal Tract]]) shapes the smooth spectral envelope, which varies slowly along the frequency axis, so it is represented by low cepstral coefficients (low quefrencies).
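The separation can be sketched with a synthetic source-filter signal. This pure-Python sketch uses a naive DFT, and all signal parameters are illustrative assumptions. The filter is a short decaying impulse response. The source is a pulse train with a period of 8 samples whose pulses decay by a factor of 0.8; the decay keeps the spectrum free of zeros, so the logarithm stays finite. Splitting the real cepstrum at an assumed cutoff of 6 samples, below the period, separates the two parts. The low quefrencies match the cepstrum of the filter alone (apart from a constant offset at $n = 0$), while the high quefrencies contain the peak at the period.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(x, floor=1e-12):
    log_magnitude = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [c.real for c in idft(log_magnitude)]


def circular_convolve(x, h):
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


n, period, cutoff = 128, 8, 6
vocal_tract = [1.0, 0.9] + [0.0] * (n - 2)   # hypothetical filter
# Hypothetical source: one pulse every `period` samples, decaying by 0.8 per pulse.
excitation = [0.8 ** (t // period) if t % period == 0 else 0.0 for t in range(n)]
speech = circular_convolve(excitation, vocal_tract)

c = real_cepstrum(speech)
# Low quefrencies (at both ends of the circular axis) -> filter; the rest -> source.
is_low = [q < cutoff or q > n - cutoff for q in range(n)]
filter_part = [v if low else 0.0 for v, low in zip(c, is_low)]
source_part = [0.0 if low else v for v, low in zip(c, is_low)]

pitch_period = max(range(cutoff, n // 2), key=lambda q: source_part[q])
print(pitch_period)
```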

Cepstral Smoothing

The authors of @breithauptCepstralSmoothingSpectral2007 propose a cepstral smoothing method for speech analysis that is more robust to noise and other minor frequency modulations while preserving the spectral information of the Formants in the Spectral Envelope.

The smoothing converts the log spectrum into a cepstrum and applies a low-pass lifter that keeps only the low quefrencies, which removes the harmonics from the spectrum. It then transforms the result back to the frequency domain, yielding a smoothed log spectrum.
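Below is a minimal sketch of cepstral smoothing in pure Python, using a naive DFT and the natural logarithm. It illustrates the general technique, not the exact method of @breithauptCepstralSmoothingSpectral2007. The lifter is a hypothetical rectangular window that keeps quefrencies below `cutoff` at both ends of the circular quefrency axis. Its cutoff has to lie below the pitch period; otherwise the harmonic ripple leaks back into the envelope. The test signal is illustrative: an assumed filter `[1, 0.5]` excited by an impulse plus an echo after 8 samples.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(x, floor=1e-12):
    log_magnitude = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [c.real for c in idft(log_magnitude)]


def circular_convolve(x, h):
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


def cepstral_envelope(x, cutoff):
    """Smoothed natural-log magnitude spectrum via a rectangular low-pass lifter."""
    c = real_cepstrum(x)
    n = len(c)
    # Keep quefrencies |q| < cutoff; negative ones sit at the end of the list.
    liftered = [v if (q < cutoff or q > n - cutoff) else 0.0 for q, v in enumerate(c)]
    # The DFT undoes the IDFT inside real_cepstrum: back to a (smoothed) log spectrum.
    return [v.real for v in dft(liftered)]


n = 64
vocal_tract = [1.0, 0.5] + [0.0] * (n - 2)   # hypothetical filter
excitation = [0.0] * n
excitation[0], excitation[8] = 1.0, 0.5       # hypothetical source: impulse plus echo after 8 samples
signal = circular_convolve(excitation, vocal_tract)

raw_log = [math.log(abs(v)) for v in dft(signal)]
smoothed = cepstral_envelope(signal, cutoff=6)
true_envelope = [math.log(abs(v)) for v in dft(vocal_tract)]
print(max(abs(s - e) for s, e in zip(smoothed, true_envelope)))
```

The raw log spectrum deviates from the filter's envelope by up to about 0.69 (a factor of 2 in magnitude), while the smoothed version stays within a few thousandths of it.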

![[Pasted image 20240603113856.png]]

The graph presents the spectral envelope as extracted by…

- [[Fourier Transformation#Short Time Fourier Transform (STFT)]]
- [[Source-Filter Model#Linear Prediction (LPC)]]
- Cepstral Smoothing

The envelope resulting from cepstral smoothing is the local average of the log spectrum. Compared to the LPC envelope, its power is biased downward, which can be compensated for. The peaks of the LPC envelope are much sharper because LPC is an all-pole filter ([[Pole-Zero filter]]). Its poles can produce arbitrarily sharp peaks at the formants, and in theory infinitely high ones if a pole lies on the unit circle.

LPC to Cepstrum

We can transform the coefficients of the [[Source-Filter Model#Linear Prediction]] into cepstral coefficients. Doing so allows us to design low-latency filters, because the resulting cepstrum is zero for negative quefrencies and only needs to be computed over its positive part.
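Besides the pole-based derivation below, the coefficients can be converted with a standard recursion. It works directly on the prediction coefficients and avoids factoring the polynomial into poles. Note that this recursion is a swapped-in technique, not the derivation in this note. The sketch assumes the convention $H(z) = b_0 / (1 - \sum_{k=1}^{p} a_k z^{-k})$, a stable (minimum-phase) filter, and $b_0 > 0$; with the opposite sign convention for $a_k$ the signs flip. The function name is hypothetical. The check compares the output against the closed form $\sum_\nu p_\nu^n / n$ for two assumed poles.

```python
import math


def lpc_to_cepstrum(a, b0, n_coeffs):
    """Cepstrum h_hat(0 .. n_coeffs-1) of H(z) = b0 / (1 - sum_k a[k-1] z^-k).

    Assumes a minimum-phase (stable) all-pole filter and b0 > 0.
    """
    p = len(a)
    c = [0.0] * n_coeffs
    c[0] = math.log(b0)
    for n in range(1, n_coeffs):
        acc = a[n - 1] if n <= p else 0.0
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c


# Poles at 0.9 and -0.5: (1 - 0.9 z^-1)(1 + 0.5 z^-1) = 1 - 0.4 z^-1 - 0.45 z^-2
a = [0.4, 0.45]
c = lpc_to_cepstrum(a, b0=2.0, n_coeffs=8)

# Closed form from the pole derivation: sum over poles of p**n / n.
poles = [0.9, -0.5]
closed_form = [math.log(2.0)] + [sum(pl ** n for pl in poles) / n for n in range(1, 8)]
print(max(abs(x - y) for x, y in zip(c, closed_form)))
```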

Derivation

1. Transform the filter into product form. The all-pole filter derived in [[Source-Filter Model#^f4c6cb]] can be rewritten as a product:

$$
H(z)=\frac{b_0}{z^{-p}\prod\limits_{\nu=1}^{p}(z-p_{\nu})}=\frac{b_0}{\prod\limits_{\nu=1}^{p}\left(1-p_{\nu}z^{-1}\right)}
$$

2. Apply the logarithm. For the cepstrum, we look at the log-spectrum, which transforms all products into sums and quotients into differences:

$$
\log H(z)=\log b_0-\sum\limits_{\nu=1}^{p}\log\left(1-p_{\nu}z^{-1}\right)
$$

3. Use the series representation. We now want to apply the inverse [[z-Transform]] to the log-spectrum of the filter. However, the logarithms complicate the transformation, so we first use the series representation

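$$
\log(1-a)=-\sum\limits_{m=1}^{\infty}\frac{a^{m}}{m}\qquad|a|<1
$$

With $a=p_{\nu}z^{-1}$, this holds for $|z|>|p_{\nu}|$, which includes the unit circle when all poles lie inside it (a stable filter). It allows us to rewrite our filter expression as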

$$
\log H(z)=\log b_{0}+\sum\limits_{\nu=1}^{p}\sum\limits_{m=1}^{\infty}\frac{p_{\nu}^{m}}{m}z^{-m}
$$

4. Apply the inverse z-transform. We can now apply the inverse z-transform to the sum term:

$$
Z^{-1}\left\{-\sum\limits_{\nu=1}^{p}\log\left(1-p_{\nu}z^{-1}\right)\right\}=\frac{1}{2\pi j}\oint\sum\limits_{\nu=1}^{p}\sum\limits_{m=1}^{\infty}\frac{p_{\nu}^{m}}{m}\,z^{n-m-1}\,dz
$$

We then use the Cauchy integral theorem (see [[Complex Integral]]) to simplify the expression, since only the term with $m=n$ contributes:

$$
\begin{align}
\frac{1}{2\pi j}\oint z^{n-m-1}\,dz &=
\begin{cases}
  1 & n=m \\
  0 & \text{else}
\end{cases} \\
\Rightarrow\quad Z^{-1}\left\{-\sum\limits_{\nu=1}^{p}\log\left(1-p_{\nu}z^{-1}\right)\right\} &=
\begin{cases}
  \sum\limits_{\nu=1}^{p}\frac{p_{\nu}^{n}}{n} & n\geq1 \\
  0 & \text{else}
\end{cases}
\end{align}
$$

5. Add the constant term. We now only need to add the term $Z^{-1}\{\log b_0\}$. Since the inverse transform of a constant is a scaled unit impulse at $n=0$, the cepstrum of the all-pole filter is:

$$
\hat{h}(n)= \begin{cases}
 \sum\limits_{\nu=1}^{p}\frac{p_{\nu}^{n}}{n} & n\geq1 \\
 \log b_{0} & n=0\\
 0 & \text{else}
\end{cases}
$$
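The closed form can be checked against a numerically computed real cepstrum. The sketch below uses pure Python with a naive inverse DFT. The pole positions, $b_0$ and the DFT length are arbitrary assumptions, and the poles must lie inside the unit circle. It evaluates $H$ on the DFT frequency grid and takes the inverse DFT of $\log|H|$. It then compares the result with $\log b_0$ at $n=0$ and with $\hat{h}(n)/2$ for $n \geq 1$, which is what the real cepstrum of this minimum-phase filter should equal.

```python
import cmath
import math


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


b0 = 0.5
poles = [cmath.rect(0.8, 0.5), cmath.rect(0.8, -0.5), -0.6]  # conjugate pair + real pole
n_fft = 256


def response(omega):
    """H(e^{j omega}) = b0 / prod(1 - p e^{-j omega})."""
    z_inv = cmath.exp(-1j * omega)
    denominator = 1.0
    for pole in poles:
        denominator *= 1 - pole * z_inv
    return b0 / denominator


log_magnitude = [math.log(abs(response(2 * math.pi * k / n_fft))) for k in range(n_fft)]
c_numeric = [v.real for v in idft(log_magnitude)]


def h_hat(n):
    """Closed-form cepstrum of the all-pole filter for n >= 1 (real for conjugate poles)."""
    return sum(pole ** n for pole in poles).real / n


print(c_numeric[:3], [h_hat(n) / 2 for n in (1, 2)])
```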

Minimum Phase

For the autoregressive (all-pole) signal model, the cepstrum is thus zero for negative $n$. This implies a minimum-phase system, whose poles and zeros all lie inside the unit circle. This is to be expected for a stable all-pole filter: its poles lie inside the unit circle, and its only zeros are at $z=0$. Conversely, we can turn any system into a minimum-phase system with the same magnitude response. To do so, we set the negative quefrencies of its real cepstrum to zero and double the positive ones to preserve the energy.
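Below is a minimal sketch of this folding in pure Python, using a naive DFT. The helper names and the test signal are illustrative assumptions, and an even DFT length is assumed. The input $[0.5, 1]$ has its zero at $z=-2$, outside the unit circle. Folding its real cepstrum and transforming back yields the minimum-phase sequence $[1, 0.5]$, which has the same magnitude response.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(x, floor=1e-12):
    log_magnitude = [math.log(max(abs(v), floor)) for v in dft(x)]
    return [c.real for c in idft(log_magnitude)]


def minimum_phase(x):
    """Minimum-phase sequence with the same DFT magnitude (even length assumed)."""
    c = real_cepstrum(x)
    n = len(c)
    folded = [0.0] * n
    folded[0] = c[0]                # n = 0 unchanged
    for q in range(1, n // 2):
        folded[q] = 2 * c[q]        # positive quefrencies doubled
    folded[n // 2] = c[n // 2]      # Nyquist quefrency belongs to both halves
    # Negative quefrencies (indices above n // 2) stay zero.
    spectrum = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in idft(spectrum)]


x = [0.5, 1.0] + [0.0] * 62   # zero at z = -2: not minimum phase
x_min = minimum_phase(x)
print(x_min[:3])
```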

Since the real cepstrum is the even part of the complex cepstrum, we can calculate it as

$$
c(n)=\frac{\hat{h}(n)+\hat{h}(-n)}{2}
$$

and thus:

$$
c(n)=
\begin{cases}
\frac{1}{2}\sum\limits_{\nu=1}^{p}\frac{p_{\nu}^{|n|}}{|n|} & n\neq0 \\
\log b_{0} & n=0
\end{cases}
$$

We can get a minimum-phase cepstral representation by setting the real cepstrum to 0 for negative $n$ and doubling it for positive $n$:

$$
\hat{h}(n)=
\begin{cases}
c(0) & n=0 \\
2c(n) & n>0 \\
0 & \text{else}