
The cepstrum is the inverse Fourier transform (IFT) of the log-scaled signal spectrum. It is a tool for investigating the periodic properties of a signal's spectrum and is typically applied in the analysis of human speech. The cepstrum is plotted over quefrency (an anagram of frequency), which represents the time scale of periodicities in the frequency domain rather than in the original time domain.

Peaks in the cepstrum correspond to periodicities in the spectrum, indicating the presence of harmonics or echoes. For example, in a speech signal, the harmonics of the fundamental frequency are evenly spaced along the frequency axis, and this regular spacing shows up as a peak in the cepstrum.

Drawing 2024-05-23 11.23.22.excalidraw#^group=gYUfAULHdFX6f69z7aV6_
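As a minimal sketch of the definition (assuming NumPy; the function name `real_cepstrum`, the small `eps` guard, and the toy echo signal are illustrative choices, not part of the original note), the cepstrum of a noise signal with an added echo shows a peak at the echo delay:

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Inverse FFT of the log-magnitude spectrum (eps avoids log(0))."""
    log_magnitude = np.log(np.abs(np.fft.fft(x)) + eps)
    return np.fft.ifft(log_magnitude).real

# Toy signal (assumption): white noise plus a copy delayed by 50 samples.
rng = np.random.default_rng(0)
delay = 50
dry = rng.standard_normal(2048)
wet = dry.copy()
wet[delay:] += 0.5 * dry[:-delay]

cepstrum = real_cepstrum(wet)

# Skip the lowest quefrencies and only look at the first half,
# because the real cepstrum of a real signal is symmetric.
search_start = 10
echo_quefrency = search_start + int(np.argmax(cepstrum[search_start:1024]))
print(echo_quefrency)
```

The echo multiplies the spectrum by a ripple that is periodic along the frequency axis, and the cepstrum turns that ripple into a single peak at the delay.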

The cepstrum can also be defined by applying the Fourier transform twice (instead of using the IFT for the second step): the log-magnitude spectrum of a real signal is real and symmetric, so both definitions give the same result up to a scaling factor. This makes the cepstrum the spectrum of the spectrum. A key property of the cepstrum is that the convolution of two time signals corresponds to the addition of their cepstra, because the logarithm turns the product of the spectra into a sum.

\[ \begin{align} && y(t) &= x(t)*h(t) \\ &\Rightarrow & Y(s)&=X(s)H(s)\\ &\Rightarrow & \log Y(s)&=\log X(s)+\log H(s) \\ &\Rightarrow & \widetilde{y}(n) &= \widetilde{x}(n) + \widetilde{h}(n) \end{align} \]
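The additivity can be checked numerically. The following is a small sketch under stated assumptions: NumPy, random test signals, and an FFT length chosen at least as long as the convolution result so that the zero-padded DFTs multiply exactly. The helper name `real_cepstrum` is hypothetical.

```python
import numpy as np

def real_cepstrum(x, n_fft):
    """Real cepstrum of x, zero-padded to n_fft samples."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x, n_fft)))).real

rng = np.random.default_rng(1)
x = rng.standard_normal(256)   # "source" signal
h = rng.standard_normal(32)    # "filter" impulse response
y = np.convolve(x, h)          # linear convolution, 287 samples

n_fft = 512                    # >= len(y), so Y[k] = X[k] * H[k] holds exactly
c_y = real_cepstrum(y, n_fft)
c_sum = real_cepstrum(x, n_fft) + real_cepstrum(h, n_fft)

max_deviation = np.max(np.abs(c_y - c_sum))
print(max_deviation)
```

The deviation is at the level of floating-point rounding, which is what the last line of the derivation predicts.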

The cepstrum has analogous concepts to the spectrum, but the interpretation of them is different:

  • Cepstrum <-> Spectrum
  • Liftering <-> Filtering
  • Quefrency <-> Frequency
  • Rahmonics <-> Harmonics
  • Saphe <-> Phase

Complex Cepstrum

![[Pasted image 20240603151758.png|500]]

The cepstrum as defined above uses only the magnitude of the spectrum and is called the real cepstrum. It discards the phase and is therefore only a part of the full, complex cepstrum. If we consider the complex Fourier transform \(\mathcal{F}\), we can write it in polar form:

\[ \log(\mathcal{F})=\log(|\mathcal{F}|\cdot e^{j\varphi})=\log(|\mathcal{F}|)+\log(e^{j\varphi})=\log(|\mathcal{F}|)+j\varphi \]

Therefore, we can carry the phase information along as a simple additive (imaginary) term. This makes the complex cepstrum an invertible process; it is homomorphic.
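A minimal round-trip sketch (assuming NumPy; the function names and the short test signal are illustrative) shows the invertibility: keeping the phase as the imaginary part of the log-spectrum lets us recover the signal exactly. A careful implementation would also remove any linear-phase term before the inverse FFT so that the cepstrum of a real signal is real; this sketch skips that step and simply keeps the result complex.

```python
import numpy as np

def complex_cepstrum(x):
    """IFFT of log|X| + j*phase, with the phase unwrapped along frequency."""
    spectrum = np.fft.fft(x)
    log_spectrum = np.log(np.abs(spectrum)) + 1j * np.unwrap(np.angle(spectrum))
    return np.fft.ifft(log_spectrum)   # may be complex (no linear-phase removal)

def inverse_complex_cepstrum(ceps):
    """Undo the three steps: FFT, exponential, IFFT."""
    return np.fft.ifft(np.exp(np.fft.fft(ceps)))

# Short test signal (assumption); its spectrum has no zeros on the FFT bins.
x = np.array([1.0, -0.5, 0.25, 0.1, 0.0, 0.0, 0.0, 0.0])
x_rec = inverse_complex_cepstrum(complex_cepstrum(x)).real
print(np.max(np.abs(x_rec - x)))
```

The real cepstrum cannot be inverted this way, because the phase term is missing from the log-spectrum.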

Quefrency

While the cepstrum is plotted over a time axis, it does not represent the same time scale as the original time-domain signal. Rather, it represents the period length of periodicities in the spectrum. For example, if the original signal is a voice signal which has a fundamental frequency of \(200\,\text{Hz}\) and was sampled at \(44100\,\text{Hz}\), then the cepstrum will have a peak at \(44100\,\text{Hz} / 200\,\text{Hz} = 220.5\) samples, i.e. at a quefrency of \(5\,\text{ms}\).
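The example can be reproduced with a synthetic signal. This is a sketch under several assumptions that are not from the note: NumPy, a harmonic test tone instead of real speech, a Hann window, and a pitch search range of 120 Hz to 400 Hz that keeps the second rahmonic (441 samples) out of the search. The function name `estimate_f0` is hypothetical.

```python
import numpy as np

def estimate_f0(x, fs, f_min=120.0, f_max=400.0, eps=1e-12):
    """Pick the largest cepstral peak inside a plausible pitch range."""
    windowed = x * np.hanning(len(x))
    log_magnitude = np.log(np.abs(np.fft.rfft(windowed)) + eps)
    cepstrum = np.fft.irfft(log_magnitude)
    q_min = int(fs / f_max)            # shortest period in samples
    q_max = int(fs / f_min)            # longest period in samples
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return fs / peak, peak

fs = 44100
f0 = 200.0
t = np.arange(4096) / fs
n_harmonics = int((fs / 2) // f0)      # all harmonics below Nyquist
x = sum(np.cos(2 * np.pi * k * f0 * t) / k for k in range(1, n_harmonics + 1))

f0_estimate, peak_quefrency = estimate_f0(x, fs)
print(peak_quefrency, f0_estimate)
```

Because 220.5 is not an integer, the peak lands on one of the two neighbouring samples, so the estimate is only accurate to within the quefrency resolution of one sample.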

Speech Processing

The cepstrum was originally used for analyzing echoes, for example in seismic signals, but has since found wider use in speech processing in the form of Mel-Frequency Cepstral Coefficients (MFCCs).

The cepstrum is mostly sensitive to the fundamental frequency and its harmonics and much less sensitive to modulations of amplitude and frequency that are not related to harmonics (e.g., due to noise or recording equipment). This makes it very useful when dealing with the Fundamental Frequency (F0) and pitch of the voice. It also helps to separate the contributions of source and filter in the Source-Filter Model.

The source (or excitation) corresponds to the harmonics created by the [[Glottis]] and is represented by the higher cepstral coefficients (the harmonics form a fast-varying pattern along the frequency axis). The filter (Source-Filter Model#Tube Model of the Vocal Tract) is represented by the lower cepstral coefficients (the envelope varies slowly along the frequency axis).

Cepstral Smoothing

The authors of @breithauptCepstralSmoothingSpectral2007 propose cepstral smoothing for speech analysis that is more robust to noise and other minor frequency modulations, while preserving the spectral information of Formants in the Spectral Envelope.

The smoothing is done by converting a log-spectrum to a cepstrum, applying a low-pass lifter (which removes the harmonics from the spectrum while keeping the envelope), and transforming the result back to the spectral domain.

Pasted image 20240603113856.png

The graph presents the spectral envelope as extracted by…
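The three steps can be sketched as plain low-pass liftering. This is a generic illustration assuming NumPy, not the specific method of the cited paper; the function name `cepstral_smooth`, the cutoff `n_keep`, and the noise test signal are assumptions.

```python
import numpy as np

def cepstral_smooth(log_magnitude, n_keep):
    """Low-pass lifter a one-sided log-magnitude spectrum (rfft layout)."""
    cepstrum = np.fft.irfft(log_magnitude)      # spectrum -> cepstrum
    lifter = np.zeros(len(cepstrum))
    lifter[:n_keep] = 1.0                       # keep low positive quefrencies
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0            # and their negative mirror
    return np.fft.rfft(cepstrum * lifter).real  # cepstrum -> smoothed spectrum

rng = np.random.default_rng(2)
frame = rng.standard_normal(1024) * np.hanning(1024)
log_spectrum = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
envelope = cepstral_smooth(log_spectrum, n_keep=30)

def roughness(curve):
    """Sum of squared differences between neighbouring frequency bins."""
    return float(np.sum(np.diff(curve) ** 2))

print(roughness(log_spectrum), roughness(envelope))
```

The cutoff `n_keep` trades detail for smoothness: for voiced speech it would be chosen below the quefrency of the pitch period so that the harmonics are removed while the formant structure stays.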

LPC to Cepstrum

We can transform the coefficients of the Source-Filter Model#Linear Prediction into cepstral coefficients. Doing so allows us to design low-latency filters, which only need to be computed over the positive part of the cepstrum.

Derivation

  1. Transform filter into product form. The all-pole filter derived in Source-Filter Model#^f4c6cb can be rewritten as a product over its poles \(p_{\nu}\):

$$

H(z)=\frac{b_0}{z^{-p}\prod\limits_{\nu=1}^{p}(z-p_{\nu})}=\frac{b_0}{\prod\limits_{\nu=1}^{p}(1-p_{\nu}z^{-1})}

$$

  2. Apply log. For the cepstrum, we are looking at the log-spectrum, which transforms all products into sums and quotients into differences:

$$

\log H(z)=\log{b_0}-\sum\limits_{\nu=1}^{p}\log(1-p_{\nu}z^{-1})

$$

  3. Series representation. We now want to apply the inverse [[z-Transform]] to the log-spectrum of the filter. However, the log terms complicate the transformation, so we first use the series representation

$$

\log(1-a)=-\sum\limits_{m=1}^{\infty}\frac{a^{m}}{m}\qquad|a|<1

$$

which allows us to rewrite our filter expression as

$$

\log H(z)=\log b_{0} + \sum\limits_{\nu=1}^{p}\sum\limits_{m=1}^{\infty}\frac{p_{\nu}^{m}z^{-m}}{m}

$$

  4. Apply the inverse z-transform. We can now apply the inverse z-transform to the sum term:

$$

Z^{-1}\left\{-\sum\limits_{\nu=1}^{p}\log (1-p_{\nu}z^{-1})\right\}=\frac{1}{2{\pi}j}\oint\sum\limits_{\nu=1}^{p}\sum\limits_{m=1}^{\infty}\frac{p^{m}_{\nu}}{m}z^{n-m-1}dz

$$

and use the Cauchy integral theorem (see Complex Integral) to simplify the expression:

$$

\begin{align}
\frac{1}{2{\pi}j}\oint z^{n-m-1}dz &=
\begin{cases}
  1&n=m \\
  0&\text{else}
\end{cases} \\
\Rightarrow\quad Z^{-1}\left\{-\sum\limits_{\nu=1}^{p}\log (1-p_{\nu}z^{-1})\right\} &= \begin{cases}
  \sum\limits_{\nu=1}^{p}\frac{p_{\nu}^{n}}{n}&n\geq1 \\
  0&\text{else}
\end{cases}
\end{align}

$$

  5. The cepstral filter. We now just need to add the term \(Z^{-1}\{\log b_0\}\). Since the inverse transform of a constant is just a delta spike at \(n=0\), we can derive the cepstral filter as:

$$

\hat{h}(n)= \begin{cases}
 \sum\limits_{\nu=1}^{p}\frac{p_{\nu}^{n}}{n}  &  n\geq1 \\
 \log b_{0}  &  n=0\\
 0  &  \text{else}
\end{cases}

$$
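The result of the derivation can be checked numerically. The sketch below assumes NumPy and a hypothetical two-pole example (a conjugate pair of radius 0.9, gain 2); the function name `allpole_cepstrum` is illustrative. It compares the sum over pole powers with the cepstrum computed directly from samples of the transfer function on the unit circle. With only two poles the phase stays inside the principal branch of the logarithm, so no phase unwrapping is needed here.

```python
import numpy as np

def allpole_cepstrum(poles, b0, n_coeffs):
    """Cepstral coefficients 0..n_coeffs-1 of b0 / prod(1 - p z^-1)."""
    poles = np.asarray(poles, dtype=complex)
    ceps = np.zeros(n_coeffs)
    ceps[0] = np.log(b0)
    n = np.arange(1, n_coeffs)
    # Sum over poles of p^n / n; imaginary parts cancel for conjugate pairs.
    ceps[1:] = np.sum(poles[:, None] ** n[None, :], axis=0).real / n
    return ceps

# Hypothetical stable filter: one conjugate pole pair, gain b0 = 2.
poles = 0.9 * np.exp(1j * np.array([0.3 * np.pi, -0.3 * np.pi]))
b0 = 2.0

# Reference: sample H(z) on the unit circle and take the IFFT of log H.
n_fft = 1024
z = np.exp(2j * np.pi * np.arange(n_fft) / n_fft)
H = b0 / np.prod(1 - poles[:, None] / z[None, :], axis=0)
reference = np.fft.ifft(np.log(H)).real

closed_form = allpole_cepstrum(poles, b0, 20)
print(np.max(np.abs(reference[:20] - closed_form)))
```

The negative quefrencies of the reference (the last entries of the IFFT output) are numerically zero, matching the third case of the cepstral filter.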

Minimum Phase

For the autoregressive signal model, the cepstrum is thus zero for negative \(n\), implying a minimum-phase system where all poles and zeros of the transfer function fall within the unit circle. This is to be expected for a stable all-pole filter, as its only zeros are at \(0\). We can make any system minimum phase by setting the negative quefrencies to zero and adjusting the energy accordingly.

Since the real cepstrum is the even part of the complex cepstrum, we can calculate it as

$$

c(n)=\frac{\hat{h}(n)+\hat{h}(-n)}{2}

$$

and thus:

$$

c(n)=
\begin{cases}
\frac{1}{2}\sum\limits_{\nu=1}^{p}\frac{p_{\nu}^{|n|}}{|n|} & n\neq0 \\
\log b_{0} & n=0
\end{cases}

$$

We can get a minimum-phase cepstral representation by setting the real cepstrum to 0 for negative n and doubling it for positive n:

$$

\hat{h}(n)=
\begin{cases}
c(0) & n=0 \\
2c(n) & n>0 \\
0 & \text{else}
\end{cases}

$$

This filter is useful for low-latency filtering: a minimum-phase filter concentrates its energy at the start of the impulse response and therefore adds much less delay than most other filters, which are symmetric and delay the signal by half their length.
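The folding rule can be sketched as follows, assuming NumPy, an even FFT length, and a hypothetical three-tap example; the function name `minimum_phase` is illustrative. The input is a maximum-phase sequence (zeros outside the unit circle), and the output is the minimum-phase sequence with the same magnitude spectrum.

```python
import numpy as np

def minimum_phase(x, n_fft=512):
    """Minimum-phase sequence with the same magnitude spectrum as x."""
    log_magnitude = np.log(np.abs(np.fft.fft(x, n_fft)))
    c = np.fft.ifft(log_magnitude).real          # real cepstrum c(n)
    folded = np.zeros(n_fft)
    folded[0] = c[0]                             # n = 0: keep
    folded[1:n_fft // 2] = 2.0 * c[1:n_fft // 2] # n > 0: double
    folded[n_fft // 2] = c[n_fft // 2]           # Nyquist bin is its own mirror
    # Negative quefrencies (upper half of the array) stay zero.
    return np.fft.ifft(np.exp(np.fft.fft(folded))).real

# Hypothetical example: [1, -0.5, 0.06] has zeros at 0.2 and 0.3 (minimum
# phase); its time reversal has the same magnitude but is maximum phase.
x_max = np.array([0.06, -0.5, 1.0])
x_min = minimum_phase(x_max)
print(np.round(x_min[:4], 6))
```

The energy of the result sits at the first samples instead of the last, which is the reduced delay that makes the minimum-phase form attractive for low-latency use.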
