Fundamental Frequency (F0)

The fundamental frequency, often abbreviated as F0, is a key parameter in speech processing, especially for prosody. It is the frequency at which the vocal cords vibrate, i.e. the inverse of the fundamental period.
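As a small worked example of this relation (the symbol $T_0$ for the fundamental period is notation assumed here, not taken from the note):

```latex
F_0 = \frac{1}{T_0}, \qquad \text{e.g. } T_0 = 10\,\text{ms} \;\Rightarrow\; F_0 = \frac{1}{0.010\,\text{s}} = 100\,\text{Hz}
```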

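Physiologically, when the vocal cords are brought together under tension, the air pressure from the lungs forces them apart and air flows through the narrow gap. The increase in air velocity lowers the pressure between the cords (the Bernoulli effect), which, together with their elasticity, makes them snap together again. Pressure then builds up below the closed cords until they open again, and the cycle repeats.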

For male speakers, F0 is typically around 100 Hz, for female speakers around 200 Hz, and for children higher still, typically around 300 Hz.

Estimation

The F0 can be estimated with different methods.

Peak Distance

A basic and fast approach is to measure the time between the zero-crossings that precede successive main peaks of the waveform. This distance is the fundamental period, and its inverse is F0.

Pasted image 20240423131954.png

However, this has a large error rate for natural speech, especially when noise comes into play.
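The zero-crossing idea above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not a reference implementation: the function name `f0_from_peak_distance` and the `peak_fraction` parameter are assumptions introduced here. A "main peak" is taken to be any positive lobe that reaches at least `peak_fraction` of the global maximum, which is a crude heuristic.

```python
import numpy as np

def f0_from_peak_distance(signal, sample_rate, peak_fraction=0.5):
    """Estimate F0 from the spacing of upward zero-crossings before main peaks.

    Hypothetical helper: a lobe counts as a "main peak" if it reaches at least
    peak_fraction of the global maximum (an assumption, not a standard rule).
    Returns F0 in Hz, or None if fewer than two usable crossings are found.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove DC offset so zero-crossings are meaningful

    # Indices i where the signal goes from negative (i) to non-negative (i + 1).
    up = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0))
    if up.size < 2:
        return None

    # Keep only crossings that are followed by a sufficiently high peak
    # before the next upward crossing (or before the end of the signal).
    threshold = peak_fraction * x.max()
    bounds = np.append(up, x.size - 1)
    kept = np.array([a for a, b in zip(bounds[:-1], bounds[1:])
                     if x[a + 1:b + 1].max() >= threshold])
    if kept.size < 2:
        return None

    # Linear interpolation gives sub-sample crossing times.
    frac = -x[kept] / (x[kept + 1] - x[kept])
    times = (kept + frac) / sample_rate

    # The median spacing between crossings is the estimated fundamental period.
    return 1.0 / np.median(np.diff(times))
```

The median is used instead of the mean so that a single missed or spurious crossing does not dominate the estimate. On noisy natural speech the threshold heuristic will still fail often, in line with the caveat above.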

Power Spectral Density

See also: Power Spectral Density (PSD)

Pasted image 20240423132832.png

The first peak at a non-zero lag in the autocorrelation marks the fundamental period, whose inverse is F0. In the PSD, F0 appears as the first (lowest-frequency) peak. For speech signals, the PSD is typically computed over short frames of around 30 ms. Longer frames make the frequency estimate more robust, but F0 changes over short time spans, so long frames blur these changes. For even more robustness, the YIN algorithm can be used instead of the plain autocorrelation (see Power Spectral Density (PSD)#Yin-Algorithm).
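The autocorrelation approach can be sketched as follows for a single short frame. This is a minimal sketch under stated assumptions: the function name `f0_from_autocorrelation` and the search limits `f0_min` and `f0_max` are hypothetical choices made here. Instead of locating the very first peak, it takes the highest autocorrelation value within the plausible lag range, a common simplification. This is plain autocorrelation, not YIN.

```python
import numpy as np

def f0_from_autocorrelation(frame, sample_rate, f0_min=50.0, f0_max=500.0):
    """Estimate F0 of one frame from its autocorrelation peak.

    Hypothetical helper: f0_min and f0_max (in Hz) are assumed search limits.
    Returns F0 in Hz, or None for silent or too-short frames.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()  # remove DC offset

    # Autocorrelation for non-negative lags; index 0 is lag 0 (signal energy).
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    if acf[0] <= 0:
        return None  # silent frame

    # Convert the allowed F0 range into a lag range in samples.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), x.size - 1)
    if lag_max <= lag_min:
        return None

    # The strongest peak at a non-zero lag is taken as the fundamental period.
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / lag
```

With a 30 ms frame at 16 kHz (480 samples), the lag range for 50 to 500 Hz spans 32 to 320 samples, so the frame covers at least one and a half periods of the lowest F0. For a running F0 contour, the function would be applied to successive, possibly overlapping, frames.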