In general, recent studies use two methods for estimating the formant patterns of vowel sounds: linear prediction analysis (LP analysis or, synonymously, linear predictive coding [LPC]) and spectrographic depiction. Moreover, many studies combine the two methods: the frequencies, bandwidths, and amplitudes of the formants are calculated numerically by LP analysis, and these values are cross-checked by visual inspection of the corresponding spectrogram.
LP analysis relies on the source-filter theory of speech production. Simply put, it is based on a decomposition of a sound wave into a source and a filter, where the filter shape is assumed to correspond to the vocal tract resonances. As a result, values for each formant can be derived from a calculated filter curve that represents the transfer function of the vocal tract.
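The principle can be illustrated with a minimal sketch, which is not PRAAT's implementation. It assumes the autocorrelation method with the Levinson-Durbin recursion and reads formant candidates off the peaks of the calculated filter curve 1/|A(f)|. All function names are hypothetical, and the test signal is a synthetic all-pole impulse response with assumed resonances at 700 Hz and 1200 Hz rather than recorded speech. A real analysis would usually also window and pre-emphasise each frame.

```python
import cmath
import math


def lpc_coefficients(signal, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns [1, a1, ..., ap], the coefficients of the inverse filter A(z).
    """
    n = len(signal)
    # Autocorrelation for lags 0..order.
    r = [sum(signal[i] * signal[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a


def filter_curve_peaks(a, sample_rate, n_points=512):
    """Frequencies (Hz) of local maxima of 1/|A(f)| between 0 Hz and Nyquist."""
    freqs = [k * (sample_rate / 2) / n_points for k in range(n_points + 1)]
    gain = []
    for f in freqs:
        w = 2 * math.pi * f / sample_rate
        denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        gain.append(1.0 / abs(denom))
    return [freqs[k] for k in range(1, n_points)
            if gain[k - 1] < gain[k] >= gain[k + 1]]


def all_pole_impulse_response(resonances, sample_rate, n_samples):
    """Synthetic test signal: impulse response of cascaded two-pole resonators.

    resonances is a list of (centre frequency in Hz, bandwidth in Hz) pairs.
    """
    a = [1.0]
    for freq, bandwidth in resonances:
        radius = math.exp(-math.pi * bandwidth / sample_rate)
        theta = 2 * math.pi * freq / sample_rate
        section = [1.0, -2 * radius * math.cos(theta), radius * radius]
        product = [0.0] * (len(a) + 2)        # polynomial multiplication
        for i, ai in enumerate(a):
            for j, sj in enumerate(section):
                product[i + j] += ai * sj
        a = product
    out = []
    for n in range(n_samples):
        y = 1.0 if n == 0 else 0.0            # unit impulse as the source
        for j in range(1, len(a)):
            if n - j >= 0:
                y -= a[j] * out[n - j]
        out.append(y)
    return out


# Assumed example values: two resonances, 10 kHz sampling rate.
signal = all_pole_impulse_response([(700, 80), (1200, 100)], 10000, 800)
coeffs = lpc_coefficients(signal, order=4)    # two poles per expected formant
peaks = filter_curve_peaks(coeffs, 10000)
print([round(p) for p in peaks])
```

The order of 4 follows the common convention of one complex-conjugate pole pair per expected resonance. With natural speech the order is usually chosen somewhat higher, and the peaks found are only candidates that still need checking.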
Spectrographic depiction requires a Fourier transform (e.g., a fast Fourier transform [FFT]). Formant frequencies are commonly estimated from a wide-band spectrogram, which plots frequency against time and encodes intensity as darkness. In such a spectrogram, the frequency ranges of highest energy (the darkest bars) correspond to formants.
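As a rough illustration of how such a spectrogram is computed, the following sketch takes short, overlapping, Hann-windowed segments and applies a plain discrete Fourier transform to each. It is a simplified stand-in for an FFT-based routine, not PRAAT's algorithm. The function name, the 5 ms window, and the 2 ms step are assumptions chosen for illustration; a short window of this kind is what gives the wide-band character.

```python
import cmath
import math


def short_time_spectra(signal, sample_rate, window_s=0.005, step_s=0.002):
    """Return (bin frequencies in Hz, frames); frames[t][k] is power in dB."""
    size = int(round(window_s * sample_rate))
    step = int(round(step_s * sample_rate))
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]
    n_bins = size // 2 + 1                    # 0 Hz up to the Nyquist frequency
    freqs = [k * sample_rate / size for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - size + 1, step):
        segment = [signal[start + n] * hann[n] for n in range(size)]
        row = []
        for k in range(n_bins):
            x = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / size)
                    for n in range(size))
            row.append(10 * math.log10(abs(x) ** 2 + 1e-12))
        frames.append(row)
    return freqs, frames


# Assumed test signal: a 1000 Hz sinusoid sampled at 8 kHz (not speech).
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(400)]
freqs, frames = short_time_spectra(tone, rate)
strongest = [freqs[row.index(max(row))] for row in frames]
print(strongest[0])
```

In a plotted spectrogram, each row of `frames` would become one vertical slice, with higher dB values drawn darker. The price of the short window is coarse frequency resolution (200 Hz per bin in this example).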
Both the LP analysis and the spectrographic estimation have advantages and disadvantages in terms of formant pattern estimation, which are discussed here on the basis of a practical approach in PRAAT.
Linear prediction in PRAAT: PRAAT offers a choice of different algorithms, all of which are based on linear prediction (LP). These include the algorithms integrated in the commands ‘To LPC…’ and ‘To Formant…’ (with additional sub-commands).
In general, LP requires several parameters/coefficients that are either fixed within the particular algorithm or have to be chosen by the investigator: (1) A time step (in s), which determines the frames for which the analysis is carried out within the total duration selected for analysis; a low value thus leads to a higher number of analysis frames. (2) A maximum number of formants, which determines the number of formants expected in the calculated spectrum; these are represented in the calculation as filter poles. (3) A frequency ceiling (in Hz) for the range of formant estimation. (4) A window length, which determines the effective duration (in s) of the analysis window. (5) A formant bandwidth (in Hz), which determines the frequency range covered by a single formant. (6) A cutoff frequency for pre-emphasis (in Hz; amplitudes are enhanced by 6 dB per octave above this frequency).
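How the time step and window length interact can be made concrete with a small sketch. It assumes that only frames whose window fits entirely inside the analysed stretch are kept and that the frames are centred within it; this is a plausible convention, not necessarily the exact rule of any particular PRAAT command. The function names and example values are hypothetical. The sketch also states the common convention of two filter poles (one complex-conjugate pair) per expected formant.

```python
import math


def frame_centres(duration_s, window_s, time_step_s):
    """Centre times (s) of analysis frames that fit inside the duration."""
    if duration_s < window_s:
        return []
    # Small epsilon guards against floating-point rounding just below an integer.
    n_frames = int(math.floor((duration_s - window_s) / time_step_s + 1e-9)) + 1
    first = 0.5 * (duration_s - (n_frames - 1) * time_step_s)
    return [first + i * time_step_s for i in range(n_frames)]


def lpc_order(max_formants):
    """Assumed convention: one complex-conjugate pole pair per formant."""
    return 2 * max_formants


# Assumed example: 0.5 s of sound, 25 ms window, two different time steps.
coarse = frame_centres(0.5, 0.025, 0.010)
fine = frame_centres(0.5, 0.025, 0.005)
print(len(coarse), len(fine), lpc_order(5))
```

Halving the time step roughly doubles the number of frames without changing the duration analysed per frame, which is set by the window length alone.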
In the case of ‘To LPC…’ and its sub-commands, the so-called Nyquist frequency, which equals half the sampling frequency of the signal, is automatically used as the frequency ceiling for formant estimation. In most cases, the sound therefore has to be resampled before the analysis. This is necessary because estimating, for example, five formants below 5500 Hz requires a sampling frequency of 11 kHz...
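The arithmetic behind this requirement is simple enough to write down. The helper names below are hypothetical, and the 44100 Hz recording rate is an assumed example value that does not come from the text.

```python
def nyquist_frequency(sampling_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sampling_rate_hz / 2.0


def required_sampling_rate(frequency_ceiling_hz):
    """Sampling rate whose Nyquist frequency equals the desired ceiling."""
    return 2.0 * frequency_ceiling_hz


def needs_resampling(sampling_rate_hz, frequency_ceiling_hz):
    """True if the Nyquist frequency differs from the desired ceiling."""
    return nyquist_frequency(sampling_rate_hz) != frequency_ceiling_hz


target_rate = required_sampling_rate(5500)    # ceiling from the example above
print(target_rate, needs_resampling(44100, 5500))
```

For a recording at the assumed 44100 Hz, the Nyquist frequency would be 22050 Hz, so a fixed number of poles would be spread over a range four times wider than the 5500 Hz of interest unless the sound is first resampled to 11 kHz.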