One way to represent a signal is the time-domain waveform, which shows how the amplitude of the signal changes over time. Examples of time-varying signals include temperature logs, stock-index profiles, electrocardiogram signals, vibration signals, and speech signals, such as the speech signal in the following figure.
The time-domain speech waveform in the previous figure depicts how the sound-pressure level evolves over time. The higher the sound-pressure level at any particular time, the larger the magnitude, or the absolute value, of the signal.
An important task in most speech-enhancement applications is to find the noise characteristics and then remove the noise from the speech signal. In the previous figure, the period from 1.4 s to 2.0 s is the silence period, when no speech is present. Any signal measured during this time frame is noise. In speech-enhancement applications, you often observe the signal during the silence periods to estimate the noise characteristics.
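As a minimal sketch of this idea, the following Python example estimates the noise level as the root-mean-square (RMS) amplitude of the samples inside a silence window. The function name `estimate_noise_rms`, the 8 kHz sample rate, and the synthetic recording are assumptions made for illustration; only the 1.4 s to 2.0 s silence window comes from the text above. RMS amplitude is one simple noise characteristic among many, and practical speech-enhancement systems typically estimate a full noise spectrum instead.

```python
import numpy as np


def estimate_noise_rms(signal, sample_rate, start_s, end_s):
    """Return the RMS amplitude of the samples between start_s and end_s (seconds)."""
    start = int(round(start_s * sample_rate))
    end = int(round(end_s * sample_rate))
    segment = np.asarray(signal[start:end], dtype=float)
    if segment.size == 0:
        raise ValueError("silence window contains no samples")
    return float(np.sqrt(np.mean(segment ** 2)))


# Synthetic stand-in for a recording (hypothetical data, not the signal in the figure):
# a 200 Hz tone until 1.4 s, then silence, with low-level Gaussian noise throughout.
sample_rate = 8000
rng = np.random.default_rng(0)
t = np.arange(int(2.0 * sample_rate)) / sample_rate
noise = 0.01 * rng.standard_normal(t.size)
tone = np.where(t < 1.4, 0.5 * np.sin(2 * np.pi * 200 * t), 0.0)
recording = tone + noise

# Only noise is present between 1.4 s and 2.0 s, so the RMS there estimates the noise level.
noise_rms = estimate_noise_rms(recording, sample_rate, 1.4, 2.0)
print(f"estimated noise RMS: {noise_rms:.4f}")
```

Because the synthetic noise was generated with a standard deviation of 0.01, the estimate should land close to that value.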
In many speech-analysis applications, it is important to identify the spectral content of the speech signal. Notice that the time waveform in the previous figure does not directly show the spectral content of the speech signal. To determine the frequency characteristics of this speech signal, you need a way to estimate the spectral content of the signal. One possible technique is to apply the fast Fourier transform (FFT) to the signal to convert the time waveform to a frequency spectrum. The following figure shows the FFT of the speech signal.
Using the FFT to transform a time-domain signal to the frequency-domain representation of the signal can help you discover information that might be hidden in the time-domain waveform. The square of the magnitude of the FFT is called the power spectrum, which characterizes how the energy of a signal is distributed in the frequency domain.
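The following Python sketch shows one way to compute a power spectrum as the squared magnitude of the FFT. It uses NumPy's `numpy.fft.rfft` and `numpy.fft.rfftfreq`; the function name `power_spectrum`, the 1 kHz sample rate, and the two-tone test signal are assumptions chosen for illustration. The result is left unscaled, and scaling conventions (for example, dividing by the number of samples) vary between tools.

```python
import numpy as np


def power_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, squared FFT magnitude) for a real-valued signal."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)                 # one-sided FFT of a real signal
    power = np.abs(spectrum) ** 2             # squared magnitude, unscaled
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sample_rate)
    return freqs, power


# Hypothetical test signal: a 50 Hz tone plus a weaker 120 Hz tone, 1 s long.
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

freqs, power = power_spectrum(x, sample_rate)
peak_hz = freqs[np.argmax(power)]             # frequency bin holding the most energy
print(f"strongest component near {peak_hz:.1f} Hz")
```

In the time-domain waveform, the two tones are mixed together. In the power spectrum, they appear as two separate peaks, with the 50 Hz tone carrying the most energy.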
The power spectrum of a speech signal can show the relative intensity of the energy of a signal at each frequency for the entire signal. However, the power spectrum of a signal for a shorter time scale can be more useful. For example, if a speech signal includes separate low-frequency utterances and high-frequency utterances, separate power spectra for each utterance can be useful. Even within a particular utterance, variations in signal characteristics might exist, so it is useful to analyze the signal with a short time scale. For example, it might be useful to separate unvoiced speech from voiced speech in a particular utterance.
You can use the short-time Fourier transform (STFT) spectrogram to provide power spectra for short time scales. The following figure shows the STFT spectrogram of the speech signal in the previous two figures.
In the previous figure, the color depicts the energy of the signal at time t and frequency f. The color scale from red to blue corresponds to energy levels from strongest to weakest.
From the time-frequency representation in the previous figure, you can identify the silence period, and you can see how the spectral content of the signal changes over time. You can also see the onset time, the end time, the fundamental frequency, and the harmonic frequencies of the two utterances. These parameters are crucial in various speech-processing applications, such as speech recognition. Compared to the time-domain waveform and the FFT in the earlier figures, the spectrogram better illuminates the nature of a human speech signal.
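To make the idea of short-time power spectra concrete, the following Python sketch computes a basic STFT spectrogram with NumPy: it splits the signal into overlapping frames, applies a Hann window to each frame, and takes the squared FFT magnitude of each frame. The function name `stft_spectrogram`, the frame length of 256 samples, the hop of 128 samples, and the synthetic two-tone signal are all assumptions for illustration, and this is not necessarily how the spectrogram in the figure was produced.

```python
import numpy as np


def stft_spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Return (frame times in s, frequencies in Hz, power[frame, frequency])."""
    x = np.asarray(signal, dtype=float)
    if x.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)            # taper each frame to reduce leakage
    n_frames = 1 + (x.size - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # one power spectrum per frame
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate  # frame centers
    return times, freqs, power


# Hypothetical signal whose spectral content changes over time:
# a 200 Hz tone during the first second, then a 1000 Hz tone during the second.
sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
x = np.where(t < 1.0, np.sin(2 * np.pi * 200 * t), np.sin(2 * np.pi * 1000 * t))

times, freqs, power = stft_spectrogram(x, sample_rate)
dominant = freqs[np.argmax(power, axis=1)]    # strongest frequency in each frame
print(f"first frame: about {dominant[0]:.1f} Hz, last frame: about {dominant[-1]:.1f} Hz")
```

A single FFT of the whole signal would show both tones but not when each one occurs. The per-frame spectra recover that timing, at the cost of a frequency resolution limited by the frame length (here 8000 / 256 = 31.25 Hz per bin).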
Speech signal analysis is only one example application that benefits from methods other than the FFT. Time-frequency analysis is broadly useful because most signals in real-world applications have time-varying spectral content. Analyzing the time-dependent spectra enables you to better understand the signal and the associated system. The spectrogram in the previous figure is only one of many proven time-frequency analysis methods.