Date Published: July 5, 2017
Publisher: Public Library of Science
Author(s): Sierra Broussard, Gregory Hickok, Kourosh Saberi, Ian McLoughlin.
The current study investigated how amplitude and phase information differentially contribute to speech intelligibility. Listeners performed a word-identification task after hearing spectrally degraded sentences. Each stimulus was degraded by first dividing it into segments and then decorrelating the amplitude and phase components of each segment independently, to various degrees, relative to those of the original segment. The segments were then concatenated in their original order and presented to the listener. We used three segment lengths: 30 ms (phoneme length), 250 ms (syllable length), and full sentence (non-segmented). We found that for intermediate spectral correlation values, segment length is generally inconsequential to intelligibility. Overall, intelligibility was more adversely affected by phase-spectrum decorrelation than by amplitude-spectrum decorrelation. If the phase information was left intact, decorrelating the amplitude spectrum to intermediate values had no effect on intelligibility. If the amplitude information was left intact, decorrelating the phase spectrum to intermediate values significantly degraded intelligibility. Some exceptions to this rule are described. These results delineate the range of amplitude- and phase-spectrum correlations necessary for speech processing and its dependency on the temporal window of analysis (phoneme or syllable length). Results further point to the robustness of speech information in environments that acoustically degrade cues to intelligibility (e.g., reverberant or noisy environments).
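The segment, decorrelate, and concatenate procedure described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state how a target correlation was achieved, so the sketch assumes a simple blend of each segment's spectrum with that of a random-noise segment. The function names, the mixing weights `amp_weight` and `phase_weight` (1 keeps the original component, 0 replaces it with noise), and the sampling rate in the usage lines are all hypothetical. The weights are not the spectral correlation values reported in the study.

```python
import numpy as np

def decorrelate_segment(segment, amp_weight, phase_weight, rng):
    """Blend a segment's amplitude and phase spectra with those of random noise.

    Assumption: weights in [0, 1]; 1 keeps the original component unchanged,
    0 replaces it entirely with the noise component.
    """
    spectrum = np.fft.rfft(segment)
    amp, phase = np.abs(spectrum), np.angle(spectrum)

    noise = np.fft.rfft(rng.standard_normal(len(segment)))
    noise_amp, noise_phase = np.abs(noise), np.angle(noise)
    # Scale the noise amplitudes so their mean matches the segment's.
    noise_amp = noise_amp * (amp.mean() / noise_amp.mean())

    new_amp = amp_weight * amp + (1.0 - amp_weight) * noise_amp
    # Mix phases as unit phasors to avoid wrap-around problems at +/- pi.
    mixed = (phase_weight * np.exp(1j * phase)
             + (1.0 - phase_weight) * np.exp(1j * noise_phase))
    new_phase = np.angle(mixed)

    return np.fft.irfft(new_amp * np.exp(1j * new_phase), n=len(segment))

def degrade(signal, segment_len, amp_weight, phase_weight, seed=0):
    """Split into segments, decorrelate each one, and rejoin in original order.

    Pass segment_len=None to treat the whole signal as a single segment.
    """
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    if segment_len is None:
        segment_len = len(signal)
    pieces = [
        decorrelate_segment(signal[start:start + segment_len],
                            amp_weight, phase_weight, rng)
        for start in range(0, len(signal), segment_len)
    ]
    return np.concatenate(pieces)

# Usage with an assumed 16 kHz sampling rate and a 1 s test tone.
fs = 16000
tone = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
phoneme_len = round(0.030 * fs)   # 30 ms segments
syllable_len = round(0.250 * fs)  # 250 ms segments
degraded = degrade(tone, phoneme_len, amp_weight=1.0, phase_weight=0.5)
```

With both weights set to 1 the output reproduces the input up to floating-point error, which gives a quick sanity check on the segmentation and reassembly.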
Phase spectrum analysis is often ignored in models of auditory spectral processing in humans despite our knowledge that humans are not phase deaf when listening to complex sounds. Phonemes, for example, are most often represented as a structural component of the amplitude spectrum [1–2]. However, a number of studies have found that phase plays a major role in speech analysis and recognition. Oppenheim and Lim found evidence through informal experiments that phase information could be useful in speech-signal reconstruction for long signal times, concluding that changing the phase spectrum of a speech sound can alter its phonetic value.
Fig 2 shows average intelligibility scores for each window size as a function of amplitude- and phase-spectrum correlations. Each point is based on 10 sentences (~40 words) per listener (~200 words per point). An intelligibility score of 1 indicates that every subject correctly identified all keywords in all sentences for that condition.
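As a rough illustration of how such a score could be computed, the sketch below pools keyword counts over all listeners and sentences in one condition. Pooling (rather than averaging per-listener proportions) is an assumption, as are the function name and the data layout; the text above only fixes that a score of 1 means every keyword was identified by every subject.

```python
def intelligibility_score(responses):
    """Proportion of keywords correctly identified in one condition.

    responses: iterable of (n_correct, n_keywords) pairs, one pair per
    listener-sentence combination (hypothetical layout).
    """
    responses = list(responses)
    total_keywords = sum(n_keywords for _, n_keywords in responses)
    if total_keywords == 0:
        raise ValueError("no keywords to score")
    total_correct = sum(n_correct for n_correct, _ in responses)
    return total_correct / total_keywords

# Illustrative numbers only: 5 listeners x 10 sentences x 4 keywords = 200 words.
perfect = [(4, 4)] * 50
half = [(2, 4)] * 50
print(intelligibility_score(perfect))
print(intelligibility_score(half))
```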