School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; Department of Communication Sciences & Disorders, University of South Florida, USA.
School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN, USA.
Brain Res. 2019 Jul 1;1714:182-192. doi: 10.1016/j.brainres.2019.02.025. Epub 2019 Feb 20.
When two voices compete, listeners can segregate and identify concurrent speech sounds using pitch (fundamental frequency, F0) and timbre (harmonic) cues. Speech perception is also hindered by poor signal-to-noise ratio (SNR). How clear and degraded concurrent speech sounds are represented at early, pre-attentive stages of the auditory system is not well understood. To this end, we measured scalp-recorded frequency-following responses (FFRs) from the EEG while human listeners heard two concurrently presented, steady-state (time-invariant) vowels whose F0s differed by zero or four semitones (ST), presented diotically in either clean (no noise) or noise-degraded (+5 dB SNR) conditions. Listeners also performed a speeded double-vowel identification task in which they were required to identify both vowels correctly. Behavioral results showed that speech identification accuracy increased with F0 differences between vowels, and this perceptual F0 benefit was larger for clean than for noise-degraded (+5 dB SNR) stimuli. Neurophysiological data demonstrated more robust FFR F0 amplitudes for single compared to double vowels and considerably weaker responses in noise. F0 amplitudes showed speech-on-speech masking effects, along with non-linear constructive interference at 0 ST and suppression effects at 4 ST. Correlations showed that FFR F0 amplitudes failed to predict listeners' identification accuracy. In contrast, FFR F1 amplitudes were associated with faster reaction times, although this correlation was limited to noise conditions. The limited number of brain-behavior associations suggests that subcortical activity mainly reflects exogenous processing rather than perceptual correlates of concurrent speech perception. Collectively, our results demonstrate that FFRs reflect pre-attentive coding of concurrent auditory stimuli that only weakly predicts the success of identifying concurrent speech.
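The two stimulus parameters named in the abstract, a four-semitone F0 separation and a +5 dB SNR, can be made concrete with a little arithmetic. The sketch below is illustrative only: the abstract does not report the vowels' actual F0 values, so the 100 Hz base F0 is a hypothetical placeholder. The function names are likewise invented for this example, and treating SNR as a ratio of RMS amplitudes is an assumption about how the noise level was set.

```python
def shift_by_semitones(f0_hz: float, semitones: float) -> float:
    """Frequency that lies `semitones` equal-tempered semitones above f0_hz."""
    # One semitone corresponds to a frequency ratio of 2 ** (1 / 12).
    return f0_hz * 2.0 ** (semitones / 12.0)


def noise_rms_for_snr(signal_rms: float, snr_db: float) -> float:
    """Noise RMS amplitude giving the requested SNR (dB) for a given signal RMS."""
    # SNR_dB = 20 * log10(signal_rms / noise_rms), solved for noise_rms.
    return signal_rms / (10.0 ** (snr_db / 20.0))


if __name__ == "__main__":
    BASE_F0_HZ = 100.0  # hypothetical value; not stated in the abstract

    for st in (0, 4):  # the two F0 separations used in the study
        second_f0 = shift_by_semitones(BASE_F0_HZ, st)
        print(f"{st} ST: vowel F0s = {BASE_F0_HZ:.1f} Hz and {second_f0:.1f} Hz")

    # Noise level relative to a unit-RMS speech signal at +5 dB SNR.
    print(f"+5 dB SNR: noise RMS = {noise_rms_for_snr(1.0, 5.0):.3f} x signal RMS")
```

Under these assumptions, a 4 ST separation places the second vowel's F0 about 26% above the first, and +5 dB SNR corresponds to noise at roughly 0.56 times the RMS amplitude of the speech.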