Yellamsetty Anusha, Bidelman Gavin M
School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA.
School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; University of Tennessee Health Science Center, Department of Anatomy and Neurobiology, Memphis, TN, USA.
Hear Res. 2018 Apr;361:92-102. doi: 10.1016/j.heares.2018.01.006. Epub 2018 Feb 2.
Parsing simultaneous speech requires that listeners use pitch-guided segregation, which can be affected by the signal-to-noise ratio (SNR) in the auditory scene. The interaction of these two cues may occur at multiple levels within the cortex. The aims of the current study were to assess the correspondence between oscillatory brain rhythms and behavior, and to determine how listeners exploit pitch and SNR cues to successfully segregate concurrent speech. We recorded electrical brain activity while participants heard double-vowel stimuli whose fundamental frequencies (F0s) differed by zero or four semitones (STs), presented in either clean or noise-degraded (+5 dB SNR) conditions. We found that behavioral identification was more accurate for vowel mixtures with larger pitch separations, but this F0 benefit interacted with noise. Time-frequency analysis decomposed the EEG into different spectrotemporal frequency bands. Low-frequency (θ, β) responses were elevated when speech did not contain pitch cues (0ST > 4ST) or was noisy, suggesting a correlate of increased listening effort and/or memory demands. In contrast, γ power increases were observed for changes in both pitch (0ST > 4ST) and SNR (clean > noise), suggesting that high-frequency bands carry information related to acoustic features and the quality of speech representations. Brain-behavior associations corroborated these effects: modulations in low-frequency rhythms predicted the speed of listeners' perceptual decisions, whereas higher bands predicted identification accuracy. Results are consistent with the notion that neural oscillations reflect both automatic (pre-perceptual) and controlled (post-perceptual) mechanisms of speech processing that are largely divisible into the high- and low-frequency bands of human brain rhythms.
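As an aside, the time-frequency decomposition described above can be approximated with a simple band-power analysis. The sketch below is illustrative only and is not the authors' pipeline: the band edges, the filter-plus-Hilbert approach, the sampling rate `fs`, and the array name `eeg` are all assumptions introduced for the example.

```python
# Illustrative sketch (assumed parameters, not the study's actual analysis):
# estimate band-limited EEG power in theta, beta, and gamma ranges
# via zero-phase bandpass filtering followed by the Hilbert envelope.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

# Band edges in Hz are placeholders chosen for illustration.
BANDS = {"theta": (4.0, 8.0), "beta": (15.0, 30.0), "gamma": (50.0, 80.0)}

def band_power(eeg: np.ndarray, fs: float, lo: float, hi: float) -> np.ndarray:
    """Trial-averaged instantaneous power in one frequency band.

    `eeg` is assumed to be shaped (n_trials, n_samples), sampled at `fs` Hz.
    """
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, eeg, axis=-1)        # zero-phase bandpass
    envelope = np.abs(hilbert(filtered, axis=-1))  # analytic amplitude
    return (envelope ** 2).mean(axis=0)            # average power over trials

if __name__ == "__main__":
    fs = 500.0                                     # assumed sampling rate (Hz)
    rng = np.random.default_rng(0)
    eeg = rng.standard_normal((100, int(2 * fs)))  # 100 synthetic trials, 2 s each
    for name, (lo, hi) in BANDS.items():
        print(name, float(band_power(eeg, fs, lo, hi).mean()))
```

Comparing such band-power estimates across stimulus conditions (e.g., 0 ST vs. 4 ST, clean vs. noise) is one conventional way to quantify the low- versus high-frequency effects summarized in the abstract.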