Bidelman Gavin M, Yellamsetty Anusha
School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, 38152, USA; Institute for Intelligent Systems, University of Memphis, Memphis, TN, 38152, USA; University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN, 38163, USA.
School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, 38152, USA.
Hear Res. 2017 Aug;351:34-44. doi: 10.1016/j.heares.2017.05.008. Epub 2017 May 25.
Behavioral studies reveal that listeners exploit intrinsic differences in voice fundamental frequency (F0) to segregate concurrent speech sounds, the so-called "F0-benefit." A more favorable signal-to-noise ratio (SNR) in the environment, an extrinsic acoustic factor, similarly benefits the parsing of simultaneous speech. Here, we examined the neurobiological substrates of these two cues in the perceptual segregation of concurrent speech mixtures. We recorded event-related brain potentials (ERPs) while listeners performed a speeded double-vowel identification task. Listeners heard two concurrent vowels whose F0 differed by zero or four semitones, presented in either clean (no noise) or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in identifying both vowels for larger F0 separations, but the F0-benefit was more pronounced at more favorable SNRs (i.e., a pitch × SNR interaction). Analysis of the ERPs revealed that only the P2 wave (∼200 ms) showed an F0 × SNR interaction similar to that seen in behavior and was correlated with listeners' perceptual F0-benefit. Neural classifiers applied to the ERPs further suggested that speech sounds are segregated neurally within 200 ms based on SNR, whereas segregation based on pitch occurs later in time (400-700 ms). The earlier timing of extrinsic SNR-based compared to intrinsic F0-based segregation implies that the cortical extraction of speech from noise is more efficient than differentiating speech based on pitch cues alone, which may recruit additional cortical processes. Findings indicate that noise and pitch differences interact relatively early in cerebral cortex and that the brain arrives at the identities of concurrent speech mixtures as early as ∼200 ms.