Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907
Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
J Neurosci. 2022 Jan 12;42(2):240-254. doi: 10.1523/JNEUROSCI.1610-21.2021. Epub 2021 Nov 11.
Temporal coherence of sound fluctuations across spectral channels is thought to aid auditory grouping and scene segregation. Although prior studies on the neural bases of temporal-coherence processing focused mostly on cortical contributions, neurophysiological evidence suggests that temporal-coherence-based scene analysis may start as early as the cochlear nucleus (i.e., the first auditory region supporting cross-channel processing over a wide frequency range). Accordingly, we hypothesized that aspects of temporal-coherence processing that could be realized in early auditory areas may shape speech understanding in noise. We then explored whether physiologically plausible computational models could account for results from a behavioral experiment that measured consonant categorization in different masking conditions. We tested whether within-channel masking of target-speech modulations predicted consonant confusions across the different conditions and whether predictions were improved by adding across-channel temporal-coherence processing mirroring the computations known to exist in the cochlear nucleus. Consonant confusions provide a rich characterization of error patterns in speech categorization, and are thus crucial for rigorously testing models of speech perception; however, to the best of our knowledge, they have not been used in prior studies of scene analysis. We find that within-channel modulation masking can reasonably account for category confusions, but that it fails when temporal fine structure cues are unavailable. However, the addition of across-channel temporal-coherence processing significantly improves confusion predictions across all tested conditions. Our results suggest that temporal-coherence processing strongly shapes speech understanding in noise and that physiological computations that exist early along the auditory pathway may contribute to this process.

SIGNIFICANCE STATEMENT: Temporal coherence of sound fluctuations across distinct frequency channels is thought to be important for auditory scene analysis. Prior studies on the neural bases of temporal-coherence processing focused mostly on cortical contributions, and it was unknown whether speech understanding in noise may be shaped by across-channel processing that exists in earlier auditory areas. Using physiologically plausible computational modeling to predict consonant confusions across different listening conditions, we find that across-channel temporal coherence contributes significantly to scene analysis and speech perception and that such processing may arise in the auditory pathway as early as the brainstem. By virtue of providing a richer characterization of error patterns not obtainable with just intelligibility scores, consonant confusions yield unique insight into scene analysis mechanisms.
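The within-channel account the abstract describes can be illustrated with a modulation-domain signal-to-noise computation. The sketch below is a minimal conceptual illustration, assuming a simple Butterworth bandpass filterbank, Hilbert envelopes, and a single modulation band; it is not the paper's physiologically detailed peripheral model, and all function names, parameter values, and the noise-power heuristic here are hypothetical.

```python
# Minimal sketch of within-channel modulation masking, assuming a simple
# Butterworth bandpass filterbank and Hilbert envelopes. This does NOT
# reproduce the paper's physiologically detailed auditory model; it only
# illustrates the idea that masker energy in the same modulation band as
# the target's speech envelope degrades within-channel target modulations.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def channel_envelope(x, fs, f_lo, f_hi):
    """Bandpass one frequency channel and return its Hilbert envelope."""
    sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    return np.abs(hilbert(sosfiltfilt(sos, x)))

def modulation_power(env, fs, m_lo, m_hi):
    """Envelope power within one modulation band (e.g., 4-16 Hz)."""
    sos = butter(2, [m_lo, m_hi], btype="bandpass", fs=fs, output="sos")
    env_ac = env - env.mean()  # remove DC before modulation filtering
    return np.mean(sosfiltfilt(sos, env_ac) ** 2)

def modulation_snr_db(target, mixture, fs, band=(1000.0, 1400.0),
                      mod_band=(4.0, 16.0)):
    """Within-channel modulation SNR for one frequency channel.

    Heuristic: treat the mixture's modulation power in excess of the
    clean target's as masker-driven. Envelopes add nonlinearly, so this
    subtraction is only a rough conceptual approximation.
    """
    env_t = channel_envelope(target, fs, *band)
    env_m = channel_envelope(mixture, fs, *band)
    p_t = modulation_power(env_t, fs, *mod_band)
    p_noise = max(modulation_power(env_m, fs, *mod_band) - p_t, 1e-12)
    return 10 * np.log10(p_t / p_noise)
```

Under this view, channels and modulation bands with low modulation SNR contribute degraded phonetic cues, which is the kind of quantity a within-channel masking model can map onto predicted consonant confusions.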
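The across-channel computation can likewise be sketched as short-time correlation of envelope fluctuations between frequency channels: channels whose envelopes covary over time would be grouped as belonging to the same source. Again, this is a hedged sketch of the general coherence computation, not the cochlear-nucleus circuit model tested in the paper; the window and hop durations are illustrative assumptions.

```python
# Minimal sketch of across-channel temporal-coherence analysis. Envelopes
# from different frequency channels are compared over short windows, and
# high pairwise correlation marks channels whose fluctuations are
# temporally coherent (candidates for grouping into one auditory object).
import numpy as np

def short_time_coherence(envs, fs, win_s=0.064, hop_s=0.016):
    """Compute frame-by-frame cross-channel envelope correlations.

    envs: (n_channels, n_samples) array of channel envelopes
          (e.g., from channel_envelope above).
    Returns an (n_frames, n_channels, n_channels) array of
    correlation matrices, one per analysis window.
    """
    win = int(win_s * fs)
    hop = int(hop_s * fs)
    frames = []
    for start in range(0, envs.shape[1] - win + 1, hop):
        seg = envs[:, start:start + win]
        seg = seg - seg.mean(axis=1, keepdims=True)      # demean per channel
        norm = np.linalg.norm(seg, axis=1, keepdims=True) + 1e-12
        seg = seg / norm                                 # unit-norm rows
        frames.append(seg @ seg.T)                       # pairwise correlations
    return np.stack(frames)
```

A downstream stage could then up-weight channels whose envelopes correlate strongly with target-dominated channels, mirroring in spirit how across-channel temporal coherence is proposed to bind speech components across frequency and improve confusion predictions beyond the within-channel account.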