Department of Neurology and Neurosurgery, Brain Center, University Medical Center Utrecht, Utrecht, The Netherlands.
Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
PLoS Comput Biol. 2020 Jul 2;16(7):e1007992. doi: 10.1371/journal.pcbi.1007992. eCollection 2020 Jul.
Understanding how the human brain processes auditory input remains a challenge. Traditionally, a distinction is made between lower- and higher-level sound features, but the definitions of these features depend on a specific theoretical framework and might not match the neural representation of sound. Here, we postulate that constructing a data-driven neural model of auditory perception, with a minimum of theoretical assumptions about the relevant sound features, could provide an alternative approach and possibly a better match to the neural responses. We collected electrocorticography recordings from six patients who watched a long-duration feature film. The raw movie soundtrack was used to train an artificial neural network model to predict the associated neural responses. The model achieved high prediction accuracy and generalized well to a second dataset, in which new participants watched a different film. The extracted bottom-up features captured acoustic properties that were specific to the type of sound and were associated with various response latency profiles and distinct cortical distributions. Specifically, several features encoded speech-related acoustic properties, with some features exhibiting shorter latency profiles (associated with responses in posterior perisylvian cortex) and others exhibiting longer latency profiles (associated with responses in anterior perisylvian cortex). Our results support and extend the current view on speech perception by demonstrating the presence of temporal hierarchies in the perisylvian cortex and involvement of cortical sites outside this region during audiovisual speech perception.
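The core analysis described above (fitting a model that predicts neural responses from a continuous sound signal, with response latency handled via time lags) can be illustrated with a minimal sketch. This is not the study's actual pipeline: the synthetic features, the simulated response, the lag range, and the ridge penalty are all illustrative assumptions, and ridge regression stands in for the artificial neural network as the simplest encoding model of this kind.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for sound features extracted from a soundtrack
# (illustrative only; not the features learned in the study).
T, F = 2000, 8                     # time points, feature channels
X = rng.standard_normal((T, F))

# Simulated neural response: a linear mix of the features at a
# 3-sample latency, plus measurement noise.
true_w = rng.standard_normal(F)
lag_true = 3
y = np.roll(X, lag_true, axis=0) @ true_w + 0.1 * rng.standard_normal(T)

def lagged_design(X, max_lag):
    """Stack time-lagged copies of X so the model can capture
    response latency without knowing it in advance."""
    return np.hstack([np.roll(X, lag, axis=0) for lag in range(max_lag + 1)])

Xl = lagged_design(X, max_lag=5)

# Split in time (train on the first half, test on the second),
# then fit ridge regression in closed form.
split = T // 2
Xtr, Xte, ytr, yte = Xl[:split], Xl[split:], y[:split], y[split:]
alpha = 1.0                        # illustrative regularization strength
w = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(Xl.shape[1]), Xtr.T @ ytr)

# Prediction accuracy on held-out data, scored as a correlation,
# a common metric for encoding models.
pred = Xte @ w
r = np.corrcoef(pred, yte)[0, 1]
```

Because the simulated response truly is a lagged linear function of the features, the held-out correlation `r` is close to 1 here; with real neural data, accuracies are far more modest, and inspecting which lags carry the largest weights is one way to read out latency profiles like those reported in the abstract.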