Centre for Speech, Language and the Brain, Department of Psychology, University of Cambridge, Cambridge, CB2 3EB, United Kingdom.
J Neurosci. 2019 Jan 16;39(3):519-527. doi: 10.1523/JNEUROSCI.3573-17.2018. Epub 2018 Nov 20.
Spoken word recognition in context is remarkably fast and accurate, with recognition times of ∼200 ms, typically well before the end of the word. The neurocomputational mechanisms underlying these contextual effects are still poorly understood. This study combines source-localized electroencephalographic and magnetoencephalographic (EMEG) measures of real-time brain activity with multivariate representational similarity analysis (RSA) to determine directly the timing and computational content of the processes evoked as spoken words are heard in context, and to evaluate the respective roles of bottom-up and predictive processing mechanisms in the integration of sensory and contextual constraints. Male and female human participants heard simple (modifier-noun) English phrases that varied in the degree of semantic constraint that the modifier (W1) exerted on the noun (W2), as in pairs such as "yellow banana." We used gating tasks to estimate the probabilistic predictions generated by these constraints, as well as measures of their interaction with the bottom-up perceptual input for W2. RSA models of these measures were tested against the EMEG brain data across a bilateral fronto-temporo-parietal language network. Consistent with probabilistic predictive processing accounts, we found early activation of semantic constraints in frontal cortex (LBA45) as W1 was heard. The effects of these constraints (at 100 ms after W2 onset in left middle temporal gyrus and at 140 ms in left Heschl's gyrus) were only detectable, however, after the initial phonemes of W2 had been heard. Within an overall predictive processing framework, bottom-up sensory inputs are still required to achieve early and robust spoken word recognition in context.

SIGNIFICANCE STATEMENT: Human listeners recognize spoken words in natural speech contexts with remarkable speed and accuracy, often identifying a word well before all of it has been heard. In this study, we investigate the brain systems that support this important capacity, using neuroimaging techniques that can track real-time brain activity during speech comprehension. This makes it possible to locate the brain areas that generate predictions about upcoming words and to show how these expectations are integrated with the evidence provided by the speech being heard. We use the timing and localization of these effects to provide the most specific account to date of how the brain achieves an optimal balance between prediction and sensory input in the interpretation of spoken language.
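To make the analysis pipeline described above concrete, the sketch below illustrates the general logic of RSA as applied here: gating-task completion distributions yield a constraint measure per phrase (entropy is used here as a stand-in), pairwise dissimilarities of that measure form a model representational dissimilarity matrix (RDM), and the model RDM is rank-correlated with a data RDM derived from the evoked brain responses. This is a minimal illustration, not the authors' code; the gating probabilities, the entropy-based measure, and the random placeholder data RDM are all assumptions for demonstration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def entropy_bits(p):
    """Shannon entropy (bits) of a completion-probability distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical gating data: for each phrase, the distribution over the
# top W2 completions produced after hearing W1 alone. Values are
# illustrative only, not from the study.
gating = [
    np.array([0.70, 0.15, 0.10, 0.05]),  # strongly constraining W1
    np.array([0.50, 0.25, 0.15, 0.10]),
    np.array([0.40, 0.30, 0.20, 0.10]),
    np.array([0.30, 0.30, 0.20, 0.20]),
    np.array([0.25, 0.25, 0.25, 0.25]),  # weakly constraining W1
]

# One constraint value per phrase: higher entropy = weaker semantic
# constraint exerted by the modifier.
constraint = np.array([entropy_bits(p) for p in gating])

# Model RDM: pairwise dissimilarity of phrases under this measure
# (condensed vector of all phrase pairs).
model_rdm = pdist(constraint[:, None], metric="euclidean")

# The data RDM would come from source-localized EMEG response patterns
# in a region of interest and time window (e.g. correlation distance
# between phrase-evoked patterns). Random placeholder here.
rng = np.random.default_rng(0)
data_rdm = rng.random(model_rdm.shape)

# RSA test statistic: Spearman rank correlation between model and data
# RDMs, in practice computed across time points and cortical locations.
rho, pval = spearmanr(model_rdm, data_rdm)
print(f"model-data RDM correlation: rho = {rho:.3f}, p = {pval:.3f}")
```

In the study itself this correlation would be evaluated in sliding time windows across the bilateral fronto-temporo-parietal network, which is what allows effects to be localized both in time (e.g. 100 ms after W2 onset) and in space (e.g. left middle temporal gyrus).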