Computational Linguistics, Saarland University, Germany; Center for Research in Language, University of California, San Diego.
Cogn Sci. 2009 May;33(3):449-96. doi: 10.1111/j.1551-6709.2009.01019.x.
Evidence from numerous studies using the visual world paradigm has revealed both that spoken language can rapidly guide attention in a related visual scene and that scene information can immediately influence comprehension processes. These findings motivated the coordinated interplay account (CIA; Knoeferle & Crocker, 2006) of situated comprehension, which claims that utterance-mediated attention crucially underlies this close coordination of language and scene processing. We present a recurrent sigma-pi neural network that models the rapid use of scene information, exploiting an utterance-mediated attentional mechanism that directly instantiates the CIA. The model achieves high levels of performance (both with and without scene contexts) while also exhibiting hallmark behaviors of situated comprehension, such as incremental processing, anticipation of appropriate role fillers, and the immediate use and priority of depicted event information, achieved through coordinated, utterance-mediated attention to the scene.
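The abstract does not spell out the network's internals, but the defining feature of a sigma-pi layer is its sum-of-products connectivity: hidden units sum weighted *products* of inputs, which lets an attention vector multiplicatively gate which depicted scene event feeds comprehension. The following is a minimal, hypothetical sketch of such a gating layer, not the authors' implementation; all names (`sigma_pi_layer`, the array shapes, the toy dimensions) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_pi_layer(attention, scene_inputs, weights, bias):
    """One sigma-pi ("sum of products") layer, sketched for illustration.

    Each hidden unit sums weighted products of an attention weight and a
    scene-input feature, so attention multiplicatively gates which depicted
    event is passed on to comprehension.

    attention:    (n_events,)           soft attention over depicted events
    scene_inputs: (n_events, n_feat)    feature vector per depicted event
    weights:      (n_hidden, n_events, n_feat)
    bias:         (n_hidden,)
    """
    # Multiplicative gating: each event's features are scaled by its
    # attention weight before entering the weighted sum.
    gated = attention[:, None] * scene_inputs           # (n_events, n_feat)
    net = np.einsum('hjk,jk->h', weights, gated) + bias  # (n_hidden,)
    return sigmoid(net)

# Toy example: two depicted events, attention focused on the first.
rng = np.random.default_rng(0)
att = np.array([0.9, 0.1])
scene = rng.normal(size=(2, 4))
W = rng.normal(size=(3, 2, 4))
b = np.zeros(3)
h = sigma_pi_layer(att, scene, W, b)
print(h.shape)  # (3,)
```

In a recurrent version of this sketch, the attention vector would itself be produced from the unfolding utterance at each time step, so that hearing a word shifts attention to the relevant event and, via the multiplicative gating above, changes which scene information influences the next comprehension state.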