Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh.
Cogn Sci. 2012 Sep-Oct;36(7):1204-23. doi: 10.1111/j.1551-6709.2012.01246.x. Epub 2012 Apr 9.
Most everyday tasks involve multiple modalities, which raises the question of how the processing of these modalities is coordinated by the cognitive system. In this paper, we focus on the coordination of visual attention and linguistic processing during speaking. Previous research has shown that objects in a visual scene are fixated before they are mentioned, leading us to hypothesize that a participant's scan pattern can be used to predict what he or she will say. We test this hypothesis using a data set of cued descriptions of photo-realistic scenes. We demonstrate that similar scan patterns are correlated with similar sentences, both within and between visual scenes, and that this correlation holds for three phases of the language production process (target identification, sentence planning, and speaking). We also present a simple algorithm that accurately predicts the sentence associated with a scan pattern via similarity-based retrieval.
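The similarity-based retrieval idea can be illustrated with a minimal sketch. Here a scan pattern is represented as a sequence of fixated-object labels, and the predicted sentence is the one paired with the most similar training pattern. The edit-distance similarity measure, the `predict_sentence` helper, and the toy data are all illustrative assumptions, not the authors' actual metric or data set.

```python
# Hypothetical sketch of similarity-based retrieval over scan patterns.
# A scan pattern is a sequence of fixated-object labels; prediction
# returns the sentence paired with the nearest training pattern.
# Levenshtein distance stands in for the paper's similarity measure
# (an assumption for illustration only).

def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance over label sequences
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def predict_sentence(query_pattern, training_data):
    """Return the sentence whose scan pattern is most similar to the query."""
    best = min(training_data,
               key=lambda item: edit_distance(query_pattern, item[0]))
    return best[1]

# Toy example: patterns are sequences of fixated scene-object labels.
training = [
    (["man", "dog", "ball"], "The man throws a ball to the dog."),
    (["woman", "bench", "tree"], "The woman sits on a bench under a tree."),
]
print(predict_sentence(["man", "ball", "dog"], training))
```

A query pattern that fixates the same objects in a slightly different order still retrieves the matching sentence, which mirrors the paper's finding that similar scan patterns go with similar sentences.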