de Almeida Roberto G, Di Nardo Julia, Antal Caitlyn, von Grünau Michael W
Department of Psychology, Concordia University, Montreal, QC, Canada.
Department of Linguistics, Yale University, New Haven, CT, United States.
Front Psychol. 2019 Oct 10;10:2162. doi: 10.3389/fpsyg.2019.02162. eCollection 2019.
As Macnamara (1978) once asked, how can we talk about what we see? We report on a study manipulating realistic dynamic scenes and sentences, aiming to understand the interaction between linguistic and visual representations in real-world situations. Specifically, we monitored participants' eye movements as they watched video clips of everyday scenes while listening to sentences describing these scenes. We manipulated two main variables: the semantic class of the verb in the sentence, and the action/motion of the agent in the unfolding event. The sentences employed two verb classes, causatives and perception/psychological verbs, which impose different constraints on the nouns that serve as their grammatical complements. The scenes depicted events in which agents either moved toward a target object (always the referent of the verb-complement noun), moved away from it, or remained neutral, performing a given activity (such as cooking). Scenes and sentences were synchronized such that the verb onset corresponded to the first video frame of the agent's motion toward or away from the object. Results show effects of agent motion but weak verb-semantic restrictions: causatives draw more attention to potential referents of their grammatical complements than perception verbs do, but only when the agent moves toward the target object. Crucially, we found no anticipatory verb-driven eye movements toward the target object, contrary to studies using non-naturalistic and static scenes. We propose a model in which linguistic and visual computations in real-world situations occur largely independently of each other during the early moments of perceptual input, but rapidly interact at a central, conceptual system using a common, propositional code. Implications for language use in real-world contexts are discussed.