Elsner Micha, Clarke Alasdair, Rohde Hannah
Department of Linguistics, The Ohio State University.
Department of Psychology, University of Essex.
Cogn Sci. 2018 Jun;42 Suppl 4:940-973. doi: 10.1111/cogs.12507. Epub 2017 Jun 26.
Speakers' perception of a visual scene influences the language they use to describe it: which objects they choose to mention and how they characterize the relationships between them. We show that visual complexity can either delay or facilitate description generation, depending on how much disambiguating information is required and how useful the scene's complexity can be in providing, for example, helpful landmarks. To do so, we measure speech onset times, eye gaze, and utterance content in a reference production experiment in which the target object is either unique or non-unique in a visual scene of varying size and complexity. Speakers delay speech onset if the target object is non-unique and requires disambiguation, and we argue that this reflects the cost of deciding on a high-level strategy for describing it. The eye-tracking data demonstrate that these delays increase when speakers are able to conduct an extensive early visual search, implying that when speakers scan too little of the scene early on, they may decide to begin speaking before becoming aware that their description is underspecified. Speakers' content choices reflect the visual makeup of the scene: the number of distractors present and the availability of useful landmarks. Our results highlight the complex role of visual perception in reference production, showing that speakers can make good use of complexity in ways that reflect their visual processing of the scene.