Kvasova Daria, Coll Llucia, Stewart Travis, Soto-Faraco Salvador
Center for Brain and Cognition, Department of Communication and Information Technologies, Universitat Pompeu Fabra, Carrer de Ramón Trias i Fargas 25-27, Barcelona, 08005, Spain.
Multiple Sclerosis Centre of Catalonia (Cemcat), Hospital Universitari Vall d'Hebron, Universitat Autònoma de Barcelona, Barcelona, Spain.
Psychol Res. 2024 Oct;88(7):2138-2148. doi: 10.1007/s00426-024-02018-8. Epub 2024 Aug 6.
In real-world scenes, the different objects and events are often interconnected within a rich web of semantic relationships. These semantic links help parse information efficiently and make sense of the sensory environment. It has been shown that, during goal-directed search, hearing the characteristic sound of an everyday object helps find the affiliated objects in artificial visual search arrays as well as in naturalistic, real-life video clips. However, whether crossmodal semantic congruence also triggers orienting during spontaneous, non-goal-directed observation is unknown. Here, we investigated this question by addressing whether crossmodal semantic congruence can attract spontaneous, overt visual attention when viewing naturalistic, dynamic scenes. We used eye-tracking whilst participants (N = 45) watched video clips presented alongside sounds of varying semantic relatedness to objects present within the scene. We found that characteristic sounds increased the probability of looking at, the number of fixations to, and the total dwell time on semantically corresponding visual objects, in comparison to when the same scenes were presented with semantically neutral sounds or with background noise only. Interestingly, hearing object sounds that did not match any object in the scene led to increased visual exploration. These results suggest that crossmodal semantic information has an impact on spontaneous gaze on realistic scenes, and therefore on how information is sampled. Our findings extend beyond known effects of object-based crossmodal interactions with simple stimulus arrays and shed new light on the role that audio-visual semantic relationships play in the perception of everyday-life scenarios.