Bujok Ronny, Meyer Antje S, Bosker Hans Rutger
Max Planck Institute for Psycholinguistics, The Netherlands.
International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society, The Netherlands.
Lang Speech. 2025 Mar;68(1):181-203. doi: 10.1177/00238309241258162. Epub 2024 Jun 14.
Human communication is inherently multimodal. Listeners can use not only auditory speech but also visual cues to understand a talker. Most studies of audiovisual speech perception have focused on the perception of speech segments (i.e., speech sounds). However, less is known about the influence of visual information on the perception of suprasegmental aspects of speech, such as lexical stress. In two experiments, we investigated the influence of different visual cues (e.g., facial articulatory cues and beat gestures) on the audiovisual perception of lexical stress. We presented auditory lexical stress continua of disyllabic Dutch stress pairs together with videos of a speaker producing stress on the first or second syllable (e.g., articulating or ). Moreover, we fully crossed the face of the speaker producing lexical stress on either syllable with a gesturing body producing a beat gesture on either the first or second syllable. Results showed that people successfully used visual articulatory cues to stress in muted videos. However, in audiovisual conditions, we were not able to find an effect of visual articulatory cues. In contrast, we found that the temporal alignment of beat gestures with speech robustly influenced participants' perception of lexical stress. These results highlight the importance of considering suprasegmental aspects of language in multimodal contexts.