Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
Q J Exp Psychol (Hove). 2020 Oct;73(10):1523-1536. doi: 10.1177/1747021820914564. Epub 2020 Apr 20.
Spoken words are highly variable and therefore listeners interpret speech sounds relative to the surrounding acoustic context, such as the speech rate of a preceding sentence. For instance, a vowel midway between short /ɑ/ and long /a:/ in Dutch is perceived as short /ɑ/ in the context of preceding slow speech, but as long /a:/ if preceded by a fast context. Despite the well-established influence of visual articulatory cues on speech comprehension, it remains unclear whether visual cues to speech rate also influence subsequent spoken word recognition. In two "Go Fish"-like experiments, participants were presented with audio-only (auditory speech + fixation cross), visual-only (mute videos of a talking head), and audiovisual (speech + videos) context sentences, followed by ambiguous target words containing vowels midway between short /ɑ/ and long /a:/. In Experiment 1, target words were always presented auditorily, without visual articulatory cues. Although the audio-only and audiovisual contexts induced a rate effect (i.e., more long /a:/ responses after fast contexts), the visual-only condition did not. When, in Experiment 2, target words were presented audiovisually, rate effects were observed in all three conditions, including visual-only. This suggests that visual cues to speech rate in a context sentence influence the perception of following visual target cues (e.g., duration of lip aperture), which at an audiovisual integration stage bias participants' target categorisation responses. These findings contribute to a better understanding of how what we see influences what we hear.