Masapollo Matthew, Polka Linda, Ménard Lucie
School of Communication Sciences and Disorders, McGill University, 2001 McGill College, 8th Floor, Montreal, QC H3A 1G1, Canada; Centre for Research on Brain, Language, and Music, McGill University, 3640 de la Montagne, Montreal, QC H3G 2A8, Canada.
Cognition. 2017 Sep;166:358-370. doi: 10.1016/j.cognition.2017.06.001. Epub 2017 Jun 8.
Speech perceivers are universally biased toward "focal" vowels (i.e., vowels whose adjacent formants are close in frequency, which concentrates acoustic energy into a narrower spectral region). This bias is demonstrated in phonetic discrimination tasks as a directional asymmetry: a change from a relatively less focal to a relatively more focal vowel results in significantly better discrimination performance than a change in the reverse direction. We investigated whether the critical information for this directional effect is limited to the auditory modality, or whether visible articulatory information provided by the speaker's face also plays a role. Unimodal auditory, unimodal visual, and bimodal (auditory-visual) vowel stimuli were created from video recordings of a speaker producing variants of /u/ that differed in both their degree of focalization and their visible lip rounding (i.e., lip compression and protrusion). In Experiment 1, we confirmed that subjects showed an asymmetry while discriminating the auditory vowel stimuli. In Experiment 2, we found a similar asymmetry when subjects lip-read those same vowels. In Experiment 3, we found asymmetries, comparable to those found for unimodal vowels, for bimodal vowels when the audio and visual channels were phonetically congruent. In contrast, when the audio and visual channels were phonetically incongruent (as in the "McGurk effect"), this asymmetry was disrupted. These findings collectively suggest that the perceptual processes underlying the "focal" vowel bias are sensitive to articulatory information available across sensory modalities, and they raise foundational issues concerning the extent to which vowel perception derives from general-auditory or speech-gesture-specific processes.