
Automatic audiovisual integration in speech perception.

Author information

Gentilucci Maurizio, Cattaneo Luigi

Affiliations

Dipartimento di Neuroscienze, Università di Parma, Via Volturno 39, 43100 Parma, Italy.

Publication information

Exp Brain Res. 2005 Nov;167(1):66-75. doi: 10.1007/s00221-005-0008-z. Epub 2005 Oct 29.

Abstract

Two experiments aimed to determine whether features of both the visual and acoustical inputs are always merged into the perceived representation of speech, and whether this audiovisual integration is based on cross-modal binding functions or on imitation. In a McGurk paradigm, observers were required to repeat aloud a string of phonemes uttered by an actor (acoustical presentation of the phonemic string) whose mouth, in contrast, mimicked the pronunciation of a different string (visual presentation). In a control experiment, participants read the same printed strings of letters; this condition was used to analyze the voice pattern and the lip kinematics while controlling for imitation. In the control experiment and in the congruent audiovisual presentation, i.e. when the articulatory mouth gestures were congruent with the emission of the string of phonemes, the voice spectrum and the lip kinematics varied according to the pronounced string of phonemes. In the McGurk paradigm, the participants were unaware of the incongruence between the visual and acoustical stimuli. The acoustical analysis of the participants' spoken responses showed three distinct patterns: fusion of the two stimuli (the McGurk effect), repetition of the acoustically presented string of phonemes, and, less frequently, repetition of the string of phonemes corresponding to the mouth gestures mimicked by the actor. However, the analysis of the latter two response types showed that the second formant (F2) of the participants' voice spectra always differed from the value recorded in the congruent audiovisual presentation: it approached the F2 value of the string of phonemes presented in the other, apparently ignored, modality. The lip kinematics of participants repeating the acoustically presented string of phonemes were influenced by observation of the lip movements mimicked by the actor, but only when a labial consonant was pronounced. The data are discussed in favor of the hypothesis that features of both the visual and acoustical inputs always contribute to the representation of a string of phonemes, and that cross-modal integration occurs by extracting the mouth articulation features peculiar to the pronunciation of that string of phonemes.
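The acoustic measure central to these results is the second formant (F2) of the participants' spoken responses. As a minimal, purely illustrative sketch of how F2 can be estimated from a recorded syllable, the Python snippet below applies a standard LPC (autocorrelation) formant estimate; it is not the authors' analysis pipeline, and the file name, frame position, LPC order, and bandwidth threshold are assumptions made for the example.

# Illustrative only: a standard LPC (autocorrelation) estimate of the second
# formant (F2) from a recorded syllable.  This is NOT the authors' analysis
# pipeline; the file name, frame position, LPC order, and bandwidth threshold
# below are assumptions for the sake of the example.
import numpy as np
from scipy.io import wavfile
from scipy.linalg import solve_toeplitz
from scipy.signal import get_window


def lpc_coefficients(frame, order):
    """LPC error-filter coefficients via the autocorrelation (Yule-Walker) method."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])    # predictor coefficients a_1..a_p
    return np.concatenate(([1.0], -a))               # A(z) = 1 - sum_k a_k z^-k


def formant_candidates(frame, fs, order=12):
    """Return candidate formant frequencies (Hz), lowest first."""
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
    frame = frame * get_window("hamming", len(frame))
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]                 # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)        # pole angle  -> frequency (Hz)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi  # pole radius -> bandwidth (Hz)
    keep = (freqs > 90) & (bandwidths < 400)          # heuristic: sharp speech resonances
    return sorted(freqs[keep])


if __name__ == "__main__":
    fs, signal = wavfile.read("aba_response.wav")     # hypothetical response recording
    signal = signal.astype(float)
    if signal.ndim > 1:
        signal = signal.mean(axis=1)                  # mix down to mono if needed
    centre = int(0.20 * fs)                           # assumed vowel midpoint at 200 ms
    half = int(0.0125 * fs)                           # 25 ms analysis frame
    cand = formant_candidates(signal[centre - half:centre + half], fs,
                              order=int(fs / 1000) + 2)
    if len(cand) >= 2:
        print("Estimated F1 = %.0f Hz, F2 = %.0f Hz" % (cand[0], cand[1]))

Per-trial F2 estimates obtained this way could then be compared against the values recorded in the congruent audiovisual condition, which is the kind of comparison the abstract describes.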

