Remez Robert E, Fellowes Jennifer M, Pisoni David B, Goh Winston D, Rubin Philip E
Department of Psychology, Barnard College, 3009 Broadway, New York, NY 10027-6598, USA.
Speech Commun. 1998 Oct 1;26(1):65-73. doi: 10.1016/S0167-6393(98)00050-8.
Theoretical and practical motives alike have prompted recent investigations of multimodal speech perception. Theoretically, multimodal studies have extended the conceptualization of perceptual organization beyond the familiar modality-bound accounts deriving from Gestalt psychology. Practically, such investigations have been driven by a need to understand the proficiency of multimodal speech perception using an electrocochlear prosthesis for hearing. In each domain, studies have shown that perceptual organization of speech can occur even when the perceiver's auditory experience departs from natural speech qualities. Accordingly, our research examined auditory-visual multimodal integration of videotaped faces and selected acoustic constituents of speech signals, each realized as a single sinewave tone accompanying a video image of an articulating face. The single tone reproduced the frequency and amplitude of the phonatory cycle or of one of the lower three oral formants. Our results showed a distinct advantage for the condition pairing the video image of the face with a sinewave replicating the second formant, despite its unnatural timbre and its presentation in acoustic isolation from the rest of the speech signal. Perceptual coherence of multimodal speech in these circumstances is established when the two modalities concurrently specify the same underlying phonetic attributes.