Multimodal perceptual organization of speech: Evidence from tone analogs of spoken utterances.

Author information

Remez Robert E, Fellowes Jennifer M, Pisoni David B, Goh Winston D, Rubin Philip E

Affiliation

Department of Psychology, Barnard College, 3009 Broadway, New York, NY 10027-6598, USA.

Publication information

Speech Commun. 1998 Oct 1;26(1):65-73. doi: 10.1016/S0167-6393(98)00050-8.

Abstract

Theoretical and practical motives alike have prompted recent investigations of multimodal speech perception. Theoretically, multimodal studies have extended the conceptualization of perceptual organization beyond the familiar modality-bound accounts deriving from Gestalt psychology. Practically, such investigations have been driven by a need to understand the proficiency of multimodal speech perception using an electrocochlear prosthesis for hearing. In each domain, studies have shown that perceptual organization of speech can occur even when the perceiver's auditory experience departs from natural speech qualities. Accordingly, our research examined auditory-visual multimodal integration of videotaped faces and selected acoustic constituents of speech signals, each realized as a single sinewave tone accompanying a video image of an articulating face. The single tone reproduced the frequency and amplitude of the phonatory cycle or of one of the lower three oral formants. Our results showed a distinct advantage for the condition pairing the video image of the face with a sinewave replicating the second formant, despite its unnatural timbre and its presentation in acoustic isolation from the rest of the speech signal. Perceptual coherence of multimodal speech in these circumstances is established when the two modalities concurrently specify the same underlying phonetic attributes.
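
The abstract describes each acoustic stimulus as a single sinewave tone that reproduces the frequency and amplitude of one speech constituent. As a rough illustration only, and not the authors' synthesis procedure, the Python sketch below generates one sinusoid that follows a frame-by-frame frequency and amplitude track. The function names, the 16 kHz sample rate, the 10 ms frame step, and the second-formant values are all hypothetical choices made for this example; they are not taken from the paper.

```python
import math


def interpolate_track(frames, samples_per_frame):
    """Linearly interpolate per-frame values to one value per sample."""
    out = []
    for i in range(len(frames) - 1):
        start, end = frames[i], frames[i + 1]
        for k in range(samples_per_frame):
            out.append(start + (end - start) * k / samples_per_frame)
    return out


def synthesize_tone(freqs_hz, amps, sample_rate=16000, frame_ms=10):
    """Return samples of one sinusoid that follows the given tracks.

    freqs_hz and amps hold one value per analysis frame (assumed format).
    """
    if len(freqs_hz) != len(amps):
        raise ValueError("frequency and amplitude tracks must have equal length")
    samples_per_frame = sample_rate * frame_ms // 1000
    freq = interpolate_track(freqs_hz, samples_per_frame)
    amp = interpolate_track(amps, samples_per_frame)
    phase = 0.0
    samples = []
    for f, a in zip(freq, amp):
        samples.append(a * math.sin(phase))
        # Accumulate phase so the tone stays continuous while frequency changes.
        phase += 2.0 * math.pi * f / sample_rate
    return samples


# Hypothetical second-formant track (Hz) and linear amplitudes, one per 10 ms frame.
f2_track = [1200.0, 1350.0, 1500.0, 1700.0, 1800.0]
amp_track = [0.0, 0.4, 0.8, 0.6, 0.0]
tone = synthesize_tone(f2_track, amp_track)
```

Accumulating phase sample by sample, rather than computing sin(2πft) directly, avoids discontinuities when the frequency changes between frames. A tone for the phonatory cycle or for another formant would use the same routine with a different pair of tracks.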
