Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, MD, USA.
Trends Hear. 2023 Jan-Dec;27:23312165231156673. doi: 10.1177/23312165231156673.
Closed-set consonant identification, measured using nonsense syllables, has been commonly used to investigate the encoding of speech cues in the human auditory system. Such tasks also evaluate the robustness of speech cues to masking from background noise and their impact on auditory-visual speech integration. However, extending the results of these studies to everyday speech communication has been a major challenge due to acoustic, phonological, lexical, contextual, and visual speech cue differences between consonants in isolated syllables and in conversational speech. In an attempt to isolate and address some of these differences, recognition of consonants spoken in multisyllabic nonsense phrases (e.g., aBaSHaGa spoken as /ɑbɑʃɑɡɑ/) produced at an approximately conversational syllabic rate was measured and compared with consonant recognition using Vowel-Consonant-Vowel bisyllables spoken in isolation. After accounting for differences in stimulus audibility using the Speech Intelligibility Index, consonants spoken in sequence at a conversational syllabic rate were found to be more difficult to recognize than those produced in isolated bisyllables. Specifically, place- and manner-of-articulation information was transmitted better in isolated nonsense syllables than in multisyllabic phrases. The contribution of visual speech cues to place-of-articulation information was also lower for consonants spoken in sequence at a conversational syllabic rate. These data imply that auditory-visual benefit based on models of feature complementarity from isolated syllable productions may overestimate the real-world benefit of integrating auditory and visual speech cues.
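The audibility correction above uses the Speech Intelligibility Index, an importance-weighted sum of band audibilities. Below is a minimal sketch of the simplified form of that computation, SII = Σᵢ IᵢAᵢ, where each band's audibility Aᵢ is mapped from its SNR over a 30 dB dynamic range; the band weights and SNRs here are hypothetical, and the full ANSI S3.5-1997 procedure adds masking-spread and level-distortion terms that this sketch omits.

```python
import numpy as np

def sii(snr_db: np.ndarray, importance: np.ndarray) -> float:
    """Simplified Speech Intelligibility Index: each band's audibility is
    the fraction of a 30-dB speech dynamic range that is above masked
    threshold, weighted by that band's importance for intelligibility."""
    audibility = np.clip((snr_db + 15.0) / 30.0, 0.0, 1.0)
    return float(np.sum(importance * audibility))

# Hypothetical 6-band example; importance weights must sum to 1.
weights = np.array([0.10, 0.15, 0.25, 0.25, 0.15, 0.10])
band_snr = np.array([12.0, 8.0, 3.0, -2.0, -6.0, -12.0])
print(f"SII = {sii(band_snr, weights):.2f}")
```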
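The "information transmitted" measure for place and manner features is conventionally Miller and Nicely's (1955) relative transmitted information, computed on a confusion matrix whose rows and columns have been pooled by feature category. A minimal sketch follows, assuming a hypothetical three-consonant confusion matrix; this is the standard analysis for such data, not necessarily the exact code used in the study.

```python
import numpy as np

def relative_transmitted_info(confusions: np.ndarray) -> float:
    """Mutual information between stimulus and response, normalized by
    stimulus entropy (Miller & Nicely, 1955): 1.0 means the feature was
    transmitted perfectly, 0.0 means responses were independent of it."""
    p = confusions / confusions.sum()    # joint probabilities
    px = p.sum(axis=1, keepdims=True)    # stimulus marginals
    py = p.sum(axis=0, keepdims=True)    # response marginals
    nz = p > 0                           # skip empty cells (0 * log 0 = 0)
    bits = float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())
    h_stim = float(-(px[px > 0] * np.log2(px[px > 0])).sum())
    return bits / h_stim

def pool_by_feature(confusions: np.ndarray, labels: list[int]) -> np.ndarray:
    """Collapse a consonant confusion matrix into feature categories
    (e.g., place of articulation) before computing transmission."""
    k = max(labels) + 1
    pooled = np.zeros((k, k))
    for i, li in enumerate(labels):
        for j, lj in enumerate(labels):
            pooled[li, lj] += confusions[i, j]
    return pooled

# Hypothetical confusion counts for /b/, /d/, /g/; the three consonants
# differ in place (labels 0, 1, 2) but share manner (all voiced stops).
conf = np.array([[80, 15,  5],
                 [10, 70, 20],
                 [ 5, 25, 70]])
place = pool_by_feature(conf, labels=[0, 1, 2])
print(f"relative place transmission: {relative_transmitted_info(place):.2f}")
```

Comparing this statistic across the isolated-bisyllable and multisyllabic-phrase conditions is what claims like "place information was transmitted better" quantify.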