Amano-Kusumoto Akiko, Hosom John-Paul, Kain Alexander, Aronoff Justin M
Department of Human Communication Science Devices, House Research Institute, 2100 West Third Street, Los Angeles, California 90057.
Center for Spoken Language Understanding (CSLU), Oregon Health & Science University (OHSU), 20000 NW Walker Road, Beaverton, Oregon 97006.
Speech Commun. 2014 Apr 1;59:1-9. doi: 10.1016/j.specom.2013.12.001.
Previous studies have shown that "clear" speech, where the speaker intentionally tries to enunciate, has better intelligibility than "conversational" speech, which is produced in regular conversation. However, conversational and clear speech vary along a number of acoustic dimensions, and it is unclear which aspects of clear speech lead to better intelligibility. Previously, Kain et al. [J. Acoust. Soc. Am. (4), 2308-2319 (2008)] showed that a combination of short-term spectra and duration was responsible for the improved intelligibility of one speaker. This study investigates subsets of specific features of short-term spectra, including temporal aspects. As in the study by Kain et al., hybrid stimuli were synthesized by combining features from clear speech with complementary features from conversational speech, to determine which acoustic features cause the improved intelligibility of clear speech. Our results indicate that, although steady-state formant values of tense vowels contributed to the intelligibility of clear speech, neither the steady-state portion nor the formant transition alone was sufficient to yield intelligibility comparable to that of clear speech. In contrast, when the entire formant contour of conversational speech, including the phoneme duration, was replaced by that of clear speech, intelligibility was comparable to that of clear speech. This indicates that the combination of formant contour and duration information is relevant to the improved intelligibility of clear speech. The study provides a better understanding of the relevance of different aspects of formant contours to the improved intelligibility of clear speech.