Hillenbrand J M, Clark M J, Houde R A
Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, Michigan 49008, USA.
J Acoust Soc Am. 2000 Dec;108(6):3013-22. doi: 10.1121/1.1323463.
This study was designed to examine the role of duration in vowel perception by testing listeners on the identification of CVC syllables generated at different durations. Test signals consisted of synthesized versions of 300 utterances selected from a large, multitalker database of /hVd/ syllables [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. Four versions of each utterance were synthesized: (1) an original duration set (vowel duration matched to the original utterance), (2) a neutral duration set (duration fixed at 272 ms, the grand mean across all vowels), (3) a short duration set (duration fixed at 144 ms, two standard deviations below the mean), and (4) a long duration set (duration fixed at 400 ms, two standard deviations above the mean). Experiment 1 used a formant synthesizer; experiment 2 was an exact replication using a sinusoidal synthesis method that represented the original vowel spectrum more precisely than the formant synthesizer. Findings included the following: (1) duration had a small overall effect on vowel identity, since the great majority of signals were identified correctly at their original durations and at all three altered durations; (2) despite the relatively small average effect of duration, some vowels, especially [see text] and [see text], were significantly affected by duration; (3) some vowel contrasts that differ systematically in duration, such as [see text] and [see text], were minimally affected by duration; (4) a simple pattern recognition model appears to be capable of accounting for several features of the listening test results, especially the greater influence of duration on some vowels than on others; and (5) because a formant synthesizer does an imperfect job of representing the fine details of the original vowel spectrum, results using the formant-synthesized signals led to a slight overestimate of the role of duration in vowel recognition, especially for the shortened vowels.
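The arithmetic behind the four duration conditions can be checked with a short sketch. The three fixed durations (272, 144, and 400 ms) come from the abstract; the function names and the example original duration of 310 ms are hypothetical, chosen only for illustration, and this is not the authors' synthesis code.

```python
# Fixed durations stated in the abstract (milliseconds).
NEUTRAL_MS = 272  # grand mean across all vowels
SHORT_MS = 144    # two standard deviations below the mean
LONG_MS = 400     # two standard deviations above the mean


def implied_sd(mean_ms, extreme_ms, n_sd=2):
    """Standard deviation implied by a value lying n_sd SDs from the mean."""
    return abs(extreme_ms - mean_ms) / n_sd


def duration_conditions(original_ms):
    """Vowel duration (ms) assigned to one utterance in each of the four sets.

    original_ms is the measured vowel duration of the source utterance;
    the other three conditions ignore it and use the fixed values.
    """
    return {
        "original": original_ms,
        "neutral": NEUTRAL_MS,
        "short": SHORT_MS,
        "long": LONG_MS,
    }


if __name__ == "__main__":
    # Both extremes imply the same standard deviation of 64 ms.
    print(implied_sd(NEUTRAL_MS, SHORT_MS))  # 64.0
    print(implied_sd(NEUTRAL_MS, LONG_MS))   # 64.0
    # 310 ms is a made-up original duration, not a value from the study.
    print(duration_conditions(310))
```

Both extremes sit 128 ms from the mean, so the stated values are consistent with a standard deviation of 64 ms across vowels.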