Kato H, Tsuzaki M, Sagisaka Y
ATR Human Information Processing Research Laboratories, Hikaridai, Seikacho, Kyoto, Japan.
J Acoust Soc Am. 1998 Jul;104(1):540-9. doi: 10.1121/1.423301.
Few perceptual studies of the temporal aspects of speech have investigated the influence of changes in segmental durations in terms of acceptability. Aiming to contribute to the assessment of rules for assigning segmental durations in speech synthesis, the current study measured the perceptual acceptability of changes in the segmental duration of vowels as a function of the segment attributes or context, such as base duration, temporal position in a word, vowel quality, and voicing of the following segment. Seven listeners estimated the acceptability of word stimuli in which one of the vowels was subjected to a temporal modification from -50 ms (for shortening) to +50 ms (for lengthening) in 5-ms steps. The temporal modification was applied to vowel segments in 70 word contexts; their durations ranged from 35-145 ms, the mora position in the word was first or third, the vowel quality was /a/ or /i/, and the following segment was a voiced or an unvoiced consonant. The experimental results showed that the listeners' acceptable range of durational modification was narrower for vowels in the first moraic position in the word than for those in the third moraic position. The acceptable range was also narrower for the vowel /a/ than for the vowel /i/, and similarly narrower for vowels followed by unvoiced consonants than for those followed by voiced consonants. The vowel that fell into the least vulnerable class (the third /i/, followed by a voiced consonant) required 140% of the modification of that which fell into the most vulnerable class (the first /a/, followed by an unvoiced consonant) to yield the same acceptability decrement. In contrast, the effect of the original vowel duration on the acceptability of temporal modifications was not significant despite its wide variation (35-145 ms).
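As a rough illustration of the arithmetic in the abstract, the following sketch enumerates the modification steps (-50 ms to +50 ms in 5-ms steps) and applies the reported 140% relation between the least and most vulnerable vowel classes. The function and variable names are hypothetical, the 20-ms input is an arbitrary example value and not a result from the study, and treating the 140% figure as a simple proportional scaling is an assumption made only for illustration.

```python
# Modification grid described in the abstract: -50 ms to +50 ms in 5-ms steps.
STEP_MS = 5
modifications_ms = list(range(-50, 50 + STEP_MS, STEP_MS))


def equivalent_modification_ms(delta_most_vulnerable_ms, ratio_percent=140):
    """Scale a modification applied to the most vulnerable class (first /a/,
    followed by an unvoiced consonant) to the size that the least vulnerable
    class (third /i/, followed by a voiced consonant) would need for the same
    acceptability decrement.

    Assumption: the reported 140% figure is applied as a plain proportional
    factor; the abstract does not say how it varies with modification size.
    """
    return delta_most_vulnerable_ms * ratio_percent / 100


if __name__ == "__main__":
    print(len(modifications_ms), "modification levels:", modifications_ms)
    # Arbitrary example input (20 ms), not a value reported by the study.
    print(equivalent_modification_ms(20), "ms")
```

The grid includes the unmodified stimulus (0 ms), so it contains 21 levels per vowel.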