Functional differences between vowel onsets and offsets in temporal perception of speech: local-change detection and speaking-rate discrimination.

Authors

Kato Hiroaki, Tsuzaki Minoru, Sagisaka Yoshinori

Affiliation

ATR Human Information Science Laboratories, Hikaridai, Seika-cho, Kyoto 619-0288, Japan.

Publication

J Acoust Soc Am. 2003 Jun;113(6):3379-89. doi: 10.1121/1.1568760.

DOI:10.1121/1.1568760

PMID:12822808

Abstract

To provide a perceptual framework for the objective evaluation of durational rules in speech synthesis, two experiments were conducted to investigate the differences between vowel (V) onsets and V-offsets in their functions of marking the perceived temporal structure of speech. The first experiment measured the detectability of temporal modifications given in four-mora (CVCVCVCV) Japanese words. In the V-onset condition, the inter-onset intervals of vowels were uniformly changed (either expanded or reduced) while their inter-offset intervals were preserved. In the V-offset condition, this was reversed. These manipulations did not change the duration of the entire word. Each of the modified words was paired with its unmodified counterpart, and the pair was given to listeners, who were asked to rate the difference between the paired words. The results show that there were no significant differences in the listeners' abilities to detect the temporal modification between the V-onset and V-offset conditions. In the second experiment, the listeners were asked to estimate the differences they perceived in speaking rates for the same stimulus set as that of the first experiment. Interestingly, the results show a clear difference in the listeners' performance between the V-onset and V-offset conditions. Specifically, changing the V-onset intervals changed the perceived speaking rates, which showed a linear relation (r = -0.9) despite the fact that the duration of the entire word remained unchanged. In contrast, modifying the V-offset intervals produced no clear relation with the perceived speaking rates. The second experiment also showed that the listeners performed well in speaking rate discrimination (3.5%-5% in the change ratio). These results are discussed in relation to the differences in the listeners' temporal processing range (local or global) between the two experiments.
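The V-onset and V-offset manipulations can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how the boundaries were shifted, so the sketch assumes every interval in one boundary set is scaled by a common ratio around a fixed anchor time while the other set is left alone. The function names, the anchor choice, and the timing values (in milliseconds) are all hypothetical.

```python
from typing import List, Tuple


def intervals(times: List[float]) -> List[float]:
    """Return the intervals between successive time points."""
    return [b - a for a, b in zip(times, times[1:])]


def scale_about(times: List[float], ratio: float, anchor: float) -> List[float]:
    """Scale all intervals between time points by `ratio`, keeping `anchor` fixed.

    Assumption: a uniform linear scaling around one anchor time.
    """
    return [anchor + ratio * (t - anchor) for t in times]


def modify_word(
    onsets: List[float],
    offsets: List[float],
    ratio: float,
    condition: str,
    anchor: float,
) -> Tuple[List[float], List[float]]:
    """Change one boundary set and preserve the other.

    "V-onset": inter-onset intervals change, inter-offset intervals are preserved.
    "V-offset": the reverse.
    """
    if condition == "V-onset":
        new_onsets, new_offsets = scale_about(onsets, ratio, anchor), list(offsets)
    elif condition == "V-offset":
        new_onsets, new_offsets = list(onsets), scale_about(offsets, ratio, anchor)
    else:
        raise ValueError("condition must be 'V-onset' or 'V-offset'")
    # Every vowel must still start before it ends.
    if any(on >= off for on, off in zip(new_onsets, new_offsets)):
        raise ValueError("ratio too large: a vowel would have non-positive duration")
    return new_onsets, new_offsets


# Hypothetical CVCVCVCV word: the word starts at 0 ms and ends at the last V-offset.
onsets = [50.0, 200.0, 350.0, 500.0]
offsets = [150.0, 300.0, 450.0, 600.0]

# Expand the inter-onset intervals by 5% around the mean onset time.
anchor = sum(onsets) / len(onsets)
new_onsets, new_offsets = modify_word(onsets, offsets, 1.05, "V-onset", anchor)

print(intervals(new_onsets))   # each inter-onset interval is 5% longer
print(intervals(new_offsets))  # inter-offset intervals are unchanged
```

In this sketch the word's start and its final V-offset stay where they were in the V-onset condition, so the duration of the entire word does not change. For the V-offset condition, the anchor would have to be the final offset to keep the word end fixed; that choice is also an assumption.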
