Bulut Murtaza, Narayanan Shrikanth
Signal Analysis and Interpretation Laboratory, Electrical Engineering Department, University of Southern California, Los Angeles, California 90089, USA.
J Acoust Soc Am. 2008 Jun;123(6):4547-58. doi: 10.1121/1.2909562.
Emotional information in speech is commonly described in terms of prosodic features such as F0, duration, and energy. In this paper, the focus is on how F0 characteristics can be used to effectively parametrize emotional quality in speech signals. Using an analysis-by-synthesis approach, the F0 mean, range, and shape properties of emotional utterances are systematically modified. The results show which aspects of the F0 parameter can be modified without causing any significant change in the perception of emotions. To model this behavior, the concept of emotional regions is introduced. Emotional regions represent the variability present in emotional speech and provide a new procedure for studying speech cues for judgments of emotion. The method is applied to F0 but can also be used on other aspects of prosody, such as duration or loudness. A statistical analysis of the factors affecting the emotional regions and a discussion of the effects of F0 modifications on the perception of emotion and speech quality are also presented. The results show that F0 range is more important than F0 mean for emotion expression.
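To make the kind of F0 mean and range modification described in the abstract concrete, the following is a minimal sketch, not the authors' procedure. It assumes the F0 contour is a frame-wise array in Hz with unvoiced frames coded as 0, and that range is changed by scaling deviations around the voiced mean on a linear Hz scale. The function name `modify_f0` and its parameters are hypothetical. Shape modification and resynthesis of the speech signal are not covered.

```python
import numpy as np

def modify_f0(f0_hz, mean_shift_hz=0.0, range_scale=1.0):
    """Shift the mean and scale the range of the voiced part of an F0 contour.

    Assumptions (not taken from the paper): unvoiced frames are coded as 0,
    and scaling is done on a linear Hz scale around the voiced mean.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0                      # mask of voiced frames
    out = f0.copy()
    if not voiced.any():
        return out                       # nothing to modify
    mean = f0[voiced].mean()
    # Scale deviations from the mean (range), then shift the whole contour (mean).
    out[voiced] = mean + range_scale * (f0[voiced] - mean) + mean_shift_hz
    return out

# Toy contour: two unvoiced frames (0) and four voiced frames.
contour = np.array([0.0, 180.0, 220.0, 260.0, 0.0, 200.0])
wider = modify_f0(contour, range_scale=1.5)      # same mean, wider range
higher = modify_f0(contour, mean_shift_hz=20.0)  # same range, higher mean
```

Scaling the range leaves the voiced mean unchanged, and shifting the mean leaves the range unchanged, so the two properties can be varied independently. A large `range_scale` or a negative shift can push values to zero or below; this sketch does not guard against that.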