Bänziger Tanja, Hosoya Georg, Scherer Klaus R.
Department of Psychology, Mid Sweden University, Östersund, Sweden.
Department of Educational Science and Psychology, Freie Universität Berlin, Germany.
PLoS One. 2015 Sep 1;10(9):e0136675. doi: 10.1371/journal.pone.0136675. eCollection 2015.
We propose to use a comprehensive path model of vocal emotion communication, encompassing encoding, transmission, and decoding processes, to empirically model data sets on emotion expression and recognition. The utility of the approach is demonstrated for two data sets from two different cultures and languages, based on corpora of vocal emotion enactment by professional actors and emotion inference by naïve listeners. Lens model equations, hierarchical regression, and multivariate path analysis are used to compare the relative contributions of objectively measured acoustic cues in the enacted expressions and subjective voice cues as perceived by listeners to the variance in emotion inference from vocal expressions for four emotion families (fear, anger, happiness, and sadness). While the results confirm the central role of arousal in vocal emotion communication, the utility of applying an extended path modeling framework is demonstrated by the identification of unique combinations of distal cues and proximal percepts carrying information about specific emotion families, independent of arousal. The statistical models generated show that more sophisticated acoustic parameters need to be developed to explain the distal underpinnings of subjective voice quality percepts that account for much of the variance in emotion inference, in particular voice instability and roughness. The general approach advocated here, as well as the specific results, open up new research strategies for work in psychology (specifically emotion and social perception research) and engineering and computer science (specifically research and development in the domain of affective computing, particularly on automatic emotion detection and synthetic emotion expression in avatars).
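For orientation, the Brunswikian lens model equation that the proposed path-modeling framework extends (commonly attributed to Tucker, 1964) decomposes how well listeners' inferences match the expressed emotion into cue-based components. The following is a minimal sketch in conventional lens-model notation; the symbols are the standard textbook terms, not variable names taken from the paper itself:

\[ r_a = G \, R_e R_s + C \sqrt{1 - R_e^{2}} \, \sqrt{1 - R_s^{2}} \]

Here r_a is achievement (the correspondence between the expressed state and the listener's inference), R_e is the predictability of the expressed state from the distal acoustic cues, R_s is the consistency with which listeners' judgments follow those same cues, G is the matching index between the two cue models, and C reflects systematic covariation not captured by the linear cue models. The paper's extended framework adds the distinction between objectively measured distal cues and subjectively perceived proximal percepts to this basic decomposition.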