Toivanen Juhani, Väyrynen Eero, Seppänen Tapio
Department of Electrical and Information Engineering, Information Processing Laboratory, P.O. BOX 4500, FIN-90014, University of Oulu, Finland.
Lang Speech. 2004;47(Pt 4):383-412. doi: 10.1177/00238309040470040301.
In this paper, experiments on the automatic discrimination of basic emotions from spoken Finnish are described. For the purpose of the study, a large emotional speech corpus of Finnish was collected; 14 professional actors acted as speakers and simulated four primary emotions when reading out a semantically neutral text. More than 40 prosodic features were derived and automatically computed from the speech samples. Two application scenarios were tested: the first scenario was speaker-independent for a small domain of speakers, while the second scenario was completely speaker-independent. Human listening experiments were conducted to assess the perceptual adequacy of the emotional speech samples. Statistical classification experiments indicated that, with the optimal combination of prosodic feature vectors, automatic emotion discrimination performance close to human emotion recognition ability was achievable.
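The abstract neither lists the 40+ prosodic features nor names the classifier, so the following is only a rough sketch, under stated assumptions, of what a "completely speaker-independent" test involves. Each speaker is held out in turn, a few utterance-level prosodic statistics are computed from an F0 contour and an energy track, and a 1-nearest-neighbor rule assigns the emotion label. The feature subset, the z-score normalization, the nearest-neighbor classifier, all function names (`prosodic_features`, `leave_one_speaker_out`, and so on), and the toy data are hypothetical illustrations, not the authors' method.

```python
import math
import statistics

# Hypothetical sketch: the features, normalization and classifier below are
# illustrative assumptions, not the method used in the paper.


def prosodic_features(f0, energy):
    """Compute a small, hypothetical set of utterance-level prosodic features.

    f0: per-frame fundamental frequency in Hz (0.0 marks an unvoiced frame).
    energy: per-frame energy values, same length as f0.
    """
    voiced = [x for x in f0 if x > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return [
        statistics.mean(voiced),       # mean F0
        statistics.stdev(voiced),      # F0 variability
        max(voiced) - min(voiced),     # F0 range
        statistics.mean(energy),       # mean energy
        len(voiced) / len(f0),         # voiced-frame ratio
    ]


def fit_scaler(vectors):
    """Per-feature mean and standard deviation, estimated on training data only."""
    cols = list(zip(*vectors))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return means, stds


def scale(vec, means, stds):
    """Z-score a feature vector with previously fitted statistics."""
    return [(v - m) / s for v, m, s in zip(vec, means, stds)]


def nearest_neighbor(train, query):
    """Label of the training vector closest to query (Euclidean distance)."""
    best = min(train, key=lambda item: math.dist(item[1], query))
    return best[0]


def leave_one_speaker_out(samples):
    """Speaker-independent accuracy: every speaker is classified by a model
    built only from the other speakers' samples.

    samples: list of (speaker_id, emotion_label, raw_feature_vector).
    """
    speakers = sorted({s for s, _, _ in samples})
    correct = 0
    for held_out in speakers:
        train = [(lab, vec) for s, lab, vec in samples if s != held_out]
        test = [(lab, vec) for s, lab, vec in samples if s == held_out]
        means, stds = fit_scaler([vec for _, vec in train])
        train_scaled = [(lab, scale(vec, means, stds)) for lab, vec in train]
        for lab, vec in test:
            if nearest_neighbor(train_scaled, scale(vec, means, stds)) == lab:
                correct += 1
    return correct / len(samples)


# Toy data (invented): (base F0 in Hz, F0 spread in Hz, energy) per class.
EMOTIONS = {
    "neutral": (120.0, 5.0, 1.0),
    "sad": (90.0, 2.0, 0.4),
    "happy": (170.0, 15.0, 2.0),
    "angry": (230.0, 30.0, 3.5),
}
# Invented per-speaker F0 offset (Hz) and energy gain.
SPEAKERS = {"A": (0.0, 1.0), "B": (4.0, 1.05), "C": (-4.0, 0.95)}


def synthetic_utterance(base, spread, energy):
    """Alternating F0 contour: 20 voiced frames followed by 5 unvoiced frames."""
    f0 = [base + spread, base - spread] * 10 + [0.0] * 5
    return f0, [energy] * len(f0)


samples = []
for spk, (offset, gain) in SPEAKERS.items():
    for emo, (base, spread, energy) in EMOTIONS.items():
        f0, en = synthetic_utterance(base + offset, spread, energy * gain)
        samples.append((spk, emo, prosodic_features(f0, en)))

print(leave_one_speaker_out(samples))  # 1.0 on this cleanly separated toy data
```

The scaler is refitted inside each fold, so no statistics from the held-out speaker leak into training. On real acted speech, where emotion classes overlap far more than in this toy set, accuracy would be expected to be well below 1.0.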