Amorese Terry, Cuciniello Marialucia, Alterio Anna, Pepe Daniele, Scharenborg Odette, Cordasco Gennaro, Esposito Anna
Department of Psychology, Università degli Studi della Campania "L. Vanvitelli", Caserta, Italy.
Multimedia Computing Group, Delft University of Technology, Delft, Netherlands.
Front Psychol. 2025 Jun 25;16:1548975. doi: 10.3389/fpsyg.2025.1548975. eCollection 2025.
This work investigates the contextual factors that affect speech emotion recognition (SER). Specifically, it examines whether the identification of vocal expressions of anger, fear, sadness, joy, and neutrality is affected by three factors: (a) the experimental setting, comparing vocal emotion recognition in a controlled, soundproof laboratory and in a more natural listening environment; (b) background noise in the stimuli, with sentences presented at three noise levels of increasing difficulty (one clear, no-noise condition and two noise conditions); and (c) language familiarity, since the stimuli were Italian sentences and the participants were native Italian speakers and Dutch speakers who did not know Italian.
Dutch and Italian participants performed a vocal emotion recognition task in two experimental settings (realistic vs. laboratory). The stimuli were vocal utterances from the Italian EMOVO dataset conveying anger, fear, sadness, joy, and neutrality, presented under three noise conditions.
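The factors described above combine into a crossed design. A minimal sketch that enumerates the condition cells, assuming every listener group was tested in every setting and noise condition with all five emotions (the abstract does not state which factors were between- or within-subject). The label strings and the function name `design_cells` are hypothetical, not the authors' coding.

```python
from itertools import product

# Factor levels as listed in the abstract; the label strings are illustrative only.
LISTENER_GROUPS = ["Italian (native)", "Dutch (no Italian)"]
SETTINGS = ["laboratory", "realistic"]
NOISE_CONDITIONS = ["clear", "noise (lower level)", "noise (higher level)"]
EMOTIONS = ["anger", "fear", "sadness", "joy", "neutrality"]


def design_cells():
    """Return every (group, setting, noise, emotion) combination of the crossed factors."""
    return list(product(LISTENER_GROUPS, SETTINGS, NOISE_CONDITIONS, EMOTIONS))


cells = design_cells()
print(len(cells))  # 2 groups * 2 settings * 3 noise conditions * 5 emotions = 60
```

Under this assumption, recognition accuracy would be summarized per cell, so group, setting, and noise effects can be compared separately for each emotion category.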
Concerning the experimental setting, listeners were able to discern emotional nuances conveyed through the voice even at the higher levels of background noise. Regarding language familiarity, emotion recognition performance differed between the Italian and Dutch listeners, but the magnitude of the errors depended on the emotion category. Higher noise levels reduced accuracy, yet listeners could still discern emotions, especially through prosody.
The study shows that emotion recognition is influenced by the listening context, background noise, and language familiarity. These results could inform the development of robust SER systems and improve human-computer interaction.