Department of Signal Processing and Acoustics, Aalto University, Otakaari 3, FI-00076 Espoo, Finland.
Sensors (Basel). 2022 Jun 29;22(13):4931. doi: 10.3390/s22134931.
Understanding how humans perceive emotions or affective states is important for developing emotion-aware systems that work in realistic scenarios. In this paper, the perception of emotions in naturalistic human interaction (audio-visual data) is studied using perceptual evaluation. For this purpose, a naturalistic audio-visual emotion database collected from TV broadcasts such as soap operas and movies, called the IIIT-H Audio-Visual Emotion (IIIT-H AVE) database, is used. The database consists of audio-alone, video-alone, and audio-visual data in English. Using data from all three modes, perceptual tests are conducted for four basic emotions (angry, happy, neutral, and sad) based on category labeling, and for two dimensions, namely arousal (active or passive) and valence (positive or negative), based on dimensional labeling. The results indicate that participants' perception of emotions differed markedly across the audio-alone, video-alone, and audio-visual data. This finding emphasizes the importance of emotion-specific features compared to commonly used features in the development of emotion-aware systems.