Nazarbayev University, Department of Computer Science, Astana, 010000, Republic of Kazakhstan.
Korea University, Department of Artificial Intelligence, Seoul, 02841, Republic of Korea.
Sci Data. 2024 Sep 19;11(1):1026. doi: 10.1038/s41597-024-03838-4.
Understanding emotional states is pivotal for the development of next-generation human-machine interfaces. Human behavior in social interactions arises from psycho-physiological processes that are shaped by perceptual inputs. Efforts to understand brain function and human behavior could therefore catalyze the development of AI models with human-like attributes. In this study, we introduce a multimodal emotion dataset comprising 30-channel electroencephalography (EEG), audio, and video recordings from 42 participants. Each participant engaged in a cue-based conversation scenario eliciting five distinct emotions: neutral, anger, happiness, sadness, and calmness. Over the course of the experiment, each participant contributed 200 interactions, encompassing both listening and speaking, yielding a total of 8,400 interactions across all participants. We evaluated baseline emotion-recognition performance for each modality using established deep neural network (DNN) methods. The Emotion in EEG-Audio-Visual (EAV) dataset is the first public dataset to combine these three primary modalities for emotion recognition in a conversational context. We anticipate that this dataset will contribute significantly to modeling the human emotional process, from both fundamental neuroscience and machine learning perspectives.
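The dataset composition described above (five emotion classes, 42 participants, 200 listening/speaking interactions each, three modalities) can be summarized in a short sketch. The Python snippet below is only an illustration of that structure: the `Interaction` record, its fields, and the per-trial file paths are hypothetical and are not taken from the released EAV files; only the counts come from the abstract.

```python
# Minimal sketch of EAV-style metadata and a sanity check on the reported counts.
# Counts (42 participants, 200 interactions each, 5 emotion classes, 30 EEG channels)
# follow the abstract; the record layout and file names are assumptions for illustration.

from dataclasses import dataclass
from pathlib import Path

EMOTIONS = ("neutral", "anger", "happiness", "sadness", "calmness")
N_PARTICIPANTS = 42
INTERACTIONS_PER_PARTICIPANT = 200   # listening + speaking turns per participant
EEG_CHANNELS = 30

@dataclass
class Interaction:
    participant_id: int
    index: int
    emotion: str      # one of EMOTIONS
    role: str         # "listening" or "speaking"
    eeg_path: Path    # hypothetical per-trial file locations
    audio_path: Path
    video_path: Path

def expected_total_interactions() -> int:
    # 42 participants x 200 interactions = 8,400 interactions in total
    return N_PARTICIPANTS * INTERACTIONS_PER_PARTICIPANT

if __name__ == "__main__":
    assert expected_total_interactions() == 8_400
    print(f"{len(EMOTIONS)} emotion classes, "
          f"{expected_total_interactions()} interactions expected.")
```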