Department of Computer Science and Engineering, University of Quebec in Outaouais, Gatineau, QC J8Y 3G5, Canada.
Department of Psychoeducation and Psychology, University of Quebec in Outaouais, Gatineau, QC J8X 3X7, Canada.
Sensors (Basel). 2024 Jul 7;24(13):4398. doi: 10.3390/s24134398.
The cognitive state of a person can be categorized using the circumplex model of emotional states, a continuous model with two dimensions: arousal and valence. The purpose of this research is to select the machine learning model(s) to be integrated into a virtual reality (VR) system that runs cognitive remediation exercises for people with mental health disorders. As such, the prediction of emotional states is essential to customize treatments for those individuals. We exploit the Remote Collaborative and Affective Interactions (RECOLA) database to predict arousal and valence values using machine learning techniques. RECOLA includes audio, video, and physiological recordings of interactions between human participants. To allow learners to focus on the most relevant data, features are extracted from the raw data. Such features can be predesigned, learned, or extracted implicitly by deep learners. Our previous work on video recordings focused on predesigned and learned visual features. In this paper, we extend that work to deep visual features, extracted using the MobileNet-v2 convolutional neural network (CNN) that we previously trained on RECOLA's video frames of full and half faces. As the final purpose of our work is to integrate our solution into a practical VR application using head-mounted displays, we experimented with half faces as a proof of concept. The extracted deep features were then used to predict arousal and valence values via optimizable ensemble regression. We also fused the extracted deep visual features with the predesigned visual features and predicted arousal and valence values from the combined feature set. To further enhance prediction performance, we fused the predictions of the optimizable ensemble model with those of the MobileNet-v2 model. After decision fusion, we achieved a root mean squared error (RMSE) of 0.1140, a Pearson's correlation coefficient (PCC) of 0.8000, and a concordance correlation coefficient (CCC) of 0.7868 on arousal predictions, and an RMSE of 0.0790, a PCC of 0.7904, and a CCC of 0.7645 on valence predictions.
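As a rough illustration of what deep visual feature extraction with MobileNet-v2 looks like: the authors fine-tune their own MobileNet-v2 on RECOLA face frames, so the sketch below is only an assumption, using torchvision's ImageNet-pretrained backbone with the classifier head dropped and a placeholder image in place of a real face crop.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Assumed setup: torchvision's ImageNet-pretrained MobileNet-v2, not the
# authors' RECOLA-fine-tuned network. Dropping the classifier head leaves
# the 1280-dimensional pooled activations as the output features.
model = models.mobilenet_v2(weights="IMAGENET1K_V1")
model.classifier = torch.nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Placeholder frame; a real input would be a full- or half-face crop
# from a RECOLA video frame.
face_image = Image.new("RGB", (224, 224))

with torch.no_grad():
    deep_features = model(preprocess(face_image).unsqueeze(0))  # shape (1, 1280)
```

Per-frame feature vectors of this kind would then serve as inputs to the downstream regressor.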
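The abstract does not state the decision-fusion rule used to combine the two models' outputs; a weighted average is one common choice, so the following sketch is an assumption rather than the paper's method, with the weight `w` a hypothetical parameter that would be tuned on validation data.

```python
import numpy as np

def fuse_predictions(pred_ensemble, pred_cnn, w=0.5):
    """Hypothetical decision fusion by weighted averaging of the ensemble
    regressor's and the MobileNet-v2 regressor's per-frame predictions.
    The actual fusion rule and weight are not specified in the abstract."""
    return w * np.asarray(pred_ensemble) + (1 - w) * np.asarray(pred_cnn)
```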
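For reference, the three reported metrics are standard; a minimal NumPy sketch of how they are typically computed (illustrative only, not the authors' evaluation code):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def pcc(y_true, y_pred):
    """Pearson's correlation coefficient."""
    return np.corrcoef(y_true, y_pred)[0, 1]

def ccc(y_true, y_pred):
    """Concordance correlation coefficient (Lin, 1989):
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    mx, my = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mx) * (y_pred - my))
    return 2 * cov / (y_true.var() + y_pred.var() + (mx - my) ** 2)
```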