Yalçin Nursel, Alisawi Muthana
Department of Computer and Instructional Technologies Education, Gazi Faculty of Education, Gazi University, Ankara, Türkiye.
Institute of Information, Computer Sciences, Gazi University, Ankara, Türkiye.
Heliyon. 2024 Oct 4;10(20):e38913. doi: 10.1016/j.heliyon.2024.e38913. eCollection 2024 Oct 30.
Facial expression recognition (FER) plays a pivotal role in applications ranging from human-computer interaction to psychoanalysis. To improve the accuracy of FER models, this study focuses on enhancing and augmenting FER datasets. It comprehensively analyzes the Facial Emotion Recognition dataset (FER13) to identify defects and correct misclassifications. The FER13 dataset is a crucial resource for researchers developing Deep Learning (DL) models that recognize emotions from facial features. This article then develops a new facial dataset by expanding the original FER13 dataset. Like the FER+ dataset, the expanded dataset incorporates a wider range of emotions while maintaining data accuracy. To further improve the dataset, it is integrated with the extended Cohn-Kanade (CK+) dataset. This paper investigates the application of modern DL models to enhance emotion recognition in human faces. By training on the new dataset, the study demonstrates significant performance gains compared with its counterparts. Furthermore, the article examines recent advances in FER technology and identifies critical requirements for DL models to overcome the inherent challenges of this task. The study explores several DL architectures for emotion recognition in facial image datasets, with a particular focus on convolutional neural networks (CNNs). Our findings indicate that complex architectures such as EfficientNetB7 outperform other DL architectures, with EfficientNetB7 achieving a test accuracy of 78.9%. Notably, the model surpassed the EfficientNet-XGBoost model, especially when used with the new dataset. Our approach uses EfficientNetB7 as a backbone to build a model capable of efficiently recognizing emotions from facial images. Our proposed model, EfficientNetB7-CNN, achieved a peak accuracy of 81% on the test set despite challenges such as GPU memory limitations.
This demonstrates the model's robustness in handling complex facial expressions. Furthermore, to enhance feature extraction and attention mechanisms, we propose a new hybrid model, CBAM-4CNN, which integrates the convolutional block attention module (CBAM) with a custom 4-layer CNN architecture. The CBAM-4CNN model outperformed existing models, achieving higher accuracy, precision, and recall across multiple emotion classes. These findings highlight the critical role of comprehensive and diverse data in enhancing model performance for facial emotion recognition.
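
The abstract reports accuracy, precision, and recall across emotion classes without stating how they were computed. As a minimal sketch, assuming the standard single-label definitions, the metrics can be derived from true and predicted labels as follows. The function name `per_class_metrics` and the toy labels are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and per-class precision and recall.

    Hypothetical helper for illustration; assumes one label per image.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    accuracy = sum(tp.values()) / len(y_true)

    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]  # times the class was predicted
        actual = tp[label] + fn[label]     # times the class truly occurred
        report[label] = {
            "precision": tp[label] / predicted if predicted else 0.0,
            "recall": tp[label] / actual if actual else 0.0,
        }
    return accuracy, report


# Toy labels, invented for illustration only (not results from the study).
y_true = ["happy", "happy", "sad", "angry", "sad", "happy"]
y_pred = ["happy", "sad", "sad", "angry", "happy", "happy"]
accuracy, report = per_class_metrics(y_true, y_pred)
print(f"accuracy: {accuracy:.3f}")
for label, scores in report.items():
    print(label, scores)
```

Reporting precision and recall per class, rather than accuracy alone, makes it visible when a model does well on frequent emotions but poorly on rare ones.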