Qian Cheng, Lobo Marques João Alexandre, de Alexandria Auzuir Ripardo, Fong Simon James
Institute of Data Engineering and Science, University of Saint Joseph, Macau SAR, China.
Laboratory of Applied Neurosciences, University of Saint Joseph, Macau SAR, China.
Sensors (Basel). 2025 Feb 27;25(5):1478. doi: 10.3390/s25051478.
Facial expression recognition (FER) is essential for discerning human emotions and is applied extensively in big data analytics, healthcare, security, and user experience enhancement. This study presents a comprehensive evaluation of ten state-of-the-art deep learning models (VGG16, VGG19, ResNet50, ResNet101, DenseNet, GoogLeNet V1, MobileNet V1, EfficientNet V2, ShuffleNet V2, and RepVGG) on the task of facial expression recognition using the FER2013 dataset. Key performance metrics, including test accuracy, training time, and weight file size, were analyzed to assess the learning efficiency, generalization capabilities, and architectural innovations of each model. EfficientNet V2 and ResNet50 emerged as top performers, achieving high accuracy and stable convergence through compound scaling and residual connections, respectively, which enabled them to capture complex emotional features with minimal overfitting. DenseNet, GoogLeNet V1, and RepVGG also demonstrated strong performance, leveraging dense connectivity, inception modules, and re-parameterization techniques, respectively, though they exhibited slower initial convergence. In contrast, lightweight models such as MobileNet V1 and ShuffleNet V2, while excelling in computational efficiency, were less accurate, particularly on challenging emotion categories such as "fear" and "disgust". The results highlight the critical trade-offs between computational efficiency and predictive accuracy, emphasizing the importance of selecting an appropriate architecture based on application-specific requirements. This research contributes to ongoing advancements in deep learning, particularly in domains such as facial expression recognition, where capturing subtle and complex patterns is essential for high-performance outcomes.
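The abstract reports overall test accuracy alongside weaker results on individual categories such as "fear" and "disgust". As a minimal sketch of how such a per-category breakdown might be computed, the following Python snippet derives overall and per-class accuracy from integer labels. The function names and the toy labels are hypothetical and purely illustrative; they are not taken from the study and do not reproduce its results. The class order follows the common FER2013 label encoding (0 = angry through 6 = neutral).

```python
from collections import Counter

# Common FER2013 label encoding: index -> emotion name (seven classes).
FER2013_CLASSES = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred, class_names=FER2013_CLASSES):
    """Map each class name to the fraction of its samples predicted correctly.

    Classes with no samples in y_true map to None instead of a number.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be of equal length")
    totals = Counter(y_true)  # samples per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # hits per class
    return {
        name: (correct[idx] / totals[idx]) if totals[idx] else None
        for idx, name in enumerate(class_names)
    }


if __name__ == "__main__":
    # Toy labels invented for illustration only (not data from the paper).
    toy_true = [0, 1, 1, 2, 2, 3]
    toy_pred = [0, 3, 1, 2, 4, 3]
    print("overall:", overall_accuracy(toy_true, toy_pred))
    for name, acc in per_class_accuracy(toy_true, toy_pred).items():
        print(f"{name:>8}: {acc}")
```

Reporting the per-class values next to the overall figure makes it visible when a model with acceptable aggregate accuracy still performs poorly on under-represented or visually ambiguous emotions.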