
Enhanced Hybrid Vision Transformer with Multi-Scale Feature Integration and Patch Dropping for Facial Expression Recognition.

Affiliations

College of Computer Science and Technology, Changchun University, No. 6543, Satellite Road, Changchun 130022, China.

School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100811, China.

Publication Information

Sensors (Basel). 2024 Jun 26;24(13):4153. doi: 10.3390/s24134153.

Abstract

Convolutional neural networks (CNNs) have made significant progress in the field of facial expression recognition (FER). However, due to challenges such as occlusion, lighting variations, and changes in head pose, facial expression recognition in real-world environments remains highly challenging. At the same time, methods based solely on CNNs rely heavily on local spatial features, lack global information, and struggle to balance computational complexity against recognition accuracy. Consequently, CNN-based models still fall short of addressing FER adequately. To address these issues, we propose a lightweight facial expression recognition method based on a hybrid vision transformer. This method captures multi-scale facial features through an improved attention module, achieving richer feature integration, enhancing the network's perception of key facial expression regions, and improving feature extraction capabilities. Additionally, to further enhance the model's performance, we have designed the patch dropping (PD) module. This module aims to emulate the attention allocation mechanism of the human visual system for local features, guiding the network to focus on the most discriminative features, reducing the influence of irrelevant features, and directly lowering computational costs. Extensive experiments demonstrate that our approach significantly outperforms other methods, achieving an accuracy of 86.51% on RAF-DB and nearly 70% on FER2013, with a model size of only 3.64 MB. These results show that our method provides a new perspective for the field of facial expression recognition.
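The abstract does not specify how the patch dropping (PD) module is implemented; the general idea it describes — ranking patches by an attention score and discarding the least informative ones so later layers process fewer tokens — can be sketched as follows. This is a minimal, framework-free illustration; the function name `patch_drop` and the `keep_ratio` parameter are assumptions for the sketch, not the authors' API:

```python
def patch_drop(patches, attn_scores, keep_ratio=0.7):
    """Keep only the top `keep_ratio` fraction of patches, ranked by attention.

    patches     : list of patch tokens (any representation)
    attn_scores : one salience score per patch (e.g., mean attention received)
    Returns the retained patches in their original spatial order.
    """
    n_keep = max(1, int(len(patches) * keep_ratio))
    # Rank patch indices by descending attention score
    ranked = sorted(range(len(patches)), key=lambda i: attn_scores[i], reverse=True)
    keep_idx = sorted(ranked[:n_keep])  # restore original spatial order
    return [patches[i] for i in keep_idx]


# Example: with keep_ratio=0.5, the two most salient of four patches survive
kept = patch_drop(["p0", "p1", "p2", "p3"], [0.1, 0.9, 0.5, 0.2], keep_ratio=0.5)
print(kept)  # ['p1', 'p2']
```

In a real hybrid ViT the same selection would operate on token tensors (e.g., via a top-k over attention maps), which is also where the computational saving comes from: subsequent transformer blocks attend over fewer tokens.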


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd47/11243949/aefb0d4759b3/sensors-24-04153-g001.jpg
