

Facial expression recognition using visible and IR by early fusion of deep learning with attention mechanism.

Authors

Naseem Muhammad Tahir, Lee Chan-Su, Shahzad Tariq, Khan Muhammad Adnan, Abu-Mahfouz Adnan M, Ouahada Khmaies

Affiliations

Department of Electronic Engineering, Yeungnam University, Gyeongsan, Republic of Korea.

Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa.

Publication

PeerJ Comput Sci. 2025 Mar 12;11:e2676. doi: 10.7717/peerj-cs.2676. eCollection 2025.

Abstract

Facial expression recognition (FER) has garnered significant attention due to advances in artificial intelligence, particularly in applications like driver monitoring, healthcare, and human-computer interaction, which benefit from deep learning techniques. The motivation of this research is to address the challenges of accurately recognizing emotions despite variations in expressions across emotions and similarities between different expressions. In this work, we propose an early fusion approach that combines features from visible and infrared modalities using publicly accessible VIRI and NVIE databases. Initially, we developed single-modality models for visible and infrared datasets by incorporating an attention mechanism into the ResNet-18 architecture. We then extended this to a multi-modal early fusion approach using the same modified ResNet-18 with attention, achieving superior accuracy through the combination of convolutional neural network (CNN) and transfer learning (TL). Our multi-modal approach attained 84.44% accuracy on the VIRI database and 85.20% on the natural visible and infrared facial expression (NVIE) database, outperforming previous methods. These results demonstrate that our single-modal and multi-modal approaches achieve state-of-the-art performance in FER.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/49e4/11935750/dc848fd2c198/peerj-cs-11-2676-g001.jpg
