Jayaraman Samanthisvaran, Mahendran Anand
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamilnadu, India.
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai campus, Vellore, Tamilnadu, India.
Sci Rep. 2025 Jul 19;15(1):26287. doi: 10.1038/s41598-025-09709-1.
Humans' facial expressions and emotions have a direct impact on their actions and decision-making abilities. Basic CNN models struggle to speed up their operations while keeping computational complexity low. In this paper, we propose a Deep Convolutional Neural Network combined with Bidirectional Long Short-Term Memory (Bi-LSTM), followed by a single- and cross-fusion attention mechanism that gathers both spatial and channel information from the feature vector maps. A piecewise cubic polynomial and linear activation function is used to speed up Interactive Learning Information (ILI). Global Average Pooling (GAP) computes weights for the feature vector maps, and a softmax classifier assigns input images to 7 classes based on the expression they contain. The proposed model's performance was compared with benchmark methods such as NGO-BiLSTM, ICNN-BiLSTM and HCNN-LSTM. The proposed model achieved higher accuracy than the other methods, with 82.89%, 96.78%, 95.78% and 95.87% on the FER2013, CK+, RAF-DB and JAFFE datasets, and a lower False Acceptance Rate (FAR) of 7.23%, 1.42%, 1.96% and 1.78% on the four datasets respectively. It also outperformed the other benchmark models with a high Genuine Acceptance Rate (GAR) of 88.57% on FER2013, 97.23% on CK+, 96.87% on RAF-DB and 96.32% on JAFFE.
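The abstract does not give the coefficients of the piecewise cubic polynomial and linear activation. As a minimal illustrative sketch only, assuming a cubic segment on [-1, 1] joined continuously to the identity outside that interval (the paper's actual function may differ):

```python
import numpy as np

def pcp_activation(x):
    """Piecewise cubic-polynomial / linear activation (illustrative form only).

    Uses 0.5*x**3 + 0.5*x on [-1, 1] and the identity (linear) outside;
    the two pieces meet continuously at x = -1 and x = 1.
    """
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 0.5 * x**3 + 0.5 * x, x)
```

The cheap polynomial arithmetic (versus exponentials in sigmoid/tanh) is what would make such an activation fast to evaluate.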
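The GAP-weighting and 7-class softmax head described above can be sketched as follows. This is a toy NumPy version with random stand-in feature maps and classifier weights (the shapes, `fmaps`, `W`, and `b` are assumptions, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in feature vector maps from the CNN/Bi-LSTM stage: (channels, H, W).
fmaps = rng.standard_normal((8, 4, 4))

# GAP over each channel yields one weight per feature map.
gap_weights = fmaps.mean(axis=(1, 2))                            # shape (8,)

# Reweight the maps by their GAP scores, then pool to a feature vector.
pooled = (fmaps * gap_weights[:, None, None]).mean(axis=(1, 2))  # shape (8,)

# Toy linear + softmax classifier over the 7 expression classes.
W = rng.standard_normal((7, 8))
b = np.zeros(7)
logits = W @ pooled + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()   # 7 class probabilities that sum to 1
```

Subtracting `logits.max()` before exponentiating is the standard numerically stable softmax; the predicted class would be `probs.argmax()`.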