Zhou Guangyao, Xie Yuanlun, Fu Yiqin, Wang Zhaokun
School of Computing and Artificial Intelligence, Southwest Jiaotong University, China.
School of Information and Software Engineering, University of Electronic Science and Technology of China, China.
Neural Netw. 2025 Mar;183:106937. doi: 10.1016/j.neunet.2024.106937. Epub 2024 Nov 26.
Facial expression recognition (FER) in the wild is a challenging pattern recognition task that is hampered by low image quality and has attracted broad interest in computer vision. Existing FER methods do not yet achieve sufficient accuracy to support practical applications, especially in scenarios with low fault tolerance, which limits the adaptability of FER. To explore whether the accuracy of FER in the wild can be improved further, this paper proposes a novel single model named R18+FAML and an ensemble model named R18+FAML-FGA-T2V, which applies intra-feature fusion within a single network, feature fusion among multiple networks, and an ensemble decision strategy. Built on a ResNet18 (R18) backbone, R18+FAML combines internal feature fusion with three attention blocks and uses multiple loss functions (FAML) to improve the diversity of feature extraction. To effectively integrate feature extractors from multiple networks, we propose feature fusion among networks based on the genetic algorithm (FGA). To consider and exploit more classification information, we propose an ensemble strategy, namely an improved top-two-voting (T2V) scheme over multiple networks with the same structure. Combining these strategies, R18+FAML-FGA-T2V can focus on the main expression-aware areas by integrating the interest areas of multiple networks. In experiments on three challenging in-the-wild FER datasets, RAF-DB, AffectNet-8, and AffectNet-7, our single model R18+FAML achieves accuracies of 90.32%, 62.17%, and 65.83%, and our ensemble model R18+FAML-FGA-T2V achieves 91.59%, 63.27%, and 66.63%, respectively, both achieving state-of-the-art results.
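The abstract does not spell out the rule behind the improved top-two-voting (T2V) strategy, so the following is only a minimal sketch of a plain top-two vote, not the authors' method. It assumes that each network outputs one probability per expression class, that a network's best class receives a full vote, and that its second-best class receives a partial vote. The function name `top_two_vote`, the parameter `second_weight`, its default of 0.5, and the tie-break on summed probability are all hypothetical choices made for illustration.

```python
from typing import Sequence


def top_two_vote(probs: Sequence[Sequence[float]], second_weight: float = 0.5) -> int:
    """Pick a class by a simple top-two vote over several networks.

    probs[m][c] is network m's probability for class c.
    Assumption: the top class of each network gets 1 vote and the
    runner-up gets `second_weight` votes (hypothetical weighting).
    """
    if not probs:
        raise ValueError("need at least one network output")
    n_classes = len(probs[0])
    votes = [0.0] * n_classes  # accumulated votes per class
    mass = [0.0] * n_classes   # summed probability, used only to break ties

    for p in probs:
        if len(p) != n_classes:
            raise ValueError("all networks must score the same classes")
        # Class indices ordered from most to least probable for this network.
        ranked = sorted(range(n_classes), key=lambda c: p[c], reverse=True)
        votes[ranked[0]] += 1.0
        if n_classes > 1:
            votes[ranked[1]] += second_weight
        for c in range(n_classes):
            mass[c] += p[c]

    # Highest vote count wins; ties fall back to the summed probability.
    return max(range(n_classes), key=lambda c: (votes[c], mass[c]))


# Three networks that each prefer a different class, so a plain
# majority vote over top-1 predictions would be a three-way tie.
outputs = [
    [0.5, 0.4, 0.1],
    [0.1, 0.5, 0.4],
    [0.2, 0.35, 0.45],
]
print(top_two_vote(outputs))  # prints 1
```

In this toy case class 1 is the runner-up for two of the three networks, so the second-place votes settle a decision that top-1 voting alone would leave tied. That is one way an ensemble can use more of the classification information than the top prediction alone.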