Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430079, China.
Sensors (Basel). 2023 Jul 30;23(15):6799. doi: 10.3390/s23156799.
Facial expressions help individuals convey their emotions. In recent years, advances in computer vision have made facial expression recognition (FER) a research hotspot, and the field has made remarkable progress. However, human faces in real-world environments are affected by unfavorable factors, such as facial occlusion and head pose changes, that are seldom encountered in controlled laboratory settings. These factors often reduce expression recognition accuracy. Inspired by the recent success of transformers in many computer vision tasks, we propose a model called the fine-tuned channel-spatial attention transformer (FT-CSAT) to improve the accuracy of FER in the wild. FT-CSAT consists of two crucial components: a channel-spatial attention module and a fine-tuning module. In the channel-spatial attention module, the feature map is passed through the channel attention module and then the spatial attention module. The resulting output feature map incorporates both channel information and spatial information, so the network becomes adept at focusing on the relevant and meaningful features associated with facial expressions. To further improve the model's performance while limiting the number of additional parameters, we employ a fine-tuning method. Extensive experimental results demonstrate that FT-CSAT outperforms state-of-the-art methods on two benchmark datasets, RAF-DB and FERPlus, achieving recognition accuracies of 88.61% and 89.26%, respectively. Furthermore, to evaluate the robustness of FT-CSAT to facial occlusion and head pose changes, we conduct tests on the Occlusion-RAF-DB and Pose-RAF-DB datasets, and the results show that the proposed method also achieves superior recognition performance under such conditions.
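The sequential channel-then-spatial attention described in the abstract can be illustrated with a minimal sketch. The abstract does not give the internals of either module, so everything below is an assumption: the feature map is a plain nested list of shape C x H x W, and the learned layers that such modules normally contain are replaced by a parameter-free gate (sigmoid of average pooling plus max pooling). The function names `channel_attention`, `spatial_attention`, and `channel_spatial_attention` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Rescale each channel by one weight derived from its pooled statistics.

    fmap is a list of C channels, each an H x W list of floats.
    Assumption: avg-pool + max-pool stands in for a learned layer.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values) + max(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Rescale each spatial position by one weight pooled across channels.

    Assumption: avg-pool + max-pool over channels stands in for a learned layer.
    """
    num_c, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(num_c)]
    for i in range(height):
        for j in range(width):
            column = [fmap[c][i][j] for c in range(num_c)]
            weight = sigmoid(sum(column) / num_c + max(column))
            for c in range(num_c):
                out[c][i][j] = fmap[c][i][j] * weight
    return out


def channel_spatial_attention(fmap):
    """Apply channel attention first, then spatial attention, as in the abstract."""
    return spatial_attention(channel_attention(fmap))
```

Both gates lie strictly between 0 and 1, so the sketch only rescales features and preserves the shape of the feature map. A trained module would learn these weights rather than compute them from fixed pooling.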