Liu Jing-Wei, Lin Xiao-Yuan, Ji Peng-Fei, Chen Jia-Ming, Zhang Jun
Department of Computer Science, Capital University of Economics and Business, Beijing, 100070, China.
College of Computer Science, Beijing University of Technology, Beijing, 100124, China.
Sci Rep. 2025 Jul 1;15(1):22219. doi: 10.1038/s41598-025-07416-5.
Deep learning techniques, particularly Convolutional Neural Networks (CNNs), are widely recognized as effective tools for facial expression recognition, but their accuracy in this application still needs improvement. The main contributions and results of this study are as follows. First, the first convolutional layer of the CNN is replaced with a Multi-scale Convolutional (MsC) layer, yielding the Multi-scale CNN (MCNN). Experimental results indicate that MCNN improves average accuracy by 1.339% over the CNN. Second, a wavelet Channel Attention (wCA) mechanism is inserted after the first pooling layer of the CNN, yielding the wCA-based CNN (wCA-CNN). Experimental results show that wCA-CNN improves average accuracy by 1.414% over the CNN. Third, replacing the first convolutional layer of the CNN with the MsC layer and inserting the wCA mechanism after the first pooling layer gives the wCA-based Multi-scale CNN (wCA-MCNN). Experimental results show that wCA-MCNN improves average accuracy by 2.921% over the CNN. Fourth, the Residual Network (ResNet18) is selected as a baseline model and improved in the same manner. Compared with ResNet18, the proposed MsC-ResNet18, wCA-ResNet18, and MsC-wCA-ResNet18 improve accuracy by 0.845%, 0.835%, and 1.810%, respectively. Fifth, all of the proposed methods are evaluated on two datasets: the Facial Expression of Students in Real-Class (FESR) dataset, collected in our own real classroom, and the Karolinska Directed Emotional Faces (KDEF) dataset.
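The two building blocks named in the abstract can be illustrated with a small sketch. The abstract does not specify kernel sizes, the wavelet family, or how attention weights are computed, so everything below is an assumption made for illustration: the multi-scale layer is modeled as parallel 1x1, 3x3, and 5x5 convolutions whose outputs are stacked as channels, and the wavelet channel attention is modeled as a sigmoid gate on the mean of each channel's one-level Haar low-low subband. The learned layers and the pooling step that a real network would include are omitted, and all function names are hypothetical.

```python
import math

def conv2d_same(image, kernel):
    """Cross-correlate a 2D image with a square, odd-sized kernel.
    Zero padding keeps the output the same size as the input."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out

def multi_scale_conv(image, kernels):
    """Assumed MsC layer: run kernels of several sizes in parallel
    and return one feature map (channel) per kernel."""
    return [conv2d_same(image, kernel) for kernel in kernels]

def haar_ll(channel):
    """One-level 2D Haar low-low subband (orthonormal scaling):
    each 2x2 block (a, b, c, d) maps to (a + b + c + d) / 2.
    Assumes the channel is at least 2x2; an odd last row/column is dropped."""
    h, w = len(channel), len(channel[0])
    return [[(channel[i][j] + channel[i][j + 1]
              + channel[i + 1][j] + channel[i + 1][j + 1]) / 2.0
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

def wavelet_channel_attention(channels):
    """Assumed wCA: describe each channel by the mean of its Haar LL
    subband, squash it with a sigmoid, and rescale the channel.
    A trained model would learn the descriptor-to-weight mapping."""
    attended, weights = [], []
    for ch in channels:
        ll = haar_ll(ch)
        descriptor = sum(map(sum, ll)) / (len(ll) * len(ll[0]))
        weight = 1.0 / (1.0 + math.exp(-descriptor))
        weights.append(weight)
        attended.append([[v * weight for v in row] for row in ch])
    return attended, weights

def box_kernel(k):
    """Hypothetical stand-in for learned weights: a k x k averaging kernel."""
    return [[1.0 / (k * k)] * k for _ in range(k)]

# Toy 4x4 "image" standing in for a face crop.
image = [[float((i + j) % 4) for j in range(4)] for i in range(4)]
features = multi_scale_conv(image, [box_kernel(1), box_kernel(3), box_kernel(5)])
attended, weights = wavelet_channel_attention(features)
```

Here `features` holds three same-sized maps, one per kernel scale, and `weights` holds one gate value per channel. In the architectures described above, the multi-scale layer would replace the first convolution and the attention step would follow the first pooling layer; this sketch chains them directly only to keep the example short.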