Fan Xinqi, Jiang Mingjie, Shahid Ali Raza, Yan Hong
Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China.
Electrical and Computer Engineering Department, COMSATS University Islamabad, Islamabad, Pakistan.
Cogn Neurodyn. 2022 Aug;16(4):847-858. doi: 10.1007/s11571-021-09761-3. Epub 2022 Jan 5.
Recognition of facial expressions plays an important role in understanding human behavior, classroom assessment, customer feedback, education, business, and many other human-machine interaction applications. Some researchers have observed that using features at different scales can improve recognition accuracy, but a systematic study of how to exploit scale information has been lacking. In this work, we proposed a hierarchical scale convolutional neural network (HSNet) for facial expression recognition, which systematically enhances the information extracted at the kernel, network, and knowledge scales. First, inspired by the observation that facial expressions can be defined by facial action units of different sizes, and by the power of sparsity, we proposed dilation Inception blocks to enhance kernel-scale information extraction. Second, to supervise relatively shallow layers so that they learn more discriminative features from feature maps of different sizes, we proposed a feature-guided auxiliary learning approach that uses high-level semantic features to guide the learning of the shallow layers. Last, since human cognitive ability can be progressively improved by learned knowledge, we mimicked this ability through knowledge transfer learning from related tasks. Extensive experiments on lab-controlled, synthesized, and in-the-wild databases showed that the proposed method substantially boosts performance and achieves state-of-the-art accuracy on most databases. Ablation studies confirmed the effectiveness of the individual modules in the proposed method.
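As a rough illustration of why dilation is a natural way to vary kernel scale, the sketch below computes the span covered by a dilated convolution kernel using the standard relation k_eff = k + (k - 1)(d - 1). The function names, the 3x3 kernel size, and the dilation rates (1, 2, 3) are assumptions chosen for illustration; the abstract does not state the configuration used in HSNet, so this should not be read as the authors' design.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels, along one axis) covered by a dilated kernel.

    Standard relation: k_eff = k + (k - 1) * (d - 1).
    """
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be >= 1")
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def branch_summary(kernel_size: int = 3, dilations=(1, 2, 3)):
    """Describe hypothetical parallel branches that share one kernel size.

    Each entry is (dilation, effective span, weights per 2D kernel).
    The weight count does not depend on the dilation rate.
    """
    return [
        (d, effective_kernel_size(kernel_size, d), kernel_size ** 2)
        for d in dilations
    ]


if __name__ == "__main__":
    for dilation, span, weights in branch_summary():
        print(f"dilation={dilation}: covers {span}x{span} pixels with {weights} weights")
```

With these assumed settings, each branch keeps 9 weights per kernel while its span grows from 3x3 to 5x5 to 7x7, which is one way parallel branches could respond to facial action units of different sizes without a matching growth in parameters.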