IEEE Trans Image Process. 2021;30:444-457. doi: 10.1109/TIP.2020.3037467. Epub 2020 Nov 24.
Facial expression recognition has emerged as an active research topic in recent decades and is of considerable value to human-computer interaction. In this paper, we present a deep learning approach, named the frequency neural network (FreNet), for facial expression recognition. Unlike convolutional neural networks, which operate in the spatial domain, FreNet inherits the advantages of processing images in the frequency domain, such as efficient computation and the elimination of spatial redundancy. First, we propose a learnable multiplication kernel and construct multiple multiplication layers to learn features in the frequency domain. Second, we propose a summarization layer that follows the multiplication layers to yield higher-level features. Third, based on the property of the discrete cosine transform (DCT), we use the multiplication layers and the summarization layer to construct the Basic-FreNet, which produces high-level features from the widely used DCT feature. Finally, to improve on the Basic-FreNet, we propose the Block-FreNet, in which a weight-shared multiplication kernel is designed for feature learning and block sub-sampling is designed for dimension reduction. The experimental results show that the Block-FreNet not only achieves superior performance but also greatly reduces the computational cost. To the best of our knowledge, the proposed approach is the first frequency-domain deep learning model for facial expression recognition.
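The abstract does not specify how the multiplication kernel or the block sub-sampling is implemented, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that the image is split into non-overlapping square blocks, that each block is mapped to the frequency domain with an orthonormal 2-D DCT, that a single kernel shared by all blocks is applied as an element-wise product, and that sub-sampling keeps only the low-frequency top-left coefficients of each block. The names `dct2`, `multiplication_layer`, `block_subsample` and `block_features`, and the parameters `block` and `keep`, are hypothetical. The kernel is a fixed placeholder here, whereas in a trained network its weights would be learned.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dct2(x):
    """2-D DCT of a square block, computed as C @ X @ C^T."""
    c = dct_matrix(len(x))
    c_t = [list(col) for col in zip(*c)]
    return matmul(matmul(c, x), c_t)


def multiplication_layer(coeffs, kernel):
    """Element-wise product of DCT coefficients with a same-shaped kernel
    (assumed form of the multiplication kernel)."""
    return [[v * w for v, w in zip(c_row, k_row)]
            for c_row, k_row in zip(coeffs, kernel)]


def block_subsample(coeffs, keep):
    """Keep the top-left keep x keep low-frequency coefficients
    (assumed form of block sub-sampling)."""
    return [row[:keep] for row in coeffs[:keep]]


def block_features(image, block=4, keep=2, kernel=None):
    """Block DCT -> shared multiplication kernel -> sub-sampling -> flat vector."""
    if kernel is None:
        # Placeholder weights; a real model would learn these.
        kernel = [[1.0] * block for _ in range(block)]
    features = []
    for r0 in range(0, len(image), block):
        for c0 in range(0, len(image[0]), block):
            patch = [row[c0:c0 + block] for row in image[r0:r0 + block]]
            weighted = multiplication_layer(dct2(patch), kernel)  # same kernel for every block
            for row in block_subsample(weighted, keep):
                features.extend(row)
    return features


# Toy input: a constant 8 x 8 "image", so each block has only a DC component.
image = [[1.0] * 8 for _ in range(8)]
features = block_features(image)  # 4 blocks x (2 x 2) retained coefficients = 16 values
```

In this sketch the element-wise product costs one multiplication per coefficient, and sub-sampling shrinks each 4 x 4 block to 4 values, which illustrates how frequency-domain processing can reduce computation and dimensionality. The summarization layer is not sketched because the abstract gives no detail about its form.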