VaBTFER: An Effective Variant Binary Transformer for Facial Expression Recognition.

Affiliation

College of Information Science and Technology, Nanjing Forestry University, Nanjing 100190, China.

Publication Information

Sensors (Basel). 2023 Dec 27;24(1):147. doi: 10.3390/s24010147.

Abstract

Existing Transformer-based models have achieved impressive success in facial expression recognition (FER) by modeling the long-range relationships among facial muscle movements. However, pure Transformer-based models tend to have millions of parameters, which poses a challenge for deployment. Moreover, the Transformer's lack of inductive bias usually makes it difficult to train from scratch on limited FER datasets. To address these problems, we propose an effective and lightweight variant Transformer for FER called VaTFER. In VaTFER, we first construct action unit (AU) tokens from action-unit-based regions and their histogram of oriented gradients (HOG) features. Then, we present a novel spatial-channel feature relevance Transformer (SCFRT) module, which incorporates multilayer channel reduction self-attention (MLCRSA) and a dynamic learnable information extraction (DLIE) mechanism. MLCRSA models long-range dependencies among all tokens while decreasing the number of parameters. DLIE aims to alleviate the lack of inductive bias and improve the learning ability of the model. Furthermore, we replace the vanilla multilayer perceptron (MLP) with an excitation module for accurate prediction. To further reduce computing and memory resources, we introduce a binary quantization mechanism, yielding a novel lightweight Transformer model called variant binary Transformer for FER (VaBTFER). We conduct extensive experiments on several commonly used facial expression datasets, and the results attest to the effectiveness of our methods.
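To make the abstract's key ideas concrete, the following is a minimal PyTorch sketch of two ingredients it describes: a self-attention layer whose query/key/value projections reduce the channel dimension (in the spirit of MLCRSA) and 1-bit weight quantization with a straight-through estimator, a common recipe for binary Transformers. The class names, the per-channel scaling choice, the token count, and the quantizer details are illustrative assumptions, not the authors' implementation; in the paper the tokens fed to such a layer would be HOG features extracted from action-unit regions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinaryLinear(nn.Linear):
    """Linear layer with 1-bit (sign) weight quantization.

    Illustrative only: weights are binarized with sign() and rescaled by their
    per-row mean absolute value, and gradients flow through a straight-through
    estimator. The paper's exact quantizer may differ.
    """
    def forward(self, x):
        w = self.weight
        alpha = w.abs().mean(dim=1, keepdim=True)      # per-output-channel scale
        w_bin = torch.sign(w) * alpha                  # {-alpha, +alpha} weights
        w_ste = w + (w_bin - w).detach()               # straight-through estimator
        return F.linear(x, w_ste, self.bias)


class ChannelReductionSelfAttention(nn.Module):
    """Self-attention whose Q/K/V projections map tokens to a reduced channel
    dimension, cutting parameters and FLOPs (in the spirit of MLCRSA)."""
    def __init__(self, dim, reduced_dim, num_heads=4):
        super().__init__()
        assert reduced_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = reduced_dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = BinaryLinear(dim, 3 * reduced_dim)  # binarized projections
        self.proj = BinaryLinear(reduced_dim, dim)

    def forward(self, x):                              # x: (batch, tokens, dim)
        b, n, _ = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (b, heads, n, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (b, heads, n, n)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.proj(out)                          # back to (b, tokens, dim)


# Smoke test on random stand-ins for AU tokens (e.g. HOG features of AU regions).
tokens = torch.randn(2, 17, 128)                       # batch=2, 17 AU tokens, dim=128
layer = ChannelReductionSelfAttention(dim=128, reduced_dim=64)
print(layer(tokens).shape)                             # torch.Size([2, 17, 128])
```

The reduced projection width and the binarized linear layers are the two levers the abstract points to for shrinking the model; everything else follows standard multi-head attention.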

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f1c8/10781231/76f15a930ed4/sensors-24-00147-g001.jpg

Similar Articles

1. VaBTFER: An Effective Variant Binary Transformer for Facial Expression Recognition. Sensors (Basel). 2023 Dec 27;24(1):147. doi: 10.3390/s24010147.
2. Facial Expression Recognition Based on Fine-Tuned Channel-Spatial Attention Transformer. Sensors (Basel). 2023 Jul 30;23(15):6799. doi: 10.3390/s23156799.
3. Face-mask-aware Facial Expression Recognition based on Face Parsing and Vision Transformer. Pattern Recognit Lett. 2022 Dec;164:173-182. doi: 10.1016/j.patrec.2022.11.004. Epub 2022 Nov 9.
4. TFE: A Transformer Architecture for Occlusion Aware Facial Expression Recognition. Front Neurorobot. 2021 Oct 25;15:763100. doi: 10.3389/fnbot.2021.763100. eCollection 2021.
5. Two-Level Spatio-Temporal Feature Fused Two-Stream Network for Micro-Expression Recognition. Sensors (Basel). 2024 Feb 29;24(5):1574. doi: 10.3390/s24051574.
6. Evaluation and analysis of visual perception using attention-enhanced computation in multimedia affective computing. Front Neurosci. 2024 Aug 7;18:1449527. doi: 10.3389/fnins.2024.1449527. eCollection 2024.
7. Facial Expression Recognition Based on Squeeze Vision Transformer. Sensors (Basel). 2022 May 13;22(10):3729. doi: 10.3390/s22103729.
8. Transformer with difference convolutional network for lightweight universal boundary detection. PLoS One. 2024 Apr 16;19(4):e0302275. doi: 10.1371/journal.pone.0302275. eCollection 2024.
9. Hierarchical attention network with progressive feature fusion for facial expression recognition. Neural Netw. 2024 Feb;170:337-348. doi: 10.1016/j.neunet.2023.11.033. Epub 2023 Nov 14.

