Gong Liang Yu, Li Xue Jun, Chong Peter Han Joo
Department of Electrical and Electronic Engineering, Auckland University of Technology, Auckland 1010, New Zealand.
Sensors (Basel). 2024 Nov 29;24(23):7635. doi: 10.3390/s24237635.
Spoofing attacks (also known as presentation attacks) can easily be mounted against facial recognition systems, leaving online financial systems vulnerable. The high demand for spoofing-attack detection therefore makes it urgent to develop an anti-spoofing solution with strong generalization ability. Although multi-modality methods, such as combining depth images with RGB images, and feature-fusion methods currently perform well on certain datasets, the cost of obtaining depth information and physiological signals, especially biological signals, is relatively high. This paper proposes a representation learning method based on an Auto-Encoder structure built from a Swin Transformer and ResNet, with model training supervised by cross-entropy loss, semi-hard triplet loss, and Smooth L1 pixel-wise loss. The architecture contains three parts: an Encoder, a Decoder, and an auxiliary classifier. The Encoder effectively extracts features together with the correlations among patches, while the Decoder generates universal "Clue Maps" for further contrastive learning. Finally, the auxiliary classifier assists the model in making the decision, with its output treated as a preliminary result. Extensive experiments evaluated the Attack Presentation Classification Error Rate (APCER), Bonafide Presentation Classification Error Rate (BPCER), and Average Classification Error Rate (ACER) on popular spoofing databases (CelebA, OULU, and CASIA-MFSD) against several existing anti-spoofing models; our approach outperforms them, reaching 1.2% and 1.6% ACER in the intra-dataset experiments. In addition, the inter-dataset experiment trained on CASIA-MFSD and tested on Replay-Attack achieves a new state-of-the-art performance with a 23.8% Half Total Error Rate (HTER).
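The three supervision signals named in the abstract can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the loss weights, margin, and helper names are assumptions, and the semi-hard triplet term uses standard semi-hard negative mining (the closest negative that is still farther than the positive).

```python
import numpy as np

def cross_entropy(logits, labels):
    # Softmax cross-entropy over class logits (rows = samples).
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def smooth_l1(pred, target, beta=1.0):
    # Pixel-wise Smooth L1 (Huber) between the decoder's output and a target clue map.
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean()

def semi_hard_triplet(emb, labels, margin=0.2):
    # Semi-hard mining: for each anchor/positive pair, take the closest negative
    # that is farther than the positive; easy negatives give zero hinge loss.
    d = np.linalg.norm(emb[:, None] - emb[None, :], axis=2)  # pairwise distances
    n, losses = len(labels), []
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            cands = [d[a, j] for j in range(n)
                     if labels[j] != labels[a] and d[a, j] > d[a, p]]
            if cands:
                losses.append(max(0.0, d[a, p] - min(cands) + margin))
    return float(np.mean(losses)) if losses else 0.0

def total_loss(logits, labels, emb, clue_pred, clue_target, w_tri=1.0, w_pix=1.0):
    # Weighted sum of the three supervision terms (weights are illustrative).
    return (cross_entropy(logits, labels)
            + w_tri * semi_hard_triplet(emb, labels)
            + w_pix * smooth_l1(clue_pred, clue_target))
```

In practice each term supervises a different part of the model: cross-entropy drives the auxiliary classifier, the triplet term shapes the Encoder's embedding space for contrastive learning, and the pixel-wise term supervises the Decoder's clue-map reconstruction.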