School of Information Network Security, People's Public Security University of China, Beijing, China.
Department of Criminal Investigation, Sichuan Police College, Luzhou, China.
PLoS One. 2024 Oct 10;19(10):e0311720. doi: 10.1371/journal.pone.0311720. eCollection 2024.
The malicious use of deepfake videos seriously affects information security and causes great harm to society. Deepfake videos are now generated mainly with deep learning methods and are difficult to recognize with the naked eye, so accurate and efficient detection techniques are of great significance. Most existing detection methods analyze the discriminative information in a single feature domain for classification, from either a local or a global perspective; such single-feature methods have clear limitations in practical applications. In this paper, we propose a deepfake detection method that comprehensively analyzes forged face features: it integrates features from the spatial, noise, and frequency domains and uses the Inception Transformer to dynamically learn a mix of global and local information. We evaluate the proposed method on the DFDC, Celeb-DF, and FaceForensics++ benchmark datasets. Extensive experiments verify the effectiveness and good generalization of the proposed method. Compared with the best-performing existing model, the proposed method has a small number of parameters and uses no pre-training, distillation, or ensembling, yet still achieves competitive performance. Ablation experiments evaluate the contribution of each component.
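To make the described pipeline concrete, the sketch below shows one plausible way to combine spatial, noise-domain, and frequency-domain features and then mix local and global information, loosely in the spirit of an Inception Transformer block. It is a minimal PyTorch illustration, not the authors' implementation: the module names, the single SRM-style high-pass kernel, the FFT log-amplitude features, the channel split between a depthwise convolution branch and a self-attention branch, and all dimensions are assumptions made for readability.

# Hypothetical sketch: multi-domain feature fusion for deepfake detection.
# Names, kernels, and dimensions are illustrative assumptions only.
import torch
import torch.nn as nn

class SRMNoiseExtractor(nn.Module):
    """Fixed high-pass filter approximating an SRM-style noise residual (assumed)."""
    def __init__(self):
        super().__init__()
        kernel = torch.tensor([[-1., 2., -1.],
                               [ 2., -4., 2.],
                               [-1., 2., -1.]]) / 4.0
        weight = kernel.expand(3, 1, 3, 3).clone()        # one kernel per RGB channel
        self.conv = nn.Conv2d(3, 3, 3, padding=1, groups=3, bias=False)
        self.conv.weight = nn.Parameter(weight, requires_grad=False)

    def forward(self, x):
        return self.conv(x)

class FrequencyExtractor(nn.Module):
    """Log-amplitude spectrum of each channel as a simple frequency-domain cue."""
    def forward(self, x):
        return torch.log1p(torch.fft.fft2(x, norm="ortho").abs())

class MultiDomainFusionNet(nn.Module):
    """Concatenate space-, noise-, and frequency-domain inputs, then mix
    local (depthwise conv) and global (self-attention) information."""
    def __init__(self, dim=64, num_classes=2):
        super().__init__()
        self.noise = SRMNoiseExtractor()
        self.freq = FrequencyExtractor()
        self.stem = nn.Conv2d(9, dim, kernel_size=8, stride=8)   # 3 domains x 3 channels
        self.local_branch = nn.Conv2d(dim // 2, dim // 2, 3, padding=1, groups=dim // 2)
        self.global_branch = nn.MultiheadAttention(dim // 2, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        feats = torch.cat([x, self.noise(x), self.freq(x)], dim=1)
        z = self.stem(feats)                               # (B, dim, H/8, W/8)
        z_local, z_global = z.chunk(2, dim=1)              # split channels between branches
        z_local = self.local_branch(z_local)               # local, high-frequency mixing
        b, c, h, w = z_global.shape
        tokens = z_global.flatten(2).transpose(1, 2)       # (B, HW, C)
        tokens, _ = self.global_branch(tokens, tokens, tokens)
        z_global = tokens.transpose(1, 2).reshape(b, c, h, w)
        z = torch.cat([z_local, z_global], dim=1)
        return self.head(z.mean(dim=(2, 3)))               # global average pool -> logits

if __name__ == "__main__":
    model = MultiDomainFusionNet()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)                                    # torch.Size([2, 2])

In practice the paper's detector would replace these hand-rolled branches with its actual feature extractors and Inception Transformer blocks; the sketch only shows how the three domains can be fused into a single classification head.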