Toward Intrinsic Adversarial Robustness Through Probabilistic Training.

Publication Information

IEEE Trans Image Process. 2023;32:3862-3872. doi: 10.1109/TIP.2023.3290532. Epub 2023 Jul 14.

Abstract

Modern deep neural networks have made numerous breakthroughs in real-world applications, yet they remain vulnerable to imperceptible adversarial perturbations. These tailored perturbations can severely disrupt the inference of current deep learning-based methods and may induce potential security hazards in artificial intelligence applications. So far, adversarial training methods have achieved excellent robustness against various adversarial attacks by incorporating adversarial examples during the training stage. However, existing methods primarily rely on optimizing injective adversarial examples generated one-to-one from natural examples, ignoring potential adversaries elsewhere in the adversarial domain. This optimization bias can induce overfitting to a suboptimal decision boundary, which severely jeopardizes adversarial robustness. To address this issue, we propose Adversarial Probabilistic Training (APT), which bridges the distribution gap between natural and adversarial examples by modeling the latent adversarial distribution. Instead of tedious and costly adversary sampling to form the probabilistic domain, we estimate the adversarial distribution parameters at the feature level for efficiency. Moreover, we decouple the distribution alignment based on the adversarial probability model and the original adversarial example. We then devise a novel reweighting mechanism for the distribution alignment that considers adversarial strength and domain uncertainty. Extensive experiments demonstrate the superiority of our adversarial probabilistic training method against various types of adversarial attacks across different datasets and scenarios.
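
The abstract describes the method only at a high level. As a rough illustration of the general recipe it sketches (modeling a distribution over adversarial features, sampling from it instead of enumerating adversaries, and reweighting the alignment term by adversarial strength and domain uncertainty), the following is a minimal PyTorch sketch. Everything here is an assumption made for illustration: the toy network, the `AdvDistHead` Gaussian parameterization, the loss-gap proxy for adversarial strength, and the variance proxy for uncertainty are hypothetical stand-ins, not the authors' published formulation.

```python
# Hypothetical sketch of the idea outlined in the abstract: model a
# distribution over adversarial features and align natural features to it
# with per-example reweighting. NOT the authors' published formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Toy backbone returning penultimate features and logits."""
    def __init__(self, in_dim=784, feat_dim=64, num_classes=10):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                  nn.Linear(256, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feat = self.body(x)
        return feat, self.head(feat)

class AdvDistHead(nn.Module):
    """Assumed parameterization: a diagonal Gaussian (mu, log_var) over
    adversarial features, predicted from the natural features."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mu = nn.Linear(feat_dim, feat_dim)
        self.log_var = nn.Linear(feat_dim, feat_dim)

    def forward(self, feat):
        return self.mu(feat), self.log_var(feat)

def pgd_attack(model, x, y, eps=0.1, alpha=0.02, steps=5):
    """Textbook PGD in input space (inputs assumed to lie in [0, 1])."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        _, logits = model(x_adv)
        grad, = torch.autograd.grad(F.cross_entropy(logits, y), x_adv)
        x_adv = x_adv + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()

def apt_style_loss(model, dist_head, x, y):
    """One possible reading of the objective: classify natural inputs and
    features sampled from the modeled adversarial distribution, plus a
    reweighted feature-alignment term."""
    feat_nat, logits_nat = model(x)
    feat_adv, logits_adv = model(pgd_attack(model, x, y))

    mu, log_var = dist_head(feat_nat)
    feat_sample = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
    logits_sample = model.head(feat_sample)  # reparameterized sample

    # Proxy weights (both hypothetical): adversarial strength as the
    # natural-to-adversarial loss gap, domain uncertainty as the mean
    # predicted variance of the modeled distribution.
    with torch.no_grad():
        strength = (F.cross_entropy(logits_adv, y, reduction="none")
                    - F.cross_entropy(logits_nat, y, reduction="none")).clamp(min=0)
        weight = torch.sigmoid(strength) / (1.0 + log_var.exp().mean(dim=1))

    align = (weight * ((feat_adv - mu) ** 2).mean(dim=1)).mean()
    cls = F.cross_entropy(logits_nat, y) + F.cross_entropy(logits_sample, y)
    return cls + align

# Toy usage with random MNIST-shaped data.
model, dist_head = SmallNet(), AdvDistHead()
opt = torch.optim.Adam(list(model.parameters()) + list(dist_head.parameters()),
                       lr=1e-3)
x, y = torch.rand(8, 784), torch.randint(0, 10, (8,))
loss = apt_style_loss(model, dist_head, x, y)
opt.zero_grad(); loss.backward(); opt.step()
```

The design point the sketch tries to capture is the claimed efficiency gain: rather than sampling many input-space adversaries per example, a single feature-level distribution is estimated once per batch and sampled via the reparameterization trick, so the distribution head trains jointly with the classifier.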
