Lyu Bochen, Zhu Zhanxing
IEEE Trans Pattern Anal Mach Intell. 2025 Sep;47(9):8025-8039. doi: 10.1109/TPAMI.2025.3575618.
Adversarial training has been empirically demonstrated to be an effective strategy for improving the robustness of deep neural networks (DNNs) against adversarial examples. However, the underlying reason for its effectiveness remains unclear. In this paper, we conduct an extensive theoretical and empirical analysis of the implicit bias induced by adversarial training from a generalized margin perspective. Our results focus on adversarial training for homogeneous DNNs. In particular, (i) for deep linear networks with $\ell_p$-norm perturbations, we show that the weight matrices of adjacent layers become aligned and that the converged parameters maximize the margin of the adversarial examples; this margin can further be viewed as a generalized margin of the original dataset, attained by an interpolation between the $\ell_2$-SVM and the $\ell_q$-SVM solutions, where $1/p + 1/q = 1$. (ii) For general homogeneous DNNs, both linear and nonlinear, we study adversarial training under a variety of adversarial perturbations in a unified manner. Specifically, we show that the direction of the limit point of the parameters converges to a KKT point of a constrained optimization problem that maximizes the margin of the adversarial examples. Additionally, applying this general result to two special linear homogeneous DNNs, diagonal linear networks and linear convolutional networks, we show that adversarial training with $\ell_p$-norm perturbations equivalently minimizes an interpolation norm in the predictor space that depends on the depth, the architecture, and the value of $p$. Extensive experiments are conducted to verify our theoretical claims. Our results provide a theoretical basis for the longstanding folklore (Madry et al., 2018) that adversarial training improves robustness by using adversarial examples to modify the decision boundary, and they potentially offer insights for designing new robust training strategies.
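For concreteness, the constrained problem in result (ii) can be sketched as below. The display is a minimal sketch in our own notation (parameter vector $\theta$, network output $f(\theta; x)$, training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, perturbation budget $\epsilon$), not the paper's exact statement:

% Sketch of the adversarial max-margin problem (our notation, assumed setup):
% \theta collects all parameters, f(\theta; x) is the homogeneous network
% output, and each constraint asks every l_p perturbation of x_i within
% budget \epsilon to be classified with margin at least 1.
\begin{equation*}
  \min_{\theta}\; \tfrac{1}{2}\,\|\theta\|_2^2
  \quad \text{s.t.} \quad
  \min_{\|\delta_i\|_p \le \epsilon}\; y_i\, f(\theta;\, x_i + \delta_i) \;\ge\; 1,
  \qquad i = 1, \dots, n.
\end{equation*}

For a linear predictor $f(\theta; x) = \theta^{\top} x$, the inner minimum has the closed form $y_i\,\theta^{\top} x_i - \epsilon\,\|\theta\|_q$ with $1/p + 1/q = 1$ (by Hölder duality), so the constraints mix the $\ell_2$ objective with an $\ell_q$ penalty; this is one way to read the interpolation between the $\ell_2$-SVM and the $\ell_q$-SVM in result (i).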
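The $\ell_p$ adversarial training analyzed above can be pictured with a small self-contained simulation. The sketch below is entirely our own construction (synthetic separable data, a depth-2 linear network, exponential loss, the $\ell_\infty$ threat model as the $p = \infty$ case) and is not the paper's experimental code. For a linear predictor $\theta = W_1^{\top} w_2$, the worst-case $\ell_\infty$ attack on $(x, y)$ has the closed form $\delta = -\epsilon\, y\, \mathrm{sign}(\theta)$, so the inner maximization can be carried out exactly:

import numpy as np

# Minimal sketch (our own illustration): adversarial training of a depth-2
# linear network f(x) = w2 @ (W1 @ x) under an l_inf budget eps, i.e. the
# p = inf case of the l_p threat model analyzed in the paper.
rng = np.random.default_rng(0)
n, d, h = 200, 20, 32
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
y = np.sign(X @ theta_star)          # separable labels from a ground-truth direction
eps, lr, epochs = 0.05, 0.1, 2000

W1 = rng.normal(size=(h, d)) * 0.1   # layer 1 weights
w2 = rng.normal(size=h) * 0.1        # layer 2 weights

for _ in range(epochs):
    theta = W1.T @ w2                                    # end-to-end linear predictor
    delta = -eps * y[:, None] * np.sign(theta)[None, :]  # exact worst-case l_inf attack
    Xa = X + delta                                       # adversarial examples
    margins = y * (Xa @ theta)
    loss_grad = -y * np.exp(-margins) / n                # d(exp loss)/d(margin)
    g_theta = Xa.T @ loss_grad                           # gradient w.r.t. the predictor
    W1 -= lr * np.outer(w2, g_theta)                     # chain rule through layer 1
    w2 -= lr * (W1 @ g_theta)                            # chain rule through layer 2

# Per result (i), theta/||theta|| should approach the max-margin direction
# of the adversarially perturbed dataset as training proceeds.
theta = W1.T @ w2
robust_margins = y * ((X - eps * y[:, None] * np.sign(theta)[None, :]) @ theta)
print("min robust margin:", robust_margins.min())

Treating the perturbation as fixed when differentiating, as done here, matches standard adversarial-training practice: for the inner maximization solved exactly, this gradient is the gradient of the robust loss by Danskin's theorem.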