Bu Zhiqi, Wang Hua, Dai Zongyu, Long Qi
University of Pennsylvania.
Transactions on Machine Learning Research, June 2023.
Differentially private (DP) training preserves data privacy, usually at the cost of slower convergence (and thus lower accuracy) and more severe miscalibration than its non-private counterpart. To analyze the convergence of DP training, we formulate a continuous-time analysis through the lens of the neural tangent kernel (NTK), which characterizes the per-sample gradient clipping and the noise addition in DP training, for arbitrary network architectures and loss functions. Interestingly, we show that the noise addition only affects the privacy risk but not the convergence or calibration, whereas the per-sample gradient clipping (under both flat and layerwise clipping styles) only affects the convergence and calibration. Furthermore, we observe that DP models trained with a small clipping norm usually achieve the best accuracy, but are poorly calibrated and thus unreliable. In sharp contrast, DP models trained with a large clipping norm enjoy the same privacy guarantee and similar accuracy, but are significantly better calibrated. Our code can be found at https://github.com/woodyx218/opacus_global_clipping.
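To make the two mechanisms the abstract analyzes concrete, here is a minimal sketch (not the authors' implementation) of one DP-SGD step with per-sample gradient clipping followed by Gaussian noise addition. The function name `dp_sgd_step`, the `(B, d)` array `per_sample_grads`, and the parameter values are illustrative assumptions; `clip_norm` plays the role of the clipping norm whose size the abstract contrasts, and `noise_multiplier` sets the noise scale.

```python
import numpy as np

def dp_sgd_step(params, per_sample_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=np.random.default_rng()):
    """One illustrative DP-SGD update (hypothetical helper, not the paper's code).

    params: (d,) parameter vector; per_sample_grads: (B, d) per-example gradients.
    """
    B, d = per_sample_grads.shape
    # Flat (all-layer) clipping: rescale each sample's gradient so its
    # l2 norm is at most clip_norm. Per the paper, this clipping is what
    # drives convergence and calibration behavior.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    # Add isotropic Gaussian noise to the summed clipped gradient; per the
    # paper, this noise affects only the privacy risk, not convergence or
    # calibration.
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=d)
    return params - lr * (clipped.sum(axis=0) + noise) / B
```

In this sketch, varying `clip_norm` while holding `noise_multiplier` fixed keeps the privacy guarantee unchanged, mirroring the abstract's comparison of small versus large clipping norms.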