Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.
Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.
Biol Direct. 2018 Feb 6;13(1):1. doi: 10.1186/s13062-017-0203-4.
Users of a personalised recommendation system face a dilemma: recommendations can be improved by learning from data, but only if other users are willing to share their private information. Good personalised predictions are vitally important in precision medicine, but the genomic information on which the predictions are based is also particularly sensitive, as it directly identifies the patients and hence cannot easily be anonymised. Differential privacy has emerged as a promising solution: privacy is considered sufficient if the presence of individual patients cannot be distinguished. However, differentially private learning with current methods does not improve predictions with feasible data sizes and dimensionalities.
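The guarantee referred to informally above can be stated precisely. In the standard formulation (not spelled out in this abstract), a randomised mechanism $M$ is $\epsilon$-differentially private if its output distribution changes by at most a factor $e^{\epsilon}$ when any one individual's record is changed:

```latex
% \epsilon-differential privacy: for all neighbouring data sets D, D'
% differing in the record of a single individual, and for every
% measurable set of outputs S,
\Pr[M(D) \in S] \;\le\; e^{\epsilon} \, \Pr[M(D') \in S].
```

Smaller $\epsilon$ means stronger privacy: an observer of the output cannot reliably distinguish whether any particular patient's data were included.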
We show that useful predictors can be learned under strong differential privacy guarantees, even from moderately sized data sets, by demonstrating significant improvements in the accuracy of private drug sensitivity prediction with a new robust private regression method. Our method matches the predictive accuracy of state-of-the-art non-private lasso regression using only 4x more samples, under relatively strong differential privacy guarantees. Good performance with limited data is achieved by limiting the sharing of private information: reducing the dimensionality and projecting outliers to tighter bounds lowers the sensitivity of the computation, so less noise needs to be added for the same level of privacy.
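The mechanism behind this trade-off can be illustrated with a generic sufficient-statistic perturbation sketch for private linear regression. This is not the authors' exact method; the clipping bounds, sensitivity estimates, and the use of Laplace noise here are illustrative assumptions. It shows why projecting outliers to tighter bounds helps: the noise scale is proportional to the sensitivity, which shrinks with the bounds.

```python
import numpy as np

def private_linear_regression(X, y, epsilon, bound_x=1.0, bound_y=1.0,
                              ridge=1.0, rng=None):
    """Sketch of differentially private regression via noisy sufficient
    statistics. Hypothetical parameters: `bound_x`/`bound_y` are the
    clipping bounds; tighter bounds mean lower sensitivity and less noise
    for the same epsilon.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Project outliers into fixed bounds (the "tighter bounds" idea).
    X = np.clip(X, -bound_x, bound_x)
    y = np.clip(y, -bound_y, bound_y)
    # Loose upper bounds on the L1 sensitivity of X^T X and X^T y when
    # one record is replaced; they grow with d and the clipping bounds.
    sens_xx = 2 * d * d * bound_x ** 2
    sens_xy = 2 * d * bound_x * bound_y
    # Split the privacy budget between the two statistics and add
    # Laplace noise calibrated to sensitivity / (epsilon / 2).
    noisy_xx = X.T @ X + rng.laplace(0.0, 2 * sens_xx / epsilon, (d, d))
    noisy_xx = (noisy_xx + noisy_xx.T) / 2  # restore symmetry
    noisy_xy = X.T @ y + rng.laplace(0.0, 2 * sens_xy / epsilon, d)
    # A ridge term keeps the perturbed normal equations well-posed.
    return np.linalg.solve(noisy_xx + ridge * np.eye(d), noisy_xy)
```

Reducing the dimensionality `d` before this step shrinks both sensitivities quadratically and linearly respectively, which is one concrete reason low-dimensional private learning needs far less noise.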
The proposed differentially private regression method combines theoretical appeal and asymptotic efficiency with good prediction accuracy even on moderately sized data. As even this simple-to-implement method already shows promise on challenging genomic data, we anticipate rapid progress towards practical applications in many fields.
This article was reviewed by Zoltan Gaspari and David Kreil.