Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
Sci Adv. 2024 Feb 2;10(5):eadh8601. doi: 10.1126/sciadv.adh8601. Epub 2024 Jan 31.
Modern machine learning models trained on omic data raise the threat of privacy leakage for the patients represented in those datasets. Here, we propose a secure and privacy-preserving machine learning method (PPML-Omics) built on a decentralized, differentially private federated learning algorithm. We applied PPML-Omics to data from three sequencing technologies, addressing privacy concerns in three major omic data analysis tasks under three representative deep learning models. We examined privacy breaches in depth through privacy attack experiments and demonstrated that PPML-Omics protects patients' privacy. In each of these applications, PPML-Omics outperformed comparison methods under the same level of privacy guarantee, demonstrating its versatility in simultaneously balancing privacy-preserving capability and utility in omic data analysis. Furthermore, we give a theoretical proof of the privacy-preserving capability of PPML-Omics, making it the first mathematically guaranteed method with robust and generalizable empirical performance in protecting patients' privacy in omic data.
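The core ingredients named above — local training, differential privacy via noisy clipped updates, and decentralized aggregation that unlinks updates from client identities — can be sketched as follows. This is a minimal illustration of the general technique, not the paper's actual algorithm; all function names and parameter values (`clip_norm`, `sigma`) are illustrative assumptions.

```python
import numpy as np

def clip_and_noise(update, clip_norm=1.0, sigma=0.5, rng=None):
    """Gaussian-mechanism step used in differentially private learning:
    bound the update's L2 norm by clipping, then add calibrated noise.
    Parameter values here are illustrative, not from PPML-Omics."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    return clipped + rng.normal(0.0, sigma * clip_norm, size=update.shape)

def decentralized_round(client_updates, rng=None):
    """One aggregation round: each client privatizes its own update
    locally; a random shuffle stands in for peer-to-peer exchange, so
    no single party can link an update to a client; the mean of the
    privatized updates becomes the shared model update."""
    rng = rng or np.random.default_rng(0)
    private = [clip_and_noise(u, rng=rng) for u in client_updates]
    rng.shuffle(private)  # decouple update order from client identity
    return np.mean(private, axis=0)

# Toy usage: three clients holding 4-dimensional model updates.
updates = [np.ones(4), 2 * np.ones(4), -np.ones(4)]
global_update = decentralized_round(updates)
print(global_update.shape)  # (4,)
```

Because noise is added before any update leaves a client, privacy holds even if the aggregator is untrusted, which is the property a decentralized design aims for.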