Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption.

Author information

Sarkar Esha, Chielle Eduardo, Gürsoy Gamze, Mazonka Oleg, Gerstein Mark, Maniatakos Michail

Affiliations

Tandon School of Engineering, New York University, New York, NY 11201, USA.

New York University Abu Dhabi, Abu Dhabi, United Arab Emirates.

Publication information

IEEE Access. 2021;9:93097-93110. doi: 10.1109/access.2021.3093005. Epub 2021 Jun 28.

Abstract

Recent advances in genome sequencing technologies provide unprecedented opportunities to understand the relationship between human genetic variation and disease. However, genotyping whole genomes from a large cohort of individuals is still cost-prohibitive. Imputation methods that predict the genotypes of missing genetic variants are widely used, especially in genome-wide association studies. Accurate genotype imputation requires complex statistical methods. Because the problem is both data- and compute-intensive, imputation is increasingly outsourced, raising serious privacy concerns. In this work, we investigate solutions for fast, scalable, and accurate privacy-preserving genotype imputation using Machine Learning (ML) and a standardized homomorphic encryption scheme, the Paillier cryptosystem. ML-based privacy-preserving inference has been largely optimized for computation-heavy non-linear functions in a single-output multi-class classification setting. However, genotype imputation produces a large number of multi-class outputs per genome per individual, which calls for further optimizations and/or approximations specific to this application. Here we explore the effectiveness of linear models for genotype imputation, with the aim of converting them to privacy-preserving equivalents using standardized homomorphic encryption schemes. Our results show that the performance of our privacy-preserving genotype imputation method is equivalent to that of state-of-the-art plaintext solutions, achieving a micro-averaged area under the curve (micro-AUC) score of up to 99%, even on real-world large-scale datasets with up to 80,000 targets.
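The pairing of linear models with Paillier works because Paillier is additively homomorphic: multiplying two ciphertexts adds their plaintexts, and raising a ciphertext to a plaintext power multiplies its plaintext by that constant. A weighted sum of encrypted inputs with plaintext weights can therefore be evaluated without decryption. The following is a minimal illustrative sketch of that idea, not the authors' implementation. The function names, the toy key size, the example genotypes, and the integer-quantized weights are all hypothetical, and the split of roles (a client encrypting genotypes, a server holding a plaintext model) is an assumption made for the example.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key generation from two Mersenne primes.
    Far too small to be secure; for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (negative values are encoded modulo n)."""
    n2 = n * n
    # Random blinding factor; it is coprime to n except with negligible probability.
    r = secrets.randbelow(n - 1) + 1
    # With g = n + 1, g^m mod n^2 equals 1 + m*n.
    return (1 + (m % n) * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Decrypt and map the result back to a signed integer."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

def encrypted_linear_score(n, enc_x, weights, bias):
    """Compute Enc(bias + sum_i w_i * x_i) from encrypted x_i and plaintext w_i."""
    n2 = n * n
    acc = (1 + (bias % n) * n) % n2  # unblinded encoding of the bias term
    for c, w in zip(enc_x, weights):
        # Ciphertext exponentiation scales the plaintext by w;
        # ciphertext multiplication adds the scaled term to the running sum.
        acc = acc * pow(c, w % n, n2) % n2
    return acc

# Client side (assumed): encrypt hypothetical genotype dosages at four tag variants.
n, priv = keygen()
genotypes = [0, 1, 2, 1]
enc_x = [encrypt(n, g) for g in genotypes]

# Server side (assumed): hypothetical integer weights, one row per genotype class 0/1/2.
W = [[3, -2, 1, 0], [-1, 4, 0, 2], [0, -3, 5, 1]]
b = [1, -2, 0]
enc_scores = [encrypted_linear_score(n, enc_x, w, bi) for w, bi in zip(W, b)]

# Client side: decrypt the per-class scores and pick the largest.
scores = [decrypt(n, priv, c) for c in enc_scores]
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)
```

In this sketch the server only ever sees ciphertexts, and the non-linear step (choosing the highest-scoring class) is left to the key holder after decryption. Real-valued model weights would need to be scaled to integers first, and a production system would use a vetted library with keys of appropriate size rather than hand-written arithmetic.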
