Kim Miran, Harmanci Arif Ozgun, Bossuat Jean-Philippe, Carpov Sergiu, Cheon Jung Hee, Chillotti Ilaria, Cho Wonhee, Froelicher David, Gama Nicolas, Georgieva Mariya, Hong Seungwan, Hubaux Jean-Pierre, Kim Duhyeong, Lauter Kristin, Ma Yiping, Ohno-Machado Lucila, Sofia Heidi, Son Yongha, Song Yongsoo, Troncoso-Pastoriza Juan, Jiang Xiaoqian
Department of Computer Science and Engineering and Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea.
Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77030, USA.
Cell Syst. 2021 Nov 17;12(11):1108-1120.e4. doi: 10.1016/j.cels.2021.07.010. Epub 2021 Aug 30.
Genotype imputation is a fundamental step in genomic data analysis, in which missing variant genotypes are predicted from the existing genotypes of nearby "tag" variants. Although researchers can outsource genotype imputation, privacy concerns may prohibit sharing genetic data with an untrusted imputation service. Here, we developed secure genotype imputation using efficient homomorphic encryption (HE) techniques. In HE-based methods, the genotype data remain secure in transit, at rest, and during analysis, and they can be decrypted only by the owner. We compared secure imputation with three state-of-the-art non-secure methods and found that HE-based methods provide genetic data security while achieving comparable accuracy for common variants. HE-based methods have time and memory requirements that are comparable to or lower than those of the non-secure methods. Our results provide evidence that HE-based methods can practically perform resource-intensive computations for high-throughput genetic data analysis. The source code is freely available for download at https://github.com/K-miran/secure-imputation.
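To make the workflow concrete (the data owner encrypts tag-variant genotypes, an untrusted service computes on ciphertexts, and only the owner decrypts the result), the following is a minimal sketch under stated assumptions. It swaps in the textbook Paillier scheme, which is only additively homomorphic and is not the HE scheme used by the authors, with deliberately tiny, insecure keys. The imputation model is a hypothetical linear predictor: `WEIGHTS`, `BIAS`, and `SCALE` are invented fixed-point values, not parameters from the paper.

```python
import math
import secrets


def next_prime(k: int) -> int:
    """Smallest prime >= k (trial division; adequate for toy key sizes)."""
    def is_prime(x: int) -> bool:
        if x < 2:
            return False
        if x % 2 == 0:
            return x == 2
        i = 3
        while i * i <= x:
            if x % i == 0:
                return False
            i += 2
        return True
    while not is_prime(k):
        k += 1
    return k


def keygen(start: int = 1_000_000):
    """Toy Paillier keys with ~20-bit primes. NOT secure; illustration only."""
    p = next_prime(start)
    q = next_prime(p + 1)
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Enc(m) = (1 + m*n) * r^n mod n^2, with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n: int, sk, c: int) -> int:
    """Only the key owner can run this; maps results back to signed ints."""
    lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def encrypted_linear_score(n: int, enc_tags, weights, bias: int) -> int:
    """Untrusted service: computes Enc(bias + sum(w_i * g_i)) on ciphertexts."""
    n2 = n * n
    acc = encrypt(n, bias)
    for c, w in zip(enc_tags, weights):
        # Enc(a) * Enc(g)^w = Enc(a + w*g); negative w is taken mod n
        acc = acc * pow(c, w % n, n2) % n2
    return acc


# Hypothetical fixed-point model (real weights = value / SCALE).
SCALE = 100
WEIGHTS = [40, 35, -10]
BIAS = 20

n, sk = keygen()
tags = [2, 1, 1]                       # owner's tag-variant genotypes (0/1/2)
enc_tags = [encrypt(n, g) for g in tags]
enc_score = encrypted_linear_score(n, enc_tags, WEIGHTS, BIAS)
score = decrypt(n, sk, enc_score) / SCALE
imputed = min(2, max(0, round(score)))  # nearest valid genotype
print(score, imputed)  # 1.25 1
```

The service never sees `tags` or `score`; it handles only ciphertexts and the public modulus `n`. A scheme supporting both addition and multiplication on ciphertexts would be needed for non-linear models, which is one reason practical systems use more capable HE schemes than this sketch.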