Efficient verification for outsourced genome-wide association studies.

Affiliations

Rutgers University, Newark, NJ, USA.

University of Texas Health Science Center at Houston, TX, USA.

Publication information

J Biomed Inform. 2021 May;117:103714. doi: 10.1016/j.jbi.2021.103714. Epub 2021 Mar 10.

DOI:10.1016/j.jbi.2021.103714

PMID:33711538

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8131235/

Abstract

As cloud computing is increasingly adopted for genome-wide association studies (GWAS), verifying the integrity of outsourced GWAS computation remains an open problem. Here, we propose two novel algorithms that generate synthetic single-nucleotide polymorphisms (SNPs) indistinguishable from real SNPs. The first method creates synthetic SNPs from the phenotype vector, while the second creates them from the real SNPs most similar to the phenotype vector. The time complexities of the first and second approaches are O(m) and O(m log n), respectively, where m is the number of subjects and n is the number of SNPs. Furthermore, through a game-theoretic analysis, we demonstrate that the server can be incentivized to behave honestly by coupling appropriate payoffs with randomized verification. We conduct extensive experiments on the proposed methods, and the results show that, beyond the formal adversarial model, when only a few synthetic SNPs are generated and mixed into the real data, they cannot be distinguished from the real SNPs even by a variety of predictive machine learning models. We demonstrate that the proposed approach allows logistic regression for GWAS to be outsourced in an efficient and trustworthy way.
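The verification idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's algorithm: genotypes and phenotypes are coded as 0/1, the synthetic SNP is a noisy copy of the phenotype vector (a hypothetical rule that happens to run in O(m)), and a simple case-versus-control mean difference stands in for the logistic regression the paper outsources. The function names (`make_synthetic_snp`, `association_score`, `plant`, `verify`) are hypothetical.

```python
import random

def make_synthetic_snp(phenotype, flip_prob, rng):
    # One O(m) pass: copy each 0/1 phenotype value, flipping it with
    # probability flip_prob (hypothetical noise rule, not the paper's).
    return [1 - y if rng.random() < flip_prob else y for y in phenotype]

def association_score(snp, phenotype):
    # Stand-in statistic (NOT logistic regression): mean genotype among
    # cases minus mean genotype among controls. Needs both groups non-empty.
    cases = [g for g, y in zip(snp, phenotype) if y == 1]
    controls = [g for g, y in zip(snp, phenotype) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

def plant(real_snps, synthetic_snps, rng):
    # Shuffle synthetic SNPs in among the real ones; the client keeps the
    # synthetic positions secret.
    tagged = [(s, False) for s in real_snps] + [(s, True) for s in synthetic_snps]
    rng.shuffle(tagged)
    mixed = [s for s, _ in tagged]
    secret = [i for i, (_, is_synthetic) in enumerate(tagged) if is_synthetic]
    return mixed, secret

def verify(mixed, secret, phenotype, server_scores, tol=1e-9):
    # Recompute the statistic locally for the few planted SNPs only and
    # compare with what the server returned at those positions.
    return all(
        abs(server_scores[i] - association_score(mixed[i], phenotype)) <= tol
        for i in secret
    )

# Toy run: 200 subjects, 50 random "real" SNPs, 3 planted synthetic SNPs.
rng = random.Random(0)
m = 200
phenotype = [i % 2 for i in range(m)]
real = [[rng.randint(0, 1) for _ in range(m)] for _ in range(50)]
synthetic = [make_synthetic_snp(phenotype, 0.1, rng) for _ in range(3)]
mixed, secret = plant(real, synthetic, rng)

honest_scores = [association_score(s, phenotype) for s in mixed]
lazy_scores = [0.0] * len(mixed)  # a server that skips the computation
print(verify(mixed, secret, phenotype, honest_scores))
print(verify(mixed, secret, phenotype, lazy_scores))
```

The client only recomputes the statistic for the planted SNPs, so the check costs far less than the full study. This toy makes no attempt at the property the abstract emphasizes, namely that synthetic SNPs must be statistically indistinguishable from real ones; a noisy copy of the phenotype would be easy for a server to spot.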

Similar articles

Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.

BMC Genomics. 2015;16 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-16-S2-S5. Epub 2015 Jan 21.

Secure count query on encrypted genomic data.

J Biomed Inform. 2018 May;81:41-52. doi: 10.1016/j.jbi.2018.03.003. Epub 2018 Mar 15.

Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering.

BMC Bioinformatics. 2014 Apr 10;15:102. doi: 10.1186/1471-2105-15-102.

FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption.

BMC Med Inform Decis Mak. 2015;15 Suppl 5(Suppl 5):S5. doi: 10.1186/1472-6947-15-S5-S5. Epub 2015 Dec 21.

Utilizing Deep Learning and Genome Wide Association Studies for Epistatic-Driven Preterm Birth Classification in African-American Women.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Mar-Apr;17(2):668-678. doi: 10.1109/TCBB.2018.2868667. Epub 2018 Sep 3.

A method combining a random forest-based technique with the modeling of linkage disequilibrium through latent variables, to run multilocus genome-wide association studies.

BMC Bioinformatics. 2018 Mar 27;19(1):106. doi: 10.1186/s12859-018-2054-0.

eCEO: an efficient Cloud Epistasis cOmputing model in genome-wide association study.

Bioinformatics. 2011 Apr 15;27(8):1045-51. doi: 10.1093/bioinformatics/btr091. Epub 2011 Mar 2.

Optimized homomorphic encryption solution for secure genome-wide association studies.

BMC Med Genomics. 2020 Jul 21;13(Suppl 7):83. doi: 10.1186/s12920-020-0719-9.

Mixture SNPs effect on phenotype in genome-wide association studies.

BMC Genomics. 2015 Feb 3;16(1):3. doi: 10.1186/1471-2164-16-3.

Cited by

Descriptor: .

IEEE Data Descr. 2024;2:1-7. doi: 10.1109/ieeedata.2024.3505852. Epub 2024 Nov 26.

Blockchain Based Secure Federated Learning With Local Differential Privacy and Incentivization.

IEEE Trans Priv. 2024;1:31-44. doi: 10.1109/tp.2024.3487819. Epub 2024 Nov 8.

Genomic privacy preservation in genome-wide association studies: taxonomy, limitations, challenges, and vision.

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae356.

Efficient Federated Kinship Relationship Identification.

AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:534-543. eCollection 2023.

Privacy-Preserving and Efficient Verification of the Outcome in Genome-Wide Association Studies.

Proc Priv Enhanc Technol. 2022;2022(3):732-753. doi: 10.56553/popets-2022-0094.
