Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
Department of Computer Science, Stanford University, Stanford, California, USA.
Nat Biotechnol. 2018 Jul;36(6):547-551. doi: 10.1038/nbt.4108. Epub 2018 May 7.
Most sequenced genomes are currently stored in strict access-controlled repositories. Free access to these data could improve the power of genome-wide association studies (GWAS) to identify disease-causing genetic variants and aid the discovery of new drug targets. However, concerns over genetic data privacy may deter individuals from contributing their genomes to scientific studies and could prevent researchers from sharing data with the scientific community. Although cryptographic techniques for secure data analysis exist, none scales to computationally intensive analyses, such as GWAS. Here we describe a protocol for large-scale genome-wide analysis that facilitates quality control and population stratification correction in 9K, 13K, and 23K individuals while maintaining the confidentiality of underlying genotypes and phenotypes. We show the protocol could feasibly scale to a million individuals. This approach may help to make currently restricted data available to the scientific community and could potentially enable secure genome crowdsourcing, allowing individuals to contribute their genomes to a study without compromising their privacy.