Department of Computer Science, Indiana University, Bloomington, IN, USA.
Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
Nat Methods. 2020 Mar;17(3):295-301. doi: 10.1038/s41592-020-0761-8. Epub 2020 Mar 4.
Genome-wide association studies (GWAS), especially on rare diseases, may necessitate exchange of sensitive genomic data between multiple institutions. Since genomic data sharing is often infeasible due to privacy concerns, cryptographic methods, such as secure multiparty computation (SMC) protocols, have been developed with the aim of offering privacy-preserving collaborative GWAS. Unfortunately, the computational overhead of these methods remains prohibitive for human-genome-scale data. Here we introduce SkSES (https://github.com/ndokmai/sgx-genome-variants-search), a hardware-software hybrid approach for privacy-preserving collaborative GWAS, which improves the running time of the most advanced cryptographic protocols by two orders of magnitude. The SkSES approach is based on trusted execution environments (TEEs) offered by current-generation microprocessors, in particular Intel's SGX. To overcome the severe memory limitation of the TEEs, SkSES employs novel 'sketching' algorithms that maintain essential statistical information on genomic variants in input VCF files. By additionally incorporating efficient data compression and population stratification reduction methods, SkSES identifies the top k genomic variants in a cohort quickly, accurately and in a privacy-preserving manner.
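To make the idea of sketching under a tight memory budget concrete, the following is a minimal illustration using a generic Count Sketch, a standard streaming data structure. It is not the SkSES implementation and involves no SGX enclave. The class and function names, the variant identifier format, and the choice of the case-control allele count difference as the tracked statistic are all assumptions made for this example.

```python
import hashlib
import heapq
from statistics import median


class CountSketch:
    """Fixed-size table approximating signed per-key totals (hypothetical helper)."""

    def __init__(self, width=1024, depth=5):
        # Memory use is width * depth counters, independent of the number of variants.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, key, row):
        # One salted hash per row yields both a bucket index and a +/-1 sign.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        value = int.from_bytes(digest, "little")
        sign = 1 if value & 1 else -1
        return (value >> 1) % self.width, sign

    def update(self, key, delta):
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(key, row)
            self.table[row][bucket] += sign * delta

    def estimate(self, key):
        # The median across rows limits the effect of hash collisions.
        readings = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(key, row)
            readings.append(sign * self.table[row][bucket])
        return median(readings)


def stream_allele_counts(sketch, records):
    """records: iterable of (variant_id, is_case, allele_count) tuples (assumed format)."""
    for variant_id, is_case, allele_count in records:
        # Cases add to the counter and controls subtract from it, so the sketch
        # tracks an approximate case-minus-control allele count per variant.
        sketch.update(variant_id, allele_count if is_case else -allele_count)


def top_k_variants(sketch, candidate_ids, k):
    """Return the k candidates with the largest absolute estimated difference."""
    scored = ((vid, sketch.estimate(vid)) for vid in set(candidate_ids))
    return heapq.nlargest(k, scored, key=lambda item: abs(item[1]))


if __name__ == "__main__":
    records = [
        ("chr1:1000:A:G", True, 2),
        ("chr1:1000:A:G", True, 1),
        ("chr1:1000:A:G", False, 1),
        ("chr2:500:C:T", False, 2),
        ("chr2:500:C:T", False, 2),
        ("chr3:42:G:A", True, 1),
        ("chr3:42:G:A", False, 1),
    ]
    sketch = CountSketch()
    stream_allele_counts(sketch, records)
    print(top_k_variants(sketch, [r[0] for r in records], k=2))
```

In this sketch the table size is fixed up front, which is the property that matters when the working set must fit in limited enclave memory. The trade-off is that estimates are approximate once many variants share counters, so `width` and `depth` control the balance between memory and accuracy.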