IBM Research - Australia, 204 Lygon Street, Carlton, VIC, 3053, Australia ; Department of Computing and Information Systems, University of Melbourne, Parkville, VIC, 3010, Australia.
IBM Research - Australia, 204 Lygon Street, Carlton, VIC, 3053, Australia.
Health Inf Sci Syst. 2015 Feb 24;3(Suppl 1 HISA Big Data in Biomedicine and Healthcare 2013 Con):S3. doi: 10.1186/2047-2501-3-S1-S3. eCollection 2015.
Genome-wide association studies (GWAS) are a common approach for the systematic discovery of single nucleotide polymorphisms (SNPs) that are associated with a given disease. The univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate that a three-way interaction analysis on GWAS data with 1.1 million SNPs would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based methods and four times faster than runtimes estimated for GPU methods, indicating how improvements in the hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on "Sequoia", currently the largest IBM Blue Gene/Q supercomputer, at the Lawrence Livermore National Laboratory, assuming that linear scaling is maintained, as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means that exhaustive analysis of higher order interactions on large modern GWAS is becoming feasible.
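The arithmetic behind these estimates can be sketched as follows. This is a minimal illustration, not the authors' code: the core counts for Avoca (65,536) and Sequoia (1,572,864) are assumptions taken from the machines' public specifications rather than from the abstract, and the 5.8 year figure is used as a point value although the abstract states it as a lower bound. The function names are hypothetical.

```python
from math import comb

# Figures stated in the abstract.
N_SNPS = 1_100_000   # SNPs in the GWAS data set
ORDER = 3            # three-way interactions
AVOCA_YEARS = 5.8    # estimated runtime on Avoca (the abstract says "over 5.8")

# Assumptions: core counts are not given in the abstract.
AVOCA_CORES = 65_536
SEQUOIA_CORES = 1_572_864


def n_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP subsets of the given size in an exhaustive scan."""
    return comb(n_snps, order)


def scaled_runtime_years(base_years: float, base_cores: int, target_cores: int) -> float:
    """Runtime on a larger machine, assuming perfectly linear scaling with cores."""
    return base_years * base_cores / target_cores


triples = n_combinations(N_SNPS, ORDER)
sequoia_years = scaled_runtime_years(AVOCA_YEARS, AVOCA_CORES, SEQUOIA_CORES)
sequoia_months = sequoia_years * 12

print(f"three-way combinations to test: {triples:.3e}")
print(f"estimated Sequoia runtime: {sequoia_months:.1f} months")
```

Under these assumptions the exhaustive scan covers on the order of 10^17 SNP triples, and the 24-fold increase in cores brings the estimate to roughly 2.9 months, in line with the "under 3 months" figure quoted above.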