Chen Gary K, Guo Yunfei
Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA.
Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA; Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA, USA.
Front Genet. 2013 Dec 3;4:266. doi: 10.3389/fgene.2013.00266.
Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress in explaining much of the heritability of common disease has been frustratingly slow. Today's paradigm of testing an independent hypothesis on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes underlying disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway and/or interact non-additively on log risk, for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyses genome-wide can quickly become intractable, because even modest-size SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, requiring tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed explicitly to overcome these enormous computational barriers through the use of the processors on graphics cards, known as Graphics Processing Units (GPUs). We include tutorials on GPU technology, which convey why GPUs are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores available on a single GPU. Whereas high-end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from NVIDIA feature over 2600 cores, and each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well because of the high degree of parallelism these problems expose. The papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations.
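The combinatorial growth described in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not taken from the article: it assumes an exhaustive screen over unordered SNP subsets, and the function names and the CPU throughput figure (one million models per second) are hypothetical placeholders. Only the 500k panel size and the 100x speedup factor come from the abstract.

```python
from math import comb


def interaction_model_count(n_snps: int, order: int = 2) -> int:
    """Number of unordered SNP subsets of size `order` in a panel of n_snps."""
    return comb(n_snps, order)


def hours_to_scan(n_models: int, models_per_second: float) -> float:
    """Wall-clock hours needed to test n_models at a fixed throughput."""
    return n_models / models_per_second / 3600.0


n_snps = 500_000                      # panel size quoted in the abstract
pairs = interaction_model_count(n_snps, 2)
triples = interaction_model_count(n_snps, 3)

cpu_rate = 1e6                        # hypothetical: models tested per second on a CPU
gpu_rate = cpu_rate * 100             # the ">100x" speedup quoted in the abstract

print(f"pairwise models:  {pairs:.3e}")
print(f"three-way models: {triples:.3e}")
print(f"pairwise scan, CPU: {hours_to_scan(pairs, cpu_rate):.1f} h")
print(f"pairwise scan, GPU: {hours_to_scan(pairs, gpu_rate):.2f} h")
```

Under these assumptions an exhaustive pairwise screen alone involves roughly 1.25 × 10^11 models, and moving to three-way interactions multiplies that count by more than five orders of magnitude. Each model is evaluated independently of the others, which is the kind of parallelism the abstract credits for the GPU speedups.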