Sinnott-Armstrong Nicholas A, Greene Casey S, Cancare Fabio, Moore Jason H
Computational Genetics Lab, Department of Genetics, Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, NH, USA.
BMC Res Notes. 2009 Jul 24;2:149. doi: 10.1186/1756-0500-2-149.
Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk, and it is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high-order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers, Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have created economies of scale that make these powerful components readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price-to-performance ratio of available solutions.
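The core MDR step can be sketched as follows. This is a minimal, simplified illustration under stated assumptions, not the authors' GPU implementation. It assumes genotypes are coded 0/1/2 and status as 1 for a case and 0 for a control. A multilocus genotype cell is labeled high-risk when its case:control ratio exceeds the overall ratio in the data. Each SNP combination is then scored by balanced accuracy on the training data only, and the cross-validation used in full MDR analyses is omitted. The helper names `mdr_balanced_accuracy` and `mdr_exhaustive` are hypothetical. The exhaustive loop over every k-SNP combination is what makes high-order analyses expensive. Each combination is scored independently of the others, so the work can be divided across many processors.

```python
from collections import Counter
from itertools import combinations


def mdr_balanced_accuracy(genotypes, status, snps):
    """Score one SNP combination with a simplified MDR step (hypothetical helper).

    genotypes: list of rows, one per individual, of genotype codes (assumed 0/1/2).
    status: list of 1 (case) or 0 (control), one per individual.
    snps: tuple of column indices forming the candidate interaction.
    Assumes both cases and controls are present; no cross-validation is done.
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio

    # Count cases and controls in each multilocus genotype cell.
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[i] for i in snps)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1

    # Pool cells into high-risk vs low-risk and tally correct predictions.
    tp = tn = 0
    for cell in set(cases) | set(controls):
        # Written as a product so a cell with zero controls needs no division.
        if cases[cell] > threshold * controls[cell]:
            tp += cases[cell]      # cases correctly labeled high-risk
        else:
            tn += controls[cell]   # controls correctly labeled low-risk

    sensitivity = tp / n_cases
    specificity = tn / n_controls
    return (sensitivity + specificity) / 2


def mdr_exhaustive(genotypes, status, k):
    """Evaluate every k-SNP combination and return (best_snps, best_score)."""
    n_snps = len(genotypes[0])
    best = max(
        combinations(range(n_snps), k),
        key=lambda snps: mdr_balanced_accuracy(genotypes, status, snps),
    )
    return best, mdr_balanced_accuracy(genotypes, status, best)


# Toy data (illustrative only): status is the XOR of SNPs 0 and 1; SNP 2 is noise.
G = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1],
     [0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
y = [row[0] ^ row[1] for row in G]

print(mdr_exhaustive(G, y, 1))  # every single SNP scores 0.5 (chance level)
print(mdr_exhaustive(G, y, 2))  # ((0, 1), 1.0)
```

The XOR toy data is a purely epistatic pattern. No SNP shows a marginal effect, yet the pair (0, 1) classifies every individual correctly. The number of combinations is C(n, k). With one million SNPs there are already 499,999,500,000 pairs, and the count grows rapidly with k.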
We found that using MDR on GPUs consistently increased performance per machine over both a feature-rich Java software package and a C++ cluster implementation. A GPU workstation running the GPU implementation reduces computation time by a factor of 160 compared with an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore, this GPU system provides extremely cost-effective performance while leaving the CPU available for other tasks. The GPU workstation, containing three GPUs, costs $2000, whereas obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support costs of the cluster system, cost approximately $82,500.
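The price-to-performance comparison can be made concrete using only the figures quoted above. This assumes, as stated, that the two systems deliver similar performance. The per-core figure folds in the cluster's infrastructure and support costs.

```latex
\frac{\$82{,}500}{150\ \text{cores}} = \$550\ \text{per core},
\qquad
\frac{\$82{,}500}{\$2{,}000} = 41.25 \approx 41
```

For comparable throughput, the cluster therefore costs roughly 41 times as much as the three-GPU workstation.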
Graphics-hardware-based computing provides a cost-effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.