Silberstein M, Tzemach A, Dovgolevsky N, Fishelson M, Schuster A, Geiger D
Computer Science Department, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel.
Am J Hum Genet. 2006 Jun;78(6):922-35. doi: 10.1086/504158. Epub 2006 May 1.
Computation of LOD scores is a valuable tool for mapping disease-susceptibility genes in the study of Mendelian and complex diseases. However, computation of exact multipoint likelihoods of large inbred pedigrees with extensive missing data is often beyond the capabilities of a single computer. We present a distributed system called "SUPERLINK-ONLINE" for the computation of multipoint LOD scores of large inbred pedigrees. It achieves high performance via the efficient parallelization of the algorithms in SUPERLINK, a state-of-the-art serial program for these tasks, and through the use of the idle cycles of thousands of personal computers. The main algorithmic challenge has been to efficiently split a large task for distributed execution in a highly dynamic, nondedicated running environment. Notably, the system is available online, which allows computationally intensive analyses to be performed with no need for either the installation of software or the maintenance of a complicated distributed environment. As the system was being developed, it was extensively tested by collaborating medical centers worldwide on a variety of real data sets, some of which are presented in this article.