Andrade Jorge, Andersen Malin, Sillén Anna, Graff Caroline, Odeberg Jacob
Department of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), SE-10691 Stockholm, Sweden.
Eur J Hum Genet. 2007 Jun;15(6):694-702. doi: 10.1038/sj.ejhg.5201815. Epub 2007 Mar 21.
In genetics, with increasing data sizes and more advanced algorithms for mining complex data, a point is reached where increased computational capacity or alternative solutions become unavoidable. Most contemporary methods for linkage analysis are based on the Lander-Green hidden Markov model (HMM), whose computational cost scales exponentially with the number of pedigree members. In whole-genome linkage analysis, genotype simulations become prohibitively time-consuming to perform on single computers. We have developed "Grid-Allegro", a Grid-aware implementation of the Allegro software, with which several thousand genotype simulations can be performed in parallel in a short time. Because the Allegro executable and datasets are installed temporarily on remote nodes at submission, the need for predefined Grid run-time environments is circumvented. We evaluated the performance, efficiency and scalability of this implementation in a genome scan of Swedish multiplex Alzheimer's disease families. We demonstrate that "Grid-Allegro" allows full exploitation of the features available in Allegro for genome-wide linkage analysis. The implementation of existing bioinformatics applications on Grids (Distributed Computing) represents a cost-effective alternative for addressing highly resource-demanding and data-intensive bioinformatics tasks, compared to acquiring and setting up clusters of computational hardware in house (Parallel Computing), a resource not available to most geneticists today.
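The parallel genotype simulations described in the abstract are independent of one another, so the workload can be divided into batches that run separately and are merged afterwards. The following is a minimal sketch of that idea only, not the Grid-Allegro implementation: it does not call Allegro or any Grid middleware, a local thread pool stands in for remote nodes, and `run_batch` returns placeholder random scores where a real job would analyse simulated genotypes. All function names, the seed value, and the add-one corrected p-value formula are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def split_replicates(total, n_jobs):
    """Divide `total` simulation replicates into n_jobs near-equal batches."""
    base, extra = divmod(total, n_jobs)
    return [base + (1 if i < extra else 0) for i in range(n_jobs)]


def run_batch(job_id, n_replicates, base_seed=2007):
    """Hypothetical stand-in for one remote job.

    Returns one maximum score per replicate. The scores are placeholder
    random numbers, NOT linkage statistics; a real job would run the
    linkage program on simulated genotypes instead.
    """
    rng = random.Random(base_seed + job_id)  # distinct, reproducible stream per job
    return [max(rng.gauss(0.0, 1.0) for _ in range(50))
            for _ in range(n_replicates)]


def simulate_in_parallel(total, n_jobs):
    """Run all batches concurrently and merge the per-replicate scores."""
    batches = split_replicates(total, n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() yields results in submission order, so output is reproducible
        results = pool.map(run_batch, range(n_jobs), batches)
        return [score for batch in results for score in batch]


def empirical_p_value(observed, null_scores):
    """Share of simulated scores >= observed, with an add-one correction."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


if __name__ == "__main__":
    null_scores = simulate_in_parallel(total=5000, n_jobs=64)
    print(len(null_scores), empirical_p_value(3.5, null_scores))
```

Giving each batch its own seed keeps the replicates statistically independent across jobs while making a failed job reproducible in isolation, which matters when jobs are resubmitted to different nodes.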