Keane T M, Naughton T J, Travers S A A, McInerney J O, McCormack G P
Department of Computer Science, National University of Ireland Maynooth, Ireland.
Bioinformatics. 2005 Apr 1;21(7):969-74. doi: 10.1093/bioinformatics/bti100. Epub 2004 Oct 28.
In recent years there has been increased interest in producing large and accurate phylogenetic trees using statistical approaches. However, for a large number of taxa, it is not feasible to construct such trees using only a single processor. A number of specialized parallel programs have been produced in an attempt to address the huge computational requirements of maximum likelihood (ML). We express a number of concerns about the current set of parallel phylogenetic programs, which are severely limiting the widespread availability and use of parallel computing in ML-based phylogenetic analysis.
We have found phylogenetic analysis to be well suited to large-scale heterogeneous distributed computing. We have completed a distributed and fully cross-platform phylogenetic tree building program called distributed phylogeny reconstruction by maximum likelihood (DPRml). It uses an already proven maximum likelihood-based tree building algorithm and a popular phylogenetic analysis library for all its likelihood calculations. It offers one of the most extensive sets of DNA substitution models currently available. We are the first, to our knowledge, to report the completion of a distributed phylogenetic tree building program that can achieve near-linear speedup while using only the idle clock cycles of machines. For those in an academic or corporate environment with hundreds of idle desktop machines, we have shown how distributed computing can deliver a 'free' ML supercomputer.
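To make "near-linear speedup" concrete: speedup on p machines is conventionally the single-machine run time divided by the p-machine run time, and parallel efficiency is that speedup divided by p, so near-linear speedup means efficiency stays close to 1. The following is a minimal sketch of that arithmetic only. The function names and the timings in the usage lines are hypothetical illustrations, not code or measurements from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Conventional speedup: single-machine time over parallel time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_machines: int) -> float:
    """Parallel efficiency: speedup per machine (1.0 is perfectly linear)."""
    if n_machines < 1:
        raise ValueError("n_machines must be at least 1")
    return speedup(t_serial, t_parallel) / n_machines


# Hypothetical timings, chosen only to illustrate the calculation:
# 100 hours on one machine versus 5 hours on 25 machines.
example_speedup = speedup(100.0, 5.0)
example_efficiency = efficiency(100.0, 5.0, 25)
```

With these made-up numbers the speedup is 20 and the efficiency is 0.8. A run would count as near-linear when the efficiency remains close to 1 as machines are added.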