MultiPhyl: a high-throughput phylogenomics webserver using distributed computing.
Author information
Keane Thomas M, Naughton Thomas J, McInerney James O
Affiliation
Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.
Publication information
Nucleic Acids Res. 2007 Jul;35(Web Server issue):W33-7. doi: 10.1093/nar/gkm359. Epub 2007 Jun 6.
With the number of fully sequenced genomes increasing steadily, there is greater interest in performing large-scale phylogenomic analyses from large numbers of individual gene families. Maximum likelihood (ML) has been shown repeatedly to be one of the most accurate methods for phylogenetic construction. Recently, there have been a number of algorithmic improvements in maximum-likelihood-based tree search methods. However, it can still take a long time to analyse the evolutionary history of many gene families using a single computer. Distributed computing refers to a method of combining the computing power of multiple computers in order to perform some larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching and bootstrapping of each of the alignments using many desktop machines. The program implements a set of 88 amino acid models and 56 nucleotide maximum likelihood models and a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at: http://www.cs.nuim.ie/distributed/multiphyl.php.
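The workflow described in the abstract (many independent alignments, each needing model selection, farmed out to many machines) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not say which statistical criteria MultiPhyl uses, so the Akaike information criterion (AIC) stands in as one widely used option; the model names, parameter counts, and log-likelihood values are made up; and a local thread pool stands in for the pool of non-dedicated desktop machines. None of the function names below come from MultiPhyl.

```python
from concurrent.futures import ThreadPoolExecutor


def aic(log_likelihood, num_params):
    """Akaike information criterion: lower is better."""
    return 2 * num_params - 2 * log_likelihood


def select_model(scores):
    """Pick the model with the lowest AIC.

    scores maps a model name to (log-likelihood, free parameter count).
    """
    return min(scores, key=lambda model: aic(*scores[model]))


def analyse_alignment(task):
    """One independent unit of work: model selection for one alignment."""
    alignment_name, scores = task
    return alignment_name, select_model(scores)


def run_batch(tasks, workers=4):
    """Farm the per-alignment tasks out to a pool of workers.

    A thread pool on one machine is only a stand-in (assumption) for
    the many remote desktop machines a distributed platform would use.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyse_alignment, tasks))


# Hypothetical inputs: in a real analysis the log-likelihoods would come
# from fitting each candidate model to each alignment.
EXAMPLE_TASKS = [
    ("geneA", {"JC69": (-1250.0, 0), "HKY85": (-1238.0, 4), "GTR": (-1236.5, 8)}),
    ("geneB", {"JC69": (-980.0, 0), "HKY85": (-979.5, 4), "GTR": (-979.0, 8)}),
]

if __name__ == "__main__":
    for name, best in run_batch(EXAMPLE_TASKS).items():
        print(name, best)
```

Because each alignment is analysed independently, the tasks need no communication with one another, which is what makes this kind of workload a natural fit for idle, heterogeneous machines.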