Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood.
Affiliation
The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany.
Publication information
Syst Biol. 2011 May;60(3):291-302. doi: 10.1093/sysbio/syr010. Epub 2011 Mar 23.
We present an evolutionary placement algorithm (EPA) and a Web server for the rapid assignment of sequence fragments (short reads) to edges of a given phylogenetic tree under the maximum-likelihood model. The accuracy of the algorithm is evaluated on several real-world data sets and compared with placement by pair-wise sequence comparison, using edit distances and BLAST. We introduce a slow and accurate as well as a fast and less accurate placement algorithm. For the slow algorithm, we develop additional heuristic techniques that yield almost the same run times as the fast version with only a small loss of accuracy. When those additional heuristics are employed, the run time of the more accurate algorithm is comparable with that of a simple BLAST search for data sets with a high number of short query sequences. Moreover, the accuracy of the EPA is significantly higher, in particular when the sample of taxa in the reference topology is sparse or inadequate. Our algorithm, which has been integrated into RAxML, therefore provides an equally fast but more accurate alternative to BLAST for tree-based inference of the evolutionary origin and composition of short sequence reads. We are also actively developing a Web server that offers a freely available service for computing read placements on trees using the EPA.
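The abstract compares the EPA against placement by pair-wise sequence comparison using edit distances. The following is a minimal sketch of what such a baseline might look like, not the authors' implementation and not the EPA itself: each read is assigned to the reference sequence (a tip of the tree) with the smallest edit distance. The function names, the toy sequences, and the choice of a "fitting" distance (the read may match any substring of the longer reference at no extra cost) are assumptions made for illustration only.

```python
from typing import Dict, Tuple


def fitting_distance(read: str, ref: str) -> int:
    """Edit distance between `read` and its best-matching substring of `ref`.

    Assumption: a short read should not be penalised for the unmatched
    flanks of a full-length reference, so the start and end positions
    in `ref` are free.
    """
    # Row for the empty read prefix: it matches anywhere in ref at cost 0.
    prev = [0] * (len(ref) + 1)
    for i, r in enumerate(read, start=1):
        # Column 0: aligning i read characters against nothing costs i.
        cur = [i] + [0] * len(ref)
        for j, c in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # read character left unmatched
                cur[j - 1] + 1,          # reference character skipped
                prev[j - 1] + (r != c),  # match or substitution
            )
        prev = cur
    # Free end position: take the best cell in the last row.
    return min(prev)


def place_by_edit_distance(read: str, references: Dict[str, str]) -> Tuple[str, int]:
    """Return (taxon name, distance) of the closest reference sequence."""
    return min(
        ((name, fitting_distance(read, seq)) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical toy reference set; real data would be aligned gene sequences.
references = {"taxonA": "TTACGTTT", "taxonB": "GGGGCCCC"}
print(place_by_edit_distance("ACGT", references))
```

A baseline of this kind can only assign a read to a tip of the tree, whereas the EPA as described above assigns reads to edges, which include inner branches. That difference is one plausible reason the abstract reports a larger accuracy gap when taxon sampling in the reference tree is sparse.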