Simultaneous phylogeny reconstruction and multiple sequence alignment.

Author information

Yue Feng, Shi Jian, Tang Jijun

Affiliation

Ludwig Institute for Cancer Research, UCSD School of Medicine, 9500 Gilman Drive, La Jolla, CA 92093, USA.

Publication information

BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S11. doi: 10.1186/1471-2105-10-S1-S11.

DOI:10.1186/1471-2105-10-S1-S11

PMID:19208110

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2648791/

Abstract

BACKGROUND

A phylogeny is the evolutionary history of a group of organisms. To date, sequence data remain the most widely used data type for phylogenetic reconstruction. Before sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all current multiple sequence alignment programs use a guide tree to produce the alignment, and experiments have shown that good guide trees can significantly improve the quality of the multiple alignment.

RESULTS

We devise a new algorithm that simultaneously aligns multiple sequences and searches for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can use a simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method is compared with that of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method performs well in terms of both phylogeny accuracy and alignment quality.
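To make the distinction between a simple cost model and a substitution matrix concrete, here is a minimal Python sketch of pairwise global alignment with a pluggable cost function. It is not taken from the authors' C package: the names `alignment_cost`, `simple_cost`, and `table_cost` are hypothetical, the linear gap cost is an assumption, and the small transition/transversion table is a toy stand-in rather than real PAM250 or BLOSUM62 values.

```python
def simple_cost(x, y):
    """Simple cost model (assumed): 0 for a match, 1 for a mismatch."""
    return 0 if x == y else 1


def table_cost(table, default):
    """Build a cost function from a symmetric lookup table.

    `table` maps unordered symbol pairs to a substitution cost; pairs that
    are not listed fall back to `default`. Identical symbols cost 0.
    """
    def lookup(x, y):
        if x == y:
            return 0
        return table.get((x, y), table.get((y, x), default))
    return lookup


def alignment_cost(s, t, sub_cost, gap_cost=1):
    """Minimum global alignment cost of s and t (Needleman-Wunsch style).

    Uses a linear gap cost and keeps only two rows of the DP table.
    """
    prev = [j * gap_cost for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap_cost]
        for j in range(1, len(t) + 1):
            cur.append(min(
                prev[j - 1] + sub_cost(s[i - 1], t[j - 1]),  # (mis)match
                prev[j] + gap_cost,                          # gap in t
                cur[j - 1] + gap_cost,                       # gap in s
            ))
        prev = cur
    return prev[-1]


# Toy DNA table (hypothetical values): transitions cost 1, transversions 2.
toy = table_cost({("A", "G"): 1, ("C", "T"): 1}, default=2)

print(alignment_cost("ACGT", "AGT", simple_cost))
print(alignment_cost("ACGT", "GCGT", toy, gap_cost=2))
```

Swapping `simple_cost` for a table-backed function changes only the cost lookup; the dynamic program itself is unchanged, which is one plausible way a single implementation can support both kinds of model.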

CONCLUSION

We present an algorithm that aligns multiple sequences and reconstructs the phylogenies that minimize the alignment score; it is based on an efficient algorithm for solving the median problem for three sequences. Our extensive experiments suggest that this method is very promising and can produce high-quality phylogenies and alignments.
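The median problem for three sequences asks for a sequence that minimizes the total alignment cost to the three inputs. The abstract does not describe the authors' efficient algorithm, so the sketch below uses the textbook exact approach instead: a three-dimensional dynamic program over prefixes, where each alignment column is charged the cost of its best median symbol. The function names, the unit cost model, and the DNA alphabet are all assumptions made for illustration.

```python
from itertools import product

GAP = "-"


def unit_cost(x, y):
    """Assumed simple cost model: 0 if symbols are equal, otherwise 1.

    A gap is treated as an ordinary symbol, so an indel also costs 1.
    """
    return 0 if x == y else 1


def median_of_three(a, b, c, cost=unit_cost, alphabet="ACGT"):
    """Return (total_cost, median) for three sequences via exact 3-D DP."""
    symbols = list(alphabet) + [GAP]

    def best_column(x, y, z):
        # Cheapest median symbol for one alignment column.
        return min((cost(m, x) + cost(m, y) + cost(m, z), m) for m in symbols)

    # Each move consumes a symbol (1) or places a gap (0) in each sequence;
    # the all-gap column is excluded.
    moves = [mv for mv in product((0, 1), repeat=3) if any(mv)]

    # dp[state] = (cost so far, previous state, median symbol of last column)
    dp = {(0, 0, 0): (0, None, None)}
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            for k in range(len(c) + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best = None
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    x = a[pi] if di else GAP
                    y = b[pj] if dj else GAP
                    z = c[pk] if dk else GAP
                    col_cost, m = best_column(x, y, z)
                    total = dp[(pi, pj, pk)][0] + col_cost
                    if best is None or total < best[0]:
                        best = (total, (pi, pj, pk), m)
                dp[(i, j, k)] = best

    # Trace back, keeping only non-gap median symbols.
    median = []
    state = (len(a), len(b), len(c))
    while state != (0, 0, 0):
        _, prev, m = dp[state]
        if m != GAP:
            median.append(m)
        state = prev
    return dp[(len(a), len(b), len(c))][0], "".join(reversed(median))


total, median = median_of_three("ACGT", "AGGT", "ACGT")
print(total, median)
```

This exact version needs time and memory proportional to the product of the three sequence lengths, which is why a more efficient median solver matters when the computation is repeated at many internal nodes during a tree search.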
