Zhou Hongyi, Zhou Yaoqi
Department of Physiology and Biophysics, Howard Hughes Medical Institute Center for Single Molecule Biophysics, State University of New York, Buffalo, NY 14214, USA.
Bioinformatics. 2005 Sep 15;21(18):3615-21. doi: 10.1093/bioinformatics/bti582. Epub 2005 Jul 14.
Multiple sequence alignment is an essential component of bioinformatics tools for genome-scale studies of genes and their evolutionary relationships. However, accurately aligning remote homologs is challenging. Here, we develop a method, called SPEM, that aligns multiple sequences using pre-processed sequence profiles and predicted secondary structures for pairwise alignment, consistency-based scoring to refine the pairwise alignments, and a progressive algorithm for the final multiple alignment.
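The pairwise step can be illustrated with a minimal sketch. The abstract does not give SPEM's scoring function, so the score below is an assumption: a profile-profile dot product plus a fixed bonus (the hypothetical parameter `ss_weight`) when the predicted secondary-structure states (H/E/C) of two positions agree, fed into a standard Needleman-Wunsch global alignment with a linear gap penalty (the hypothetical `gap`). The function names `pair_score` and `global_align` are illustrative only, and the consistency-based refinement and progressive steps are not shown.

```python
def pair_score(prof_a, prof_b, ss_a, ss_b, ss_weight=0.5):
    """Hypothetical position-pair score matrix (an assumption, not SPEM's formula):
    profile-profile dot product plus a bonus when predicted SS states agree."""
    return [
        [sum(p * q for p, q in zip(pa, pb)) + (ss_weight if sa == sb else 0.0)
         for pb, sb in zip(prof_b, ss_b)]
        for pa, sa in zip(prof_a, ss_a)
    ]


def global_align(S, gap=-1.0):
    """Needleman-Wunsch with a linear gap penalty.
    Returns the optimal score and the aligned (i, j) position pairs."""
    n, m = len(S), len(S[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = gap * i
    for j in range(1, m + 1):
        F[0][j] = gap * j
    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + S[i - 1][j - 1],
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right cell, collecting matched positions.
    i, j, pairs = n, m, []
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + S[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return F[n][m], pairs[::-1]


# Toy example: 2-state "profiles" and predicted secondary structure strings.
S = pair_score([[1, 0], [0, 0], [0, 1]], [[1, 0], [0, 1]], "HCE", "HE")
print(global_align(S))  # (2.0, [(0, 0), (2, 1)])
```

In this toy case the middle coil position of the first sequence is left unmatched, because aligning the two helix and the two strand positions yields the higher total score.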
The alignment accuracy of SPEM is compared with that of established methods such as ClustalW, T-Coffee, MUSCLE, ProbCons and PRALINE(PSI) on easy (homologs) and hard (remote homologs) benchmarks. Results indicate that, for remote homologs (sequence identity <30%), the average sum-of-pairs alignment score of SPEM is 7-15% higher than those of the other methods. For homologs (sequence identity >30%), its accuracy is statistically indistinguishable from that of state-of-the-art techniques such as ProbCons or MUSCLE 6.0.
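The abstract does not define the sum-of-pairs score. A common definition in benchmarks of this kind, assumed here, is the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. The sketch below computes that fraction for alignments given as equal-length gapped strings; the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of (seq_a, res_i, seq_b, res_j) residue pairs that share a column.
    `alignment` is a list of equal-length gapped strings; '-' marks a gap."""
    pairs = set()
    counters = [0] * len(alignment)  # next residue index within each sequence
    for col in range(len(alignment[0])):
        residues = {}
        for s, row in enumerate(alignment):
            if row[col] != '-':
                residues[s] = counters[s]
                counters[s] += 1
        # Every pair of non-gap residues in this column counts as aligned.
        for a, b in combinations(sorted(residues), 2):
            pairs.add((a, residues[a], b, residues[b]))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(test) & ref) / len(ref)


print(sum_of_pairs_score(["AC-D", "ACD-"], ["ACD", "ACD"]))  # 2 of 3 pairs recovered
```

Averaging this per-case score over all cases in a benchmark gives one number per method, which is the kind of quantity compared across aligners above.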
The SPEM server and its executables are available at http://theory.med.buffalo.edu.