Rangwala Huzefa, Karypis George
Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN 55455, USA.
Bioinformatics. 2007 Jan 15;23(2):e17-23. doi: 10.1093/bioinformatics/btl297.
Protein sequence alignment plays a critical role in computational biology, as it is an integral part of many analysis tasks designed to solve problems in comparative genomics, structure and function prediction, and homology modeling.
We have developed novel sequence alignment algorithms that compute the alignment between a pair of sequences based on short fixed- or variable-length high-scoring subsequences. Our algorithms build the alignments by repeatedly selecting the highest-scoring pairs of subsequences and using them to construct small portions of the final alignment. We utilize PSI-BLAST-generated sequence profiles and employ a profile-to-profile scoring scheme derived from PICASSO.
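The greedy idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_segment_alignment`, the fixed window length `w`, the rule that a segment pair is kept only if it lies strictly before or after every pair already chosen, and the toy match/mismatch scores are all assumptions made for illustration. In the paper the position-to-position scores come from a profile-to-profile scheme; here a precomputed score matrix stands in for it, and only the fixed-length case is covered.

```python
from typing import List, Sequence, Tuple


def greedy_segment_alignment(
    score: Sequence[Sequence[float]], w: int
) -> List[Tuple[int, int]]:
    """Greedily align two sequences from ungapped segment pairs of length w.

    score[i][j] is the (assumed, precomputed) score for aligning position i
    of the first sequence with position j of the second.
    """
    n = len(score)
    m = len(score[0]) if n else 0

    # Score every ungapped segment pair starting at (i, j).
    candidates = []
    for i in range(n - w + 1):
        for j in range(m - w + 1):
            s = sum(score[i + k][j + k] for k in range(w))
            if s > 0:  # assumption: only positive-scoring pairs are useful
                candidates.append((s, i, j))

    # Highest-scoring pairs first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    chosen: List[Tuple[int, int]] = []
    for _, i, j in candidates:
        # Keep the pair only if it is entirely before or entirely after
        # every previously chosen pair in both sequences.
        if all(
            (i + w <= ci and j + w <= cj) or (ci + w <= i and cj + w <= j)
            for ci, cj in chosen
        ):
            chosen.append((i, j))

    chosen.sort()
    # Expand each segment pair into its aligned position pairs.
    return [(i + k, j + k) for i, j in chosen for k in range(w)]


# Toy stand-in for profile scores: +2 for identical residues, -1 otherwise.
a, b = "ABCD", "ABXCD"
toy_score = [[2 if x == y else -1 for y in b] for x in a]
print(greedy_segment_alignment(toy_score, w=2))
# [(0, 0), (1, 1), (2, 3), (3, 4)]
```

In this toy case the two highest-scoring segment pairs (AB/AB and CD/CD) are selected first, and the lower-scoring overlapping candidates are rejected, leaving the X in the second sequence unaligned.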
We evaluated the performance of the computed alignments on two recently published benchmark datasets and compared them against the alignments computed by existing state-of-the-art dynamic programming-based profile-to-profile local and global sequence alignment algorithms. Our results show that the new algorithms achieve alignments that are comparable to or better than those achieved by existing algorithms. Moreover, our results show that these algorithms can provide better information as to which of the aligned positions are more reliable, a critical piece of information for comparative modeling applications.