Department of Electrical Engineering and Computer Science, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA.
Adv Exp Med Biol. 2011;696:297-306. doi: 10.1007/978-1-4419-7046-6_30.
There has been a deluge of biological sequence data in the public domain, which makes sequence comparison one of the most fundamental computational problems in bioinformatics. Biologists routinely use pairwise alignment programs to identify similar or, more specifically, related sequences (those sharing a common ancestor). Almost everything in bioinformatics depends on the interrelationship between sequence, structure, and function (all encapsulated in the term relatedness), which is far from well understood. The potential relatedness of two sequences is better judged by the statistical significance of the alignment score than by the alignment score alone. This chapter summarizes recent advances in accurately estimating the statistical significance of pairwise local alignment for the purpose of identifying related sequences, achieved by making the sequence comparison process more sequence specific. It also compares ranking database sequences by pairwise statistical significance against well-known database search programs such as BLAST, PSI-BLAST, and SSEARCH. As expected, sequence-comparison performance (evaluated in terms of retrieval accuracy) improves significantly as the comparison process is made increasingly sequence specific. Shortcomings of currently used approaches and some potentially useful directions for future work are also presented.
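The idea of judging relatedness by the statistical significance of a local alignment score, rather than by the raw score, can be illustrated with a minimal Python sketch. It is not the chapter's method: the scoring scheme (match/mismatch with a linear gap penalty instead of a substitution matrix), the shuffling-based null model, the method-of-moments Gumbel fit, and the names `local_score` and `pairwise_p_value` are all assumptions chosen for illustration.

```python
import math
import random


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring parameters are illustrative assumptions; real protein
    comparisons normally use a substitution matrix and affine gaps.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def pairwise_p_value(query, subject, n_shuffles=200, seed=0):
    """Estimate P(score >= observed) for this specific sequence pair.

    Null scores come from aligning the query against shuffled copies of
    the subject (hypothetical null model). A Gumbel (extreme value)
    distribution is fitted to them by the method of moments.
    """
    rng = random.Random(seed)
    observed = local_score(query, subject)

    letters = list(subject)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null_scores.append(local_score(query, "".join(letters)))

    mean = sum(null_scores) / n_shuffles
    var = sum((s - mean) ** 2 for s in null_scores) / (n_shuffles - 1)
    std = math.sqrt(var)

    if std == 0.0:
        # Degenerate null sample: fall back to an empirical estimate.
        hits = sum(s >= observed for s in null_scores)
        return observed, (hits + 1) / (n_shuffles + 1)

    # Gumbel moments: mean = mu + gamma * beta, variance = (pi * beta)^2 / 6
    beta = std * math.sqrt(6.0) / math.pi
    mu = mean - 0.5772156649015329 * beta

    z = (observed - mu) / beta
    y = math.exp(min(-z, 700.0))  # clamp to avoid floating-point overflow
    p_value = -math.expm1(-y)     # equals 1 - exp(-y), computed stably
    return observed, p_value


if __name__ == "__main__":
    q = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    s = "MKTAYIAKQRQLSFVKSHFSRQAEERLGLIEVQ"
    score, p = pairwise_p_value(q, s)
    print(f"score={score}, estimated P-value={p:.3g}")
```

Because the null scores are generated from the two sequences being compared, the estimate is specific to that pair, which is the sense of "sequence specific" this sketch tries to convey. The number of shuffles and the fitting procedure trade accuracy for speed and would need tuning in practice.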