Altschul S F, Lipman D J
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.
Proc Natl Acad Sci U S A. 1990 Jul;87(14):5509-13. doi: 10.1073/pnas.87.14.5509.
Protein database searches frequently can reveal biologically significant sequence relationships useful in understanding structure and function. Weak but meaningful sequence patterns can be obscured, however, by other similarities due only to chance. By searching a database for multiple as opposed to pairwise alignments, distant relationships are much more easily distinguished from background noise. Recent statistical results permit the power of this approach to be analyzed. Given a typical query sequence, an algorithm described here permits the current protein database to be searched for three-sequence alignments in less than 4 min. Such searches have revealed a variety of subtle relationships that pairwise search methods would be unable to detect.