Karlin S, Altschul S F
Department of Mathematics, Stanford University, CA 94305.
Proc Natl Acad Sci U S A. 1993 Jun 15;90(12):5873-7. doi: 10.1073/pnas.90.12.5873.
Score-based measures of molecular-sequence features provide versatile aids for the study of proteins and DNA. They are used by many sequence data base search programs, as well as for identifying distinctive properties of single sequences. For any such measure, it is important to know what can be expected to occur purely by chance. The statistical distribution of high-scoring segments has been described elsewhere. However, molecular sequences will frequently yield several high-scoring segments for which some combined assessment is in order. This paper describes the statistical distribution for the sum of the scores of multiple high-scoring segments and illustrates its application to the identification of possible transmembrane segments and the evaluation of sequence similarity.
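The abstract names the sum statistic but does not give its formula. The sketch below assumes the commonly cited large-t approximation associated with this work. Segment scores are first normalized as S' = λS − ln(Kmn), and T_r is the sum of the r highest normalized scores. Then P(T_r ≥ t) ≈ e^(−t) t^(r−1) / (r!(r−1)!). Here λ and K are the Karlin-Altschul parameters of the scoring system, and m and n are the sequence lengths. The function names and all numeric inputs are hypothetical and chosen only for illustration. This is not the authors' code, and real search programs apply further corrections.

```python
import math


def normalized_score(raw_score: float, lam: float, k: float, m: int, n: int) -> float:
    """Normalized score S' = lambda * S - ln(K * m * n) (hypothetical helper)."""
    return lam * raw_score - math.log(k * m * n)


def single_segment_evalue(raw_score: float, lam: float, k: float, m: int, n: int) -> float:
    """Expected number of chance segments scoring >= S: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def sum_statistic_tail(normalized_scores: list[float]) -> float:
    """Approximate P(T_r >= t), where t is the sum of the r highest normalized scores.

    Assumes the large-t asymptotic e^{-t} t^{r-1} / (r! (r-1)!). The caller is
    expected to pass the r highest normalized scores. The formula is not
    meaningful for small or negative t when r > 1.
    """
    r = len(normalized_scores)
    if r == 0:
        raise ValueError("need at least one segment score")
    t = sum(normalized_scores)
    if r > 1 and t <= 0:
        raise ValueError("asymptotic formula requires a positive sum when r > 1")
    return math.exp(-t) * t ** (r - 1) / (math.factorial(r) * math.factorial(r - 1))


# Two segments that are only marginal on their own (e^-3 ~ 0.05, e^-2 ~ 0.14)
print(sum_statistic_tail([3.0, 2.0]))  # about 0.017
```

For r = 1 the formula reduces to e^(−S'), which equals the single-segment expectation Kmn e^(−λS). The example also shows why a combined assessment matters. Two segments that are each unremarkable can, taken together, be considerably less likely to arise by chance than either one alone.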