Karlin S
Department of Mathematics, Stanford University, California 94305-2125.
Philos Trans R Soc Lond B Biol Sci. 1994 Jun 29;344(1310):391-402. doi: 10.1098/rstb.1994.0078.
The massive accumulation of DNA and protein sequence data poses challenges and opportunities in terms of interpretation and analysis. This presentation reviews the method of score-based sequence analysis with the objectives of discerning distinctive segments in single sequences and identifying significant common segments in sequence comparisons. A number of new results are described here for both the theory and its applications. These include distributional theory involving several high scoring segments in single sequences, distribution formulas for general scoring regimes in multiple sequence comparisons, bounds for periodic scoring assignments, sensitivity analysis of genome composition and refinements in predicting exons and genes in DNA sequences.
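As a minimal sketch of the score-based approach reviewed above (an illustration, not code from the paper), the Python below does two things. First, it finds the highest-scoring contiguous segment of a single sequence with a linear-time scan. Second, it computes the scale parameter λ*, the unique positive root of Σ p_a e^{λ s_a} = 1, which exists when the expected score is negative and some score is positive. The nucleotide scores (+1 for C/G, −2 for A/T), the uniform frequencies, and the function names `max_scoring_segment` and `karlin_lambda` are hypothetical choices made for this example.

```python
import math
from typing import Dict, Sequence, Tuple


def max_scoring_segment(seq: Sequence[str],
                        scores: Dict[str, float]) -> Tuple[float, int, int]:
    """Return (best_score, start, end) of the highest-scoring segment.

    A running sum is reset to zero whenever it drops to zero or below,
    because a non-positive prefix can never improve a later segment.
    `end` is exclusive; a sequence with no positive segment gives (0.0, 0, 0).
    """
    best, best_start, best_end = 0.0, 0, 0
    running, start = 0.0, 0
    for i, residue in enumerate(seq):
        running += scores[residue]
        if running <= 0:
            running, start = 0.0, i + 1  # restart after this position
        elif running > best:
            best, best_start, best_end = running, start, i + 1
    return best, best_start, best_end


def karlin_lambda(scores: Dict[str, float], freqs: Dict[str, float],
                  tol: float = 1e-12) -> float:
    """Positive root lambda* of sum_a p_a * exp(lambda * s_a) = 1.

    Assumes `freqs` sums to 1, the expected score is negative, and at
    least one score is positive, so the root exists and is unique.
    """
    expected = sum(freqs[a] * scores[a] for a in scores)
    if expected >= 0 or max(scores.values()) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    def f(lam: float) -> float:
        # f(0) = 0 and f'(0) = expected < 0; f is convex and eventually
        # positive, so it is negative on (0, lambda*) and positive beyond.
        return sum(freqs[a] * math.exp(lam * scores[a]) for a in scores) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) < 0:  # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:  # plain bisection
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical scoring scheme: reward C/G, penalize A/T.
SCORES = {"A": -2, "C": 1, "G": 1, "T": -2}
FREQS = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

if __name__ == "__main__":
    seq = "AAGCGCTTGCA"
    score, s, e = max_scoring_segment(seq, SCORES)
    print(score, seq[s:e])  # 4.0 GCGC
    print(karlin_lambda(SCORES, FREQS))
```

For this scheme the root can be checked by hand: with x = e^λ the equation becomes x³ − 2x² + 1 = 0, whose positive root other than 1 is the golden ratio, so λ* = ln((1 + √5)/2) ≈ 0.481. Under the standard Karlin-Altschul results for independent residues, the top segment score in a random sequence of length n grows roughly like ln(n)/λ*, which is why λ* sets the scale for judging whether a high-scoring segment is distinctive.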