Karlin S, Brendel V
Department of Mathematics, Stanford University, CA 94305.
Science. 1992 Jul 3;257(5066):39-49. doi: 10.1126/science.1621093.
Statistical approaches help in the determination of significant configurations in protein and nucleic acid sequence data. Three recent statistical methods are discussed: (i) score-based sequence analysis that provides a means for characterizing anomalies in local sequence text and for evaluating sequence comparisons; (ii) quantile distributions of amino acid usage that reveal general compositional biases in proteins and evolutionary relations; and (iii) r-scan statistics that can be applied to the analysis of spacings of sequence markers.
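Two of the three methods can be illustrated compactly. The sketch below is not taken from the paper. It shows the core computation behind (i), the maximal-scoring contiguous segment found by a single linear scan, and behind (iii), the r-scan lengths, which are sums of r consecutive spacings between sequence markers. Several choices are illustrative assumptions: the function names `max_scoring_segment` and `r_scan`, the toy residue scores (+1 for K and R, -1 for every other residue), and the convention that the two sequence ends count as boundaries when spacings are formed.

```python
from typing import Dict, List, Sequence, Tuple


def max_scoring_segment(
    seq: str, scores: Dict[str, float], default: float
) -> Tuple[float, int, int]:
    """Return (best_score, start, end) of the highest-scoring segment.

    `end` is exclusive. Residues missing from `scores` get `default`.
    A running sum is reset to zero whenever it drops to zero or below,
    so the best segment never starts with a non-positive prefix.
    """
    best, best_start, best_end = 0.0, 0, 0
    running, run_start = 0.0, 0
    for i, residue in enumerate(seq):
        running += scores.get(residue, default)
        if running <= 0:
            # Discard the current prefix; a new segment may start next.
            running, run_start = 0.0, i + 1
        elif running > best:
            best, best_start, best_end = running, run_start, i + 1
    return best, best_start, best_end


def r_scan(positions: Sequence[float], length: float, r: int) -> List[float]:
    """Return all r-scan lengths: sums of r consecutive fragment lengths.

    Assumption: markers lie in [0, length] and the sequence ends act as
    boundaries, so n markers produce n + 1 fragments.
    """
    bounds = [0.0] + sorted(positions) + [float(length)]
    frags = [b - a for a, b in zip(bounds, bounds[1:])]
    if not 1 <= r <= len(frags):
        raise ValueError("r must be between 1 and the number of fragments")
    window = sum(frags[:r])
    sums = [window]
    for i in range(r, len(frags)):
        # Slide the window: add the next fragment, drop the oldest one.
        window += frags[i] - frags[i - r]
        sums.append(window)
    return sums


if __name__ == "__main__":
    # Hypothetical charge-style scores: basic residues score +1, others -1.
    print(max_scoring_segment("DDKKRKDD", {"K": 1.0, "R": 1.0}, -1.0))  # (4.0, 2, 6)
    two_scan = r_scan([10, 25, 30, 70], 100, 2)
    print(min(two_scan), max(two_scan))  # 20.0 70.0
```

Unusually small minimal r-scan lengths point to clustering of markers, and unusually large maximal r-scan lengths point to sparse regions. Deciding what counts as "unusual" requires the distributional theory, such as significance thresholds for the maximal segment score under a scoring scheme with negative expected score, or extreme-value approximations for r-scan lengths. That theory is not implemented here.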