Weir B S
Department of Statistics, North Carolina State University, Raleigh 27695-8203.
J Natl Cancer Inst. 1988 May 18;80(6):395-406. doi: 10.1093/jnci/80.6.395.
Developments in the statistical analysis of DNA sequence data since 1984 are reviewed. Mathematical methods employing dynamic programming or incorporating Markov chain theory have been developed to search sequences for regions of similarity and to align sequences. When the biological forces of mutation and genetic drift are included in models, distances between aligned sequences allow the construction of evolutionary trees. Model-based theory may also yield estimates of the variability of parameter estimates, and so give a means of assessing the statistical significance of observed patterns and relationships. The complexity of DNA sequences, however, suggests that most statistical inferences will rest on random permutations of sequences.
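The abstract pairs two ideas: aligning sequences by dynamic programming, and judging significance by random permutation. The Python sketch below illustrates both. It computes a global (Needleman-Wunsch) alignment score and then estimates a permutation p-value by repeatedly shuffling one sequence and re-scoring. Several choices are illustrative assumptions and are not taken from the review: the scoring scheme (match +1, mismatch -1, linear gap -2), the function names, and shuffling only the second sequence. That shuffle preserves base composition but not dinucleotide structure.

```python
import random


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score by dynamic programming.

    The scoring parameters are illustrative assumptions, not values from the review.
    """
    m = len(b)
    # prev[j] is the best score for aligning the previous prefix of a with b[:j]
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[m]


def permutation_p_value(a: str, b: str, n_perm: int = 999, seed: int = 0):
    """Estimate how often a shuffled copy of b aligns to a at least as well as b does.

    Shuffling b (an assumed null model) keeps its base composition but
    destroys its order. Returns (observed score, permutation p-value).
    """
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    letters = list(b)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(letters)
        if global_alignment_score(a, "".join(letters)) >= observed:
            at_least += 1
    # The +1 terms count the observed arrangement itself, so p is never 0
    return observed, (at_least + 1) / (n_perm + 1)


if __name__ == "__main__":
    score, p = permutation_p_value("ACGTTGCAAC", "ACGTAGCATC", n_perm=199)
    print(score, p)
```

The p-value is only as meaningful as the shuffling scheme. A null model that preserves more of the sequence's local structure, such as shuffles that keep dinucleotide counts, would generally be a stricter comparison.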