Goldman N
Department of Genetics, University of Cambridge, UK.
Proc Biol Sci. 1998 Sep 22;265(1407):1779-86. doi: 10.1098/rspb.1998.0502.
Despite the widespread perception that evolutionary inference from molecular sequences is a statistical problem, there has been very little attention paid to questions of experimental design. Previous consideration of this topic has led to little more than an empirical folklore regarding the choice of suitable genes for analysis, and to dispute over the best choice of taxa for inclusion in data sets. I introduce what I believe are new methods that permit the quantification of phylogenetic information in a sequence alignment. The methods use likelihood calculations based on Markov-process models of nucleotide substitution allied with phylogenetic trees, and allow a general approach to optimal experimental design. Two examples are given, illustrating realistic problems in experimental design in molecular phylogenetics and suggesting more general conclusions about the choice of genomic regions, sequence lengths and taxa for evolutionary studies.
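The abstract describes quantifying phylogenetic information with likelihood calculations under Markov models of nucleotide substitution. The Python below is a minimal sketch of one such quantity. It is not the paper's own procedure. It computes the expected Fisher information per aligned site about the distance d (expected substitutions per site) separating two sequences, then uses that to estimate the alignment length needed to reach a chosen standard error. Several choices are illustrative assumptions: the two-taxon setting, the Jukes-Cantor (JC69) model, independent and identically distributed sites, the asymptotic approximation SE ≈ 1/sqrt(n·I(d)), and the helper names `jc69_prob`, `site_information` and `sites_needed`.

```python
import math

# Illustrative assumption: JC69 model with equal base frequencies of 1/4.
BASES = "ACGT"


def jc69_prob(x: str, y: str, d: float) -> float:
    """P(observe base y | base x) after distance d (expected substitutions
    per site) under the Jukes-Cantor (JC69) model."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def jc69_prob_deriv(x: str, y: str, d: float) -> float:
    """Analytic derivative of jc69_prob with respect to d."""
    e = math.exp(-4.0 * d / 3.0)
    return -e if x == y else e / 3.0


def site_information(d: float) -> float:
    """Expected Fisher information about d from one aligned site of two
    sequences: the sum over the 16 site patterns of (df/dd)^2 / f."""
    if d <= 0:
        raise ValueError("d must be positive")
    info = 0.0
    for x in BASES:
        for y in BASES:
            f = 0.25 * jc69_prob(x, y, d)          # site-pattern probability
            df = 0.25 * jc69_prob_deriv(x, y, d)   # its derivative in d
            info += df * df / f
    return info


def sites_needed(d: float, target_se: float) -> int:
    """Smallest alignment length n (hypothetical helper) whose asymptotic
    standard error for the ML estimate of d, 1 / sqrt(n * I(d)), is at
    most target_se. Assumes independent, identically distributed sites."""
    return math.ceil(1.0 / (site_information(d) * target_se ** 2))


if __name__ == "__main__":
    for d in (0.05, 0.2, 0.5, 1.0, 2.0):
        print(f"d={d:4.2f}  I(d)={site_information(d):8.3f}  "
              f"sites for SE 0.05: {sites_needed(d, 0.05)}")
```

In this toy setting, I(d) behaves like 1/d for small distances and decays roughly as (16/3)·e^{-8d/3} as the sequences approach saturation. The required alignment length therefore grows rapidly for highly divergent pairs. That is one simple instance of the trade-off between divergence and sequence length that experimental-design questions of this kind address.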