Pearson Talima, Busch Joseph D, Ravel Jacques, Read Timothy D, Rhoton Shane D, U'Ren Jana M, Simonson Tatum S, Kachur Sergey M, Leadem Rebecca R, Cardon Michelle L, Van Ert Matthew N, Huynh Lynn Y, Fraser Claire M, Keim Paul
Department of Biology, Northern Arizona University, Flagstaff, AZ 86011-5640, USA.
Proc Natl Acad Sci U S A. 2004 Sep 14;101(37):13536-41. doi: 10.1073/pnas.0403844101. Epub 2004 Sep 3.
Phylogenetic reconstruction using molecular data is often subject to homoplasy, leading to inaccurate conclusions about phylogenetic relationships among operational taxonomic units. Compared with other molecular markers, single-nucleotide polymorphisms (SNPs) exhibit extremely low mutation rates, making them rare in recently emerged pathogens, but they are less prone to homoplasy and thus extremely valuable for phylogenetic analyses. Despite their phylogenetic potential, ascertainment bias occurs when SNP characters are discovered through biased taxonomic sampling; by using whole-genome comparisons of five diverse strains of Bacillus anthracis to facilitate SNP discovery, we show that only polymorphisms lying along the evolutionary pathway between reference strains will be observed. We illustrate this in theoretical and simulated data sets in which complex phylogenetic topologies are reduced to linear evolutionary models. Using a set of 990 SNP markers, we also show how divergent branches in our topologies collapse to single points but provide accurate information on internodal distances and points of origin for ancestral clades. These data allowed us to determine the ancestral root of B. anthracis, showing that it lies closer to a newly described "C" branch than to either of two previously described "A" or "B" branches. In addition, subclade rooting of the C branch revealed unequal evolutionary rates that seem to be correlated with ecological parameters and strain attributes. Our use of nonhomoplastic whole-genome SNP characters allows branch points and clade membership to be estimated with great precision, providing greater insight into epidemiological, ecological, and forensic questions.
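The ascertainment effect described above can be illustrated with a minimal sketch. It is not the authors' analysis. The tree shape, the strain names (s1 to s4), the branch lengths, and the function names are all hypothetical, and every SNP is assumed to arise exactly once (no homoplasy). SNPs are "discovered" only by comparing two reference strains, so a strain left out of discovery loses its private branch and collapses onto the path between the references.

```python
# Hypothetical toy tree: child -> (parent, number of SNPs on the branch above child).
# Each SNP arises once on one branch (no homoplasy), an assumption of this sketch.
TREE = {
    "n1": ("root", 5),
    "s4": ("root", 8),
    "s1": ("n1", 10),
    "n2": ("n1", 3),
    "s2": ("n2", 7),
    "s3": ("n2", 4),
}


def genotype(tip):
    """Return the derived SNPs a strain carries: all SNPs on branches from root to tip."""
    snps = set()
    node = tip
    while node != "root":
        parent, n_snps = TREE[node]
        snps.update((node, i) for i in range(n_snps))
        node = parent
    return snps


def discover(reference_strains):
    """SNPs polymorphic among the references: carried by some references but not all."""
    genotypes = [genotype(s) for s in reference_strains]
    return set.union(*genotypes) - set.intersection(*genotypes)


def distance(a, b, panel=None):
    """Count SNP differences between two strains, optionally restricted to a panel."""
    diff = genotype(a) ^ genotype(b)
    return len(diff if panel is None else diff & panel)


# Discover SNPs using only s1 and s4 as reference genomes.
panel = discover(["s1", "s4"])

true_s2_s3 = distance("s2", "s3")             # all SNPs separating s2 and s3
observed_s2_s3 = distance("s2", "s3", panel)  # only discovered SNPs are scored
observed_s1_s2 = distance("s1", "s2", panel)  # distance along the reference path

print(len(panel), true_s2_s3, observed_s2_s3, observed_s1_s2)
```

In this toy case s2 and s3 truly differ at 11 SNPs, but none of those SNPs lies on the path between s1 and s4, so the two strains appear identical and collapse to the point where their lineage joins that path. Their observed distance to s1 (10 SNPs) still reflects the position of that attachment point along the reference path.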