Hall Barry G
Bellingham Research Institute, Bellingham, WA, 98229, USA.
Cladistics. 2016 Feb;32(1):90-99. doi: 10.1111/cla.12113. Epub 2015 Mar 10.
kSNP v2 is a powerful tool for single nucleotide polymorphism (SNP) identification from complete microbial genomes and for estimating phylogenetic trees from the identified SNPs. kSNP can analyse finished genomes, genome assemblies, raw reads or any combination of those, and requires neither genome alignment nor reference genomes. This study uses sequence evolution simulations to evaluate the topological accuracy of kSNP trees and to assess the effects of diversity and recombination on that accuracy. The accuracies of kSNP trees are strongly affected by increasing diversity, with parsimony accuracy > maximum-likelihood accuracy > neighbour-joining accuracy. Accuracy is also strongly influenced by recombination; as recombination increases, accuracy decreases. Reliable trees are arbitrarily defined as those that have ≥ 90% topological accuracy. The best predictor of topological accuracy is found to be the ratio of r/m, a measure of the effect of recombination, to FCK (the fraction of core kmers), a measure of diversity. Tools are available to allow investigators to determine both r/m and FCK, and the relationship between topological accuracy and the ratio of r/m to FCK is determined. The practical implication of this study is that kSNP is an effective tool for estimating phylogenetic trees from microbial genome sequences provided that both recombination and sequence diversity are within acceptable ranges.
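The predictor named in the abstract, the ratio of r/m to FCK, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, FCK is assumed to be computed as a simple count of core kmers divided by a count of all kmers, and r/m is assumed to come from a separate recombination-analysis tool. The abstract gives no numeric cut-off for the ratio, so none is encoded here.

```python
def fraction_core_kmers(core_kmers: int, total_kmers: int) -> float:
    """Return FCK, assumed here to be core kmer count / total kmer count.

    Which kmers are counted is decided by the tool that reports them;
    this helper (a hypothetical name) only performs the division.
    """
    if total_kmers <= 0:
        raise ValueError("total_kmers must be positive")
    if not 0 <= core_kmers <= total_kmers:
        raise ValueError("core_kmers must lie between 0 and total_kmers")
    return core_kmers / total_kmers


def recombination_to_diversity_ratio(r_over_m: float, fck: float) -> float:
    """Return (r/m) / FCK, the predictor of topological accuracy.

    r_over_m is assumed to be estimated elsewhere; fck must be positive.
    """
    if r_over_m < 0:
        raise ValueError("r_over_m must be non-negative")
    if fck <= 0:
        raise ValueError("fck must be positive")
    return r_over_m / fck


# Illustrative numbers only, not taken from the study.
fck = fraction_core_kmers(core_kmers=250, total_kmers=1000)
ratio = recombination_to_diversity_ratio(r_over_m=0.5, fck=fck)
print(f"FCK = {fck}, (r/m)/FCK = {ratio}")
```

Because FCK falls as diversity rises, dividing r/m by FCK makes the ratio grow with either more recombination or more diversity, which matches the abstract's statement that both reduce accuracy.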