Bellingham Research Institute, Bellingham, Washington, USA.
J Clin Microbiol. 2011 Oct;49(10):3568-75. doi: 10.1128/JCM.00919-11. Epub 2011 Aug 17.
Minimum spanning trees (MSTs) are frequently used in molecular epidemiology research to estimate relationships among individual strains or isolates. Nevertheless, there are significant caveats to MST algorithms that have been largely ignored in molecular epidemiology studies and that have the potential to confound or alter the interpretation of the results of those analyses. Specifically, (i) presenting a single, arbitrarily selected MST illustrates only one of potentially many equally optimal solutions, and (ii) statistical metrics are not used to assess the credibility of MST estimations. Here, we survey published MSTs previously used to infer microbial population structure in order to determine the effect of these factors. We propose a technique to estimate the number of alternative MSTs for a data set and find that multiple MSTs exist for each case in our survey. By implementing a bootstrapping metric to evaluate the reliability of alternative MST solutions, we discover that they encompass a wide range of credibility values. On the basis of these observations, we conclude that current approaches to studying population structure using MSTs are inadequate. We instead propose a systematic approach to MST estimation that bases analyses on the optimal computation of an input distance matrix, provides information about the number and configurations of alternative MSTs, and allows identification of the most credible MST or MSTs by using a bootstrapping metric. It is our hope that this algorithm will become the new "gold standard" approach for analyzing MSTs for molecular epidemiology so that this generally useful computational approach can be used informatively and to its full potential.
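The abstract describes two computations: counting how many equally optimal MSTs a distance matrix admits, and scoring MST edges by bootstrapping. The Python sketch below shows one way to do both. It is not the authors' published implementation, and it rests on assumptions: isolates are allelic profiles (tuples of allele numbers, one per locus), the distance between two isolates is the number of loci at which they differ, and every function name is hypothetical. The exact count relies on a standard result. Distance classes are processed from lightest to heaviest, components already joined by lighter edges are contracted, and the spanning-tree counts of the resulting pieces, each obtained with Kirchhoff's matrix-tree theorem, are multiplied together. For support, loci are resampled with replacement, and a pair is scored by the fraction of replicates in which it lies in at least one MST. This tie-aware criterion is an assumption, not necessarily the metric used in the paper.

```python
# Hypothetical sketch (not the paper's code): allelic profiles, Hamming distance.
import random
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

def hamming_matrix(profiles, loci):
    """Count of differing alleles between every pair of isolates, over `loci`."""
    n = len(profiles)
    return [[sum(profiles[i][k] != profiles[j][k] for k in loci) for j in range(n)]
            for i in range(n)]

def find(parent, x):
    """Union-find root lookup (list or dict) with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def weight_classes(d):
    """Isolate pairs grouped by distance, lightest class first."""
    groups = defaultdict(list)
    for i, j in combinations(range(len(d)), 2):
        groups[d[i][j]].append((i, j))
    return [groups[w] for w in sorted(groups)]

def spanning_trees(nodes, edges):
    """Matrix-tree theorem: exact determinant of the reduced Laplacian."""
    idx = {v: k for k, v in enumerate(nodes)}
    m = [[Fraction(0)] * len(nodes) for _ in nodes]
    for u, v in edges:  # parallel edges allowed: this is a multigraph
        a, b = idx[u], idx[v]
        for s, t, delta in ((a, a, 1), (b, b, 1), (a, b, -1), (b, a, -1)):
            m[s][t] += delta
    m = [row[1:] for row in m[1:]]  # drop the first row and column
    det = Fraction(1)
    for c in range(len(m)):  # Gaussian elimination over the rationals
        p = next((r for r in range(c, len(m)) if m[r][c] != 0), None)
        if p is None:
            return 0
        if p != c:
            m[c], m[p] = m[p], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, len(m)):
            f = m[r][c] / m[c][c]
            for k in range(c, len(m)):
                m[r][k] -= f * m[c][k]
    return int(det)

def count_msts(d):
    """Exact number of distinct MSTs of the complete graph weighted by d."""
    parent = list(range(len(d)))
    total = 1
    for pairs in weight_classes(d):
        # Contract components already joined by strictly lighter edges.
        edges = [(find(parent, i), find(parent, j)) for i, j in pairs]
        edges = [(a, b) for a, b in edges if a != b]
        local = {v: v for e in edges for v in e}
        for a, b in edges:
            local[find(local, a)] = find(local, b)
        pieces = defaultdict(list)
        for a, b in edges:
            pieces[find(local, a)].append((a, b))
        # Each connected piece contributes its spanning trees independently.
        for piece in pieces.values():
            total *= spanning_trees(sorted({v for e in piece for v in e}), piece)
        for a, b in edges:
            parent[find(parent, a)] = find(parent, b)
    return total

def edges_in_some_mst(d):
    """Pairs in at least one MST: ends not yet joined by strictly lighter edges."""
    parent = list(range(len(d)))
    keep = set()
    for pairs in weight_classes(d):
        keep |= {(i, j) for i, j in pairs if find(parent, i) != find(parent, j)}
        for i, j in pairs:
            parent[find(parent, i)] = find(parent, j)
    return keep

def bootstrap_support(profiles, replicates=1000, seed=1):
    """Share of locus-resampled replicates in which each pair is in some MST."""
    rng = random.Random(seed)
    n_loci = len(profiles[0])
    counts = defaultdict(int)
    for _ in range(replicates):
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]
        for pair in edges_in_some_mst(hamming_matrix(profiles, loci)):
            counts[pair] += 1
    return {pair: c / replicates for pair, c in counts.items()}

profiles = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 1), (1, 1, 2, 2)]
d = hamming_matrix(profiles, range(4))
print(count_msts(d))  # 4: the unit-distance pairs form a 4-cycle; drop any one
print(sorted(edges_in_some_mst(d)))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```

Working one distance class at a time avoids enumerating spanning trees outright, and `Fraction` arithmetic keeps the determinant exact even when counts become large. In the toy data, four equally optimal MSTs exist, so drawing any single one of them would hide the other three. Calling `bootstrap_support(profiles)` returns a support value between 0 and 1 for each pair that appeared in some replicate MST.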