Inadequacies of minimum spanning trees in molecular epidemiology.

Affiliation

Bellingham Research Institute, Bellingham, Washington, USA.

Publication information

J Clin Microbiol. 2011 Oct;49(10):3568-75. doi: 10.1128/JCM.00919-11. Epub 2011 Aug 17.

PMID:21849692

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3187300/

Abstract

Minimum spanning trees (MSTs) are frequently used in molecular epidemiology research to estimate relationships among individual strains or isolates. Nevertheless, there are significant caveats to MST algorithms that have been largely ignored in molecular epidemiology studies and that have the potential to confound or alter the interpretation of the results of those analyses. Specifically, (i) presenting a single, arbitrarily selected MST illustrates only one of potentially many equally optimal solutions, and (ii) statistical metrics are not used to assess the credibility of MST estimations. Here, we survey published MSTs previously used to infer microbial population structure in order to determine the effect of these factors. We propose a technique to estimate the number of alternative MSTs for a data set and find that multiple MSTs exist for each case in our survey. By implementing a bootstrapping metric to evaluate the reliability of alternative MST solutions, we discover that they encompass a wide range of credibility values. On the basis of these observations, we conclude that current approaches to studying population structure using MSTs are inadequate. We instead propose a systematic approach to MST estimation that bases analyses on the optimal computation of an input distance matrix, provides information about the number and configurations of alternative MSTs, and allows identification of the most credible MST or MSTs by using a bootstrapping metric. It is our hope that this algorithm will become the new "gold standard" approach for analyzing MSTs for molecular epidemiology so that this generally useful computational approach can be used informatively and to its full potential.
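The abstract's first caveat, that one drawn MST may be only one of many equally optimal trees, can be made concrete by counting those trees for a distance matrix. The sketch below is not presented as the authors' technique; it applies a standard textbook method instead: process edges in Kruskal order, group edges tied at the same distance, and use Kirchhoff's matrix-tree theorem on each tied group of the contracted graph. The helper name `count_msts` and the input format (a symmetric matrix of integer distances, such as the number of loci at which two isolates differ) are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations


def _exact_det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)  # the empty (0x0) matrix has determinant 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


class _UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def count_msts(dist):
    """Count the distinct MSTs of the complete graph given by a symmetric
    distance matrix `dist` (hypothetical helper, not from the paper)."""
    n = len(dist)
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    merged = _UnionFind(n)  # components joined by all strictly lighter edges
    total = 1
    k = 0
    while k < len(edges):
        weight = edges[k][0]
        # Collect every edge tied at this distance, mapped onto the components
        # formed by lighter edges. An edge inside one component closes a cycle
        # as its unique heaviest edge, so it is in no MST and is dropped.
        tied = []
        while k < len(edges) and edges[k][0] == weight:
            _, i, j = edges[k]
            a, b = merged.find(i), merged.find(j)
            if a != b:
                tied.append((a, b))
            k += 1
        # Split the tied edges into connected pieces of the contracted multigraph.
        local = _UnionFind(n)
        for a, b in tied:
            local.union(a, b)
        pieces = {}
        for a, b in tied:
            pieces.setdefault(local.find(a), []).append((a, b))
        # Matrix-tree theorem: the number of spanning trees of a piece equals
        # any cofactor of its Laplacian; pieces choose their edges independently.
        for piece in pieces.values():
            nodes = sorted({v for edge in piece for v in edge})
            pos = {v: idx for idx, v in enumerate(nodes)}
            lap = [[0] * len(nodes) for _ in nodes]
            for a, b in piece:
                x, y = pos[a], pos[b]
                lap[x][x] += 1
                lap[y][y] += 1
                lap[x][y] -= 1
                lap[y][x] -= 1
            minor = [row[1:] for row in lap[1:]]
            total *= int(_exact_det(minor))
        for a, b in tied:
            merged.union(a, b)
    return total


# Three isolates, each one locus apart from the other two: any two of the
# three edges form an equally short tree.
print(count_msts([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))  # 3
```

With all-distinct distances the count is 1; every tie in the matrix can multiply it, which is why small integer distances of the kind produced by multilocus typing readily yield several equally optimal trees. The exact rational determinant avoids floating-point rounding, at the cost of speed on large tied groups.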
