Bacterial whole genome-based phylogeny: construction of a new benchmarking dataset and assessment of some existing methods.

Author information

Ahrenfeldt Johanne, Skaarup Carina, Hasman Henrik, Pedersen Anders Gorm, Aarestrup Frank Møller, Lund Ole

Affiliations

Center for Biological Sequence Analysis, DTU Bioinformatics, Technical University of Denmark, Kongens Lyngby, Denmark.

Department of Microbiology and Infection Control, Statens Serum Institute, Copenhagen, Denmark.

Publication information

BMC Genomics. 2017 Jan 5;18(1):19. doi: 10.1186/s12864-016-3407-6.

DOI: 10.1186/s12864-016-3407-6

PMID: 28056767

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5217230/

Abstract

BACKGROUND

Whole genome sequencing (WGS) is increasingly used in diagnostics and surveillance of infectious diseases. A major application for WGS is to use the data for identifying outbreak clusters, and there is therefore a need for methods that can accurately and efficiently infer phylogenies from sequencing reads. In the present study we describe a new dataset that we have created for the purpose of benchmarking such WGS-based methods for epidemiological data, and also present an analysis where we use the data to compare the performance of some current methods.

RESULTS

Our aim was to create a benchmark dataset that mimics the sequencing data that might be collected during an outbreak of an infectious disease. This was achieved by letting an E. coli hypermutator strain grow in the lab for 8 consecutive days, each day splitting the culture in two while also collecting samples for sequencing. The result is a dataset consisting of 101 whole-genome sequences with known phylogenetic relationships. Of the sequenced samples, 51 correspond to internal nodes in the phylogeny because they are ancestral, while the remaining 50 correspond to leaves. We also used the new dataset to compare three publicly available online methods that infer phylogenies from whole-genome sequencing reads: NDtree, CSI Phylogeny and REALPHY. One complication when comparing the output of these methods with the known phylogeny is that phylogenetic methods typically build trees in which all observed sequences are placed as leaves, even though some of them are in fact ancestral. We therefore devised a method for post-processing the inferred trees by collapsing short branches (thus relocating some leaves to internal nodes), and we also present two new measures of tree similarity that take into account the identity of both internal and leaf nodes.
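The post-processing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the function name `collapse_short_leaves`, the bottom-up traversal and the rule that an unnamed internal node takes the name of its closest short-branch leaf are all hypothetical choices, and `threshold` is assumed to be in the same units as the inferred branch lengths (e.g. substitutions per site).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by value
class Node:
    """Rooted tree node; `length` is the branch length to the parent."""

    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def collapse_short_leaves(node: Node, threshold: float) -> Node:
    """Relocate observed samples from short terminal branches to internal nodes.

    Hypothetical rule: if an unnamed internal node has leaf children whose
    branch length is <= threshold, the leaf with the shortest branch is
    removed and its sample name is assigned to the internal node.
    """
    # Post-order traversal: collapse inside each subtree before its parent.
    for child in list(node.children):
        collapse_short_leaves(child, threshold)

    if node.name is None and not node.is_leaf():
        short = [c for c in node.children if c.is_leaf() and c.length <= threshold]
        if short:
            ancestor = min(short, key=lambda c: c.length)
            node.name = ancestor.name
            node.children = [c for c in node.children if c is not ancestor]
    return node


# Inferred tree (A:0.0001,(B:0.3,C:0.2):0.1); in Newick notation, where the
# ancestral sample A was placed as a leaf on a near-zero branch.
tree = Node(children=[
    Node("A", 0.0001),
    Node(length=0.1, children=[Node("B", 0.3), Node("C", 0.2)]),
])
collapse_short_leaves(tree, threshold=0.001)
# The root is now labeled "A" and has a single child: the (B, C) subtree.
```

Only terminal branches are collapsed here, because an inferred tree places every sample at a tip; the abstract does not say whether short internal branches are also merged, so that is left out of the sketch.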

CONCLUSIONS

Based on this analysis we find that, among the investigated methods, CSI Phylogeny had the best performance, correctly identifying 73% of all branches in the tree and 71% of all clades. We have made all data from this experiment (raw sequencing reads, consensus whole-genome sequences, as well as descriptions of the known phylogeny in a variety of formats) publicly available, with the hope that other groups may find this data useful for benchmarking and exploring the performance of epidemiological methods. All data is freely available at: https://cge.cbs.dtu.dk/services/evolution_data.php .
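The abstract does not define the two tree-similarity measures, so the following is only a generic, hypothetical sketch of how a clade-recovery percentage could be computed once internal nodes carry sample names. Here a clade is assumed to be the set of all sample names (internal and leaf) in one subtree; the tuple tree representation and the names `clades` and `clade_recovery` are illustrative assumptions, not the paper's definitions.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical representation: (sample name or None, list of child trees).
# Internal nodes carry a name when the ancestral sample itself was sequenced.
Tree = Tuple[Optional[str], List["Tree"]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect one clade per node: all sample names in that node's subtree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        name, children = node
        members: Set[str] = set() if name is None else {name}
        for child in children:
            members |= walk(child)
        clade = frozenset(members)
        if clade:  # skip subtrees that contain no named sample
            found.add(clade)
        return clade

    walk(tree)
    return found


def clade_recovery(true_tree: Tree, inferred_tree: Tree) -> float:
    """Fraction of the true tree's clades that also occur in the inferred tree."""
    true_clades = clades(true_tree)
    if not true_clades:
        return 0.0
    return len(true_clades & clades(inferred_tree)) / len(true_clades)


# Toy example: sample A is the ancestor of B and C; C is the ancestor of D and E.
true_tree = ("A", [("B", []), ("C", [("D", []), ("E", [])])])
# Inferred tree in which B and E have swapped places.
inferred = ("A", [("C", [("B", []), ("D", [])]), ("E", [])])
print(clade_recovery(true_tree, inferred))  # 4 of 5 true clades recovered -> 0.8
```

Because every singleton and the root clade are shared whenever both trees contain the same samples, a stricter variant might exclude them from the count.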

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6c5/5217230/d9b46ceefeb5/12864_2016_3407_Fig1_HTML.jpg

References

Whole-genome Sequencing Used to Investigate a Nationwide Outbreak of Listeriosis Caused by Ready-to-eat Delicatessen Meat, Denmark, 2014.

Clin Infect Dis. 2016 Jul 1;63(1):64-70. doi: 10.1093/cid/ciw192. Epub 2016 Mar 29.

Solving the problem of comparing whole bacterial genomes across different sequencing platforms.

PLoS One. 2014 Aug 11;9(8):e104984. doi: 10.1371/journal.pone.0104984. eCollection 2014.

Automated reconstruction of whole-genome phylogenies from short-sequence reads.

Mol Biol Evol. 2014 May;31(5):1077-88. doi: 10.1093/molbev/msu088. Epub 2014 Mar 5.

Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli.

J Clin Microbiol. 2014 May;52(5):1501-10. doi: 10.1128/JCM.03617-13. Epub 2014 Feb 26.

Evaluation of whole genome sequencing for outbreak detection of Salmonella enterica.

PLoS One. 2014 Feb 4;9(2):e87991. doi: 10.1371/journal.pone.0087991. eCollection 2014.

RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.

Bioinformatics. 2014 May 1;30(9):1312-3. doi: 10.1093/bioinformatics/btu033. Epub 2014 Jan 21.

snpTree--a web-server to identify and construct SNP trees from whole genome sequence data.

BMC Genomics. 2012;13 Suppl 7(Suppl 7):S6. doi: 10.1186/1471-2164-13-S7-S6. Epub 2012 Dec 13.

Integrating genome-based informatics to modernize global disease monitoring, information sharing, and response.

Emerg Infect Dis. 2012 Nov;18(11):e1. doi: 10.3201/eid/1811.120453.

Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing.

Proc Natl Acad Sci U S A. 2012 Oct 9;109(41):E2774-83. doi: 10.1073/pnas.1210309109. Epub 2012 Sep 18.

Insights from genomics into bacterial pathogen populations.

PLoS Pathog. 2012 Sep;8(9):e1002874. doi: 10.1371/journal.ppat.1002874. Epub 2012 Sep 6.
