Benchmarking of methods for genomic taxonomy.

Authors

Larsen Mette V, Cosentino Salvatore, Lukjancenko Oksana, Saputra Dhany, Rasmussen Simon, Hasman Henrik, Sicheritz-Pontén Thomas, Aarestrup Frank M, Ussery David W, Lund Ole

Affiliation

Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark.

Publication

J Clin Microbiol. 2014 May;52(5):1529-39. doi: 10.1128/JCM.02981-13. Epub 2014 Feb 26.

DOI: 10.1128/JCM.02981-13

PMID: 24574292

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3993634/

Abstract

One of the first issues that emerges when a prokaryotic organism of interest is encountered is the question of what it is, that is, which species it belongs to. The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy and has had a tremendous impact on the field of microbiology. Nevertheless, the method has been found to have a number of shortcomings. In the current study, we trained and benchmarked five methods for whole-genome sequence-based prokaryotic species identification on a common data set of complete genomes: (i) SpeciesFinder, which is based on the complete 16S rRNA gene; (ii) Reads2Type, which searches for species-specific 50-mers in either the 16S rRNA gene or the gyrB gene (for the Enterobacteriaceae family); (iii) the ribosomal multilocus sequence typing (rMLST) method, which samples up to 53 ribosomal genes; (iv) TaxonomyFinder, which is based on species-specific functional protein domain profiles; and finally (v) KmerFinder, which examines the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data). The performance of the methods was subsequently evaluated on three data sets of short sequence reads or draft genomes from public databases. In total, the evaluation sets comprised sequence data from more than 11,000 isolates covering 159 genera and 243 species. Our results indicate that methods that sample only chromosomal core genes have difficulty distinguishing closely related species that diverged only recently. The KmerFinder method had the highest overall accuracy and correctly identified 93% to 97% of the isolates in the evaluation sets.
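The abstract characterizes KmerFinder only as examining the number of co-occurring k-mers between a query and reference genomes. The sketch below illustrates that general idea under stated assumptions: it builds sets of canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement, since reads can come from either strand), counts how many query k-mers each reference shares, and reports the reference with the most shared k-mers. The function names (`canonical_kmers`, `best_match`), the toy sequences, and the small k=8 are hypothetical choices for illustration; this is not the published KmerFinder implementation, whose k-mer length, database construction, and scoring are not described in this abstract.

```python
from typing import Dict, Set, Tuple

# Maps each nucleotide to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Return the set of canonical k-mers in seq.

    A canonical k-mer is the lexicographically smaller of a k-mer and its
    reverse complement, so the same locus read from either strand maps to
    the same key. k-mers containing N or other ambiguity codes are skipped.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    result: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            result.add(min(kmer, reverse_complement(kmer)))
    return result


def best_match(query: str, references: Dict[str, str], k: int) -> Tuple[str, int]:
    """Score each reference by the number of query k-mers it shares and
    return (name, score) for the highest-scoring reference.

    Hypothetical scoring: a plain count of shared canonical k-mers.
    """
    query_kmers = canonical_kmers(query, k)
    scores = {
        name: len(query_kmers & canonical_kmers(genome, k))
        for name, genome in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Toy "reference genomes" (hypothetical, far shorter than real genomes).
    references = {
        "species_A": "ATGACCGTTAGCTAGGCTTACGATCGATCGGATCC",
        "species_B": "TTGCAGGCATTACGGATCCGATTACAGGCTAACGT",
    }
    read = "CGTTAGCTAGGCTTACGATCG"  # a fragment copied from species_A
    print(best_match(read, references, k=8))
    # The reverse-complemented read yields the same canonical k-mers.
    print(best_match(reverse_complement(read), references, k=8))
```

Scoring with a set intersection counts the presence or absence of shared k-mers rather than their multiplicity, which keeps the sketch simple. With realistic genome sizes one would choose a much larger k, so that chance matches between unrelated genomes become rare, and would index the reference genomes once instead of rebuilding their k-mer sets for every query.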


Similar articles

Reads2Type: a web application for rapid microbial taxonomy identification.

BMC Bioinformatics. 2015 Nov 25;16:398. doi: 10.1186/s12859-015-0829-0.

Ribosomal multilocus sequence typing: universal characterization of bacteria from domain to strain.

Microbiology (Reading). 2012 Apr;158(Pt 4):1005-1015. doi: 10.1099/mic.0.055459-0. Epub 2012 Jan 27.

A Genus Definition for and Based on a Standard Genome Relatedness Index.

mBio. 2020 Jan 14;11(1):e02475-19. doi: 10.1128/mBio.02475-19.

Multilocus sequence analysis (MLSA) in prokaryotic taxonomy.

Syst Appl Microbiol. 2015 Jun;38(4):237-45. doi: 10.1016/j.syapm.2015.03.007. Epub 2015 Apr 11.

Ribosomal MLST nucleotide identity (rMLST-NI), a rapid bacterial species identification method: application to and genomic species validation.

Microb Genom. 2022 Sep;8(9). doi: 10.1099/mgen.0.000849.

16S-gyrB-rpoB multilocus sequence analysis for species identification in the genus Microbispora.

Antonie Van Leeuwenhoek. 2016 Jun;109(6):801-15. doi: 10.1007/s10482-016-0680-y. Epub 2016 Mar 17.

Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea.

Int J Syst Evol Microbiol. 2014 Feb;64(Pt 2):316-324. doi: 10.1099/ijs.0.054171-0.

Multilocus sequence analysis of the family Halomonadaceae.

Int J Syst Evol Microbiol. 2012 Mar;62(Pt 3):520-538. doi: 10.1099/ijs.0.032938-0. Epub 2011 Apr 8.

MLST revisited: the gene-by-gene approach to bacterial genomics.

Nat Rev Microbiol. 2013 Oct;11(10):728-36. doi: 10.1038/nrmicro3093. Epub 2013 Sep 2.

Cited by

Bacteremia Caused by a Putative Novel Species in the Genus : A Case Report and Genomic Analysis.

Life (Basel). 2025 Aug 3;15(8):1227. doi: 10.3390/life15081227.

Surviving antibiotic treatment as a gut bacterium: genomic characterization of an Enterobacter cloacae.

BMC Genom Data. 2025 Aug 12;26(1):56. doi: 10.1186/s12863-025-01346-x.

LC-MS/MS metabolomics unravels the resistant phenotype of carbapenemase-producing Enterobacterales.

Metabolomics. 2025 Aug 12;21(5):115. doi: 10.1007/s11306-025-02300-9.

In Vitro Susceptibility to Imipenem/Relebactam and Comparators in a Multicentre Collection of Complex Isolates.

Antibiotics (Basel). 2025 Jul 5;14(7):682. doi: 10.3390/antibiotics14070682.

Integrating Nanopore MinION Sequencing into National Animal Health AMR Surveillance Programs: An Indonesian Pilot Study of Chicken Slaughterhouse Effluent and Rivers.

Antibiotics (Basel). 2025 Jun 20;14(7):624. doi: 10.3390/antibiotics14070624.

Broad spectrum of β-lactamase coverage and potent antimicrobial activity of xeruborbactam in combination with meropenem against carbapenemase-producing Enterobacterales, including strains resistant to new β-lactam/β-lactamase inhibitor combinations.

Antimicrob Agents Chemother. 2025 Sep 3;69(9):e0053325. doi: 10.1128/aac.00533-25. Epub 2025 Jul 25.

The impact of zinc supplementation on carbapenem MICs among bacteria expressing IMP metallo-beta-lactamase.

Access Microbiol. 2025 Jun 26;7(6). doi: 10.1099/acmi.0.000972.v4. eCollection 2025.

Whole-Genome Sequencing of Blood-Isolated Lactobacillus johnsonii in Thailand: Clinical Implications and Public Health Relevance.

Am J Case Rep. 2025 Jul 5;26:e947564. doi: 10.12659/AJCR.947564.

Comparative genomics of Acinetobacter baumannii from Egyptian healthcare settings reveals high-risk clones and resistance gene mobilization.

BMC Infect Dis. 2025 Jun 11;25(1):803. doi: 10.1186/s12879-025-11185-x.

Reference Whole Genome Sequence Analyses and Characterization of a Novel Distinct Sequence Type Isolated from a North American Gray Wolf () Gastrointestinal Tract.

Vet Sci. 2025 Apr 27;12(5):410. doi: 10.3390/vetsci12050410.

