Fast genome-based delimitation of Enterobacterales species.

Author information

Department of Biology, Wilfrid Laurier University, Waterloo, ON, Canada.

Publication information

PLoS One. 2023 Sep 14;18(9):e0291492. doi: 10.1371/journal.pone.0291492. eCollection 2023.

DOI:10.1371/journal.pone.0291492

PMID:37708115

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10501659/

Abstract

Average Nucleotide Identity (ANI) is becoming a standard measure for bacterial species delimitation. However, its calculation can take orders of magnitude longer than similarity estimates based on sampling of short nucleotide sequences, compiled into so-called sketches. These estimates are widely used, but their variable correlation with ANI has suggested that they might not be as accurate. For a practical assessment, we compared two sketching programs, mash and dashing, against ANI in delimiting species among Enterobacterales genomes. Receiver Operating Characteristic (ROC) analysis found Area Under the Curve (AUC) values of 0.99, almost perfect species discrimination, for all three measures. Subsampling to avoid over-represented species reduced these AUC values to 0.92, still highly accurate. Focused tests with ten genera, each represented by more than three species, also showed almost identical results for all methods. Shigella showed the lowest AUC values (0.68), followed by Citrobacter (0.80). All other genera (Dickeya, Enterobacter, Escherichia, Klebsiella, Pectobacterium, Proteus, Providencia and Yersinia) produced AUC values above 0.90. The species delimitation thresholds varied, with species distance ranges in a few genera overlapping the genus ranges of other genera. Mash was able to separate the E. coli + Shigella complex into 25 apparent phylogroups, four of them corresponding, roughly, to the four Shigella species represented in the data. Our results suggest that fast estimates of genome similarity are as good as ANI for species delimitation. Therefore, these estimates might suffice for covering the role of genomic similarity in bacterial taxonomy, and should increase confidence in their use for efficient bacterial identification and clustering, from epidemiology to genome-based detection of potential contaminants in farming and industry settings.
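
The ROC analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the genome names, species labels and pairwise distances below are invented toy values, and the helper functions `auc_same_species` and `best_threshold` are hypothetical names introduced only for this example. The sketch assumes that each pair of genomes has one distance (as a sketching program or 1 - ANI would provide), that a pair counts as "positive" when both genomes carry the same species label, and that a delimitation threshold can be chosen by maximising Youden's J (one common choice, not necessarily the one used in the paper).

```python
from itertools import combinations


def auc_same_species(distances, same_species):
    """AUC as the probability that a same-species pair has a smaller
    distance than a different-species pair (ties count as one half)."""
    pos = [d for d, s in zip(distances, same_species) if s]
    neg = [d for d, s in zip(distances, same_species) if not s]
    if not pos or not neg:
        raise ValueError("need at least one pair of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_threshold(distances, same_species):
    """Distance cutoff t maximising Youden's J = TPR - FPR, where pairs
    with distance <= t are called 'same species'."""
    n_pos = sum(1 for s in same_species if s)
    n_neg = len(same_species) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(distances)):
        tp = sum(1 for d, s in zip(distances, same_species) if s and d <= t)
        fp = sum(1 for d, s in zip(distances, same_species) if not s and d <= t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (assumption): four genomes, two species, made-up distances.
species = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
pair_distance = {
    ("g1", "g2"): 0.01,
    ("g1", "g3"): 0.10,
    ("g1", "g4"): 0.12,
    ("g2", "g3"): 0.11,
    ("g2", "g4"): 0.015,  # a different-species pair that looks deceptively close
    ("g3", "g4"): 0.02,
}

pairs = list(combinations(sorted(species), 2))
distances = [pair_distance[p] for p in pairs]
labels = [species[a] == species[b] for a, b in pairs]

auc = auc_same_species(distances, labels)
threshold, youden_j = best_threshold(distances, labels)
print(f"AUC = {auc:.3f}, threshold = {threshold}, Youden's J = {youden_j:.2f}")
```

In this toy set, one between-species pair is closer than one within-species pair, so the AUC falls below 1. This is the same kind of overlap that lowers the AUC for genera such as Shigella in the abstract.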