National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON, Canada.
Microb Genom. 2020 Oct;6(10). doi: 10.1099/mgen.0.000435.
Bacterial plasmids play a large role in allowing bacteria to adapt to changing environments and can pose a significant risk to human health if they confer virulence and antimicrobial resistance (AMR). Plasmids differ significantly in the taxonomic breadth of host bacteria in which they can successfully replicate; this is commonly referred to as 'host range' and is usually described in qualitative terms as 'narrow' or 'broad'. Understanding the host range potential of plasmids is of great interest due to their ability to disseminate traits such as AMR through bacterial populations and into human pathogens. We developed the MOB-suite to facilitate characterization of plasmids and introduced a whole-sequence-based classification system based on clustering complete plasmid sequences using Mash distances (https://github.com/phac-nml/mob-suite). We updated the MOB-suite database from 12 091 to 23 671 complete sequences, representing 17 779 unique plasmids. Taking advantage of new algorithms for rapidly calculating average nucleotide identity (ANI), we compared clustering characteristics on the unique set of plasmids using two distance measures (Mash and ANI) and three clustering algorithms. The plasmid nomenclature is designed to group together highly similar plasmids that are unlikely to have multiple representatives within a single cell. Based on our results, we determined that complete-linkage clustering at a Mash distance of 0.06 produced highly homogeneous clusters while maintaining cluster size. The taxonomic distribution of plasmid biomarker sequences for replication and relaxase typing, in combination with MOB-suite whole-sequence-based clusters, has been examined in detail for all high-quality publicly available plasmid sequences.
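To illustrate what complete-linkage clustering at a distance cutoff of 0.06 means in practice, the following is a minimal sketch in plain Python. It is not the MOB-suite implementation: the function name, the plasmid labels and the toy distances are all hypothetical, and treating the cutoff as inclusive (merge when the distance is at most 0.06) is an assumption. Under complete linkage, two clusters are merged only if every pair of members across them is within the cutoff, which is what keeps the resulting clusters homogeneous.

```python
from itertools import combinations


def complete_linkage_clusters(names, dist, threshold=0.06):
    """Agglomerative complete-linkage clustering with a distance cutoff.

    names:     list of sequence labels
    dist:      dict mapping frozenset({a, b}) -> pairwise distance
    threshold: clusters are merged only while their complete-linkage
               distance is <= threshold (inclusive cutoff is an assumption)
    """
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Complete linkage: distance between clusters is the LARGEST
        # pairwise distance between their members.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break  # no remaining pair can be merged within the cutoff
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical Mash-style distances between four toy plasmids.
toy = {
    frozenset(("pA", "pB")): 0.01,
    frozenset(("pA", "pC")): 0.05,
    frozenset(("pB", "pC")): 0.08,
    frozenset(("pA", "pD")): 0.50,
    frozenset(("pB", "pD")): 0.50,
    frozenset(("pC", "pD")): 0.50,
}
print(complete_linkage_clusters(["pA", "pB", "pC", "pD"], toy))
# [['pA', 'pB'], ['pC'], ['pD']]
```

In this toy case pC is within 0.06 of pA but not of pB, so it stays outside the pA/pB cluster; single-linkage clustering would have merged it.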
We have incorporated prediction of plasmid replication host range into the MOB-suite based on observed distributions of these sequence features in combination with known plasmid hosts from the literature. Host range is reported as the highest taxonomic rank that covers all of the plasmids which share replicon or relaxase biomarkers or belong to the same MOB-suite cluster code. Reporting host range based on these criteria allows for comparisons of host range between studies and provides information for plasmid surveillance.
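The host range rule described above can be sketched as follows. This is one reading of the rule, not the MOB-suite code: it takes the host lineages of all plasmids sharing a biomarker or cluster code and reports the most specific rank at which every host still belongs to the same taxon. The function name, the fixed rank list and the example lineages are assumptions made for illustration.

```python
# Ranks ordered from broadest to most specific (a simplified, assumed set).
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def host_range(lineages):
    """Return (rank, taxon) for the most specific rank shared by all hosts.

    lineages: list of dicts mapping rank -> taxon name, one per host.
    Returns None if the hosts do not even share a phylum, or if no
    lineages are given.
    """
    result = None
    for rank in RANKS:
        taxa = {lineage.get(rank) for lineage in lineages}
        # Stop at the first rank where hosts disagree or data is missing.
        if len(taxa) != 1 or None in taxa:
            break
        result = (rank, taxa.pop())
    return result


def lineage(*names):
    """Helper: build a rank -> name dict from names given in RANKS order."""
    return dict(zip(RANKS, names))


# Illustrative host lineages for plasmids sharing one hypothetical replicon type.
hosts = [
    lineage("Pseudomonadota", "Gammaproteobacteria", "Enterobacterales",
            "Enterobacteriaceae", "Escherichia", "Escherichia coli"),
    lineage("Pseudomonadota", "Gammaproteobacteria", "Enterobacterales",
            "Enterobacteriaceae", "Salmonella", "Salmonella enterica"),
    lineage("Pseudomonadota", "Gammaproteobacteria", "Enterobacterales",
            "Enterobacteriaceae", "Klebsiella", "Klebsiella pneumoniae"),
]
print(host_range(hosts))  # ('family', 'Enterobacteriaceae')

# Adding a host from a different order widens the reported range.
hosts.append(
    lineage("Pseudomonadota", "Gammaproteobacteria", "Pseudomonadales",
            "Pseudomonadaceae", "Pseudomonas", "Pseudomonas aeruginosa")
)
print(host_range(hosts))  # ('class', 'Gammaproteobacteria')
```

Reporting a named rank rather than the labels 'narrow' or 'broad' is what makes results from different studies directly comparable.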