2Université Paris Diderot, INSERM, IAME, UMR 1137, Sorbonne Paris Cité, F-75018 Paris, France.
3LABGeM, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057 Evry, France.
Microb Genom. 2018 Sep;4(9). doi: 10.1099/mgen.0.000211.
Plasmid prediction is of great interest when studying bacteria of medical importance such as Enterobacteriaceae, Staphylococcus aureus or Enterococcus. Many resistance and virulence genes are located on these replicons, with a major impact on pathogenicity and capacity to spread. Beyond strain outbreaks, plasmid outbreaks have been reported, in particular for some extended-spectrum beta-lactamase- or carbapenemase-producing Enterobacteriaceae. Several tools are now available to explore the 'plasmidome' from whole-genome sequences using various approaches, but none of them combines high sensitivity with high specificity. With this in mind, we developed PlaScope, a targeted approach to recover plasmidic sequences in genome assemblies at the species or genus level. Based on Centrifuge, a metagenomic classifier, and a custom database containing complete sequences of chromosomes and plasmids from various curated databases, PlaScope classifies contigs from an assembly according to their predicted location (plasmid or chromosome). Compared to other plasmid classifiers, PlasFlow and cBar, it achieves better recall (0.87), specificity (0.99), precision (0.96) and accuracy (0.98) on a dataset of 70 Escherichia coli genomes containing plasmids. We then identified 20 of the 21 chromosomal integrations of the extended-spectrum beta-lactamase coding gene in a clinical dataset of E. coli strains. In addition, we predicted virulence gene and operon locations in agreement with the literature. We also built a database for Klebsiella and correctly assigned the location of the majority of resistance genes from a collection of 12 Klebsiella pneumoniae strains. Similar approaches could also be developed for other well-characterized bacteria.
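The four performance figures quoted above follow the standard confusion-matrix definitions. As a minimal sketch, assuming plasmid contigs are treated as the positive class and chromosomal contigs as the negative class, they can be computed as below. The class name `Confusion`, the function `classification_metrics` and the example counts are hypothetical illustrations, not values or code from the study, and the sketch leaves aside how unclassified contigs are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Contig-level confusion counts; 'positive' means plasmid (assumption)."""
    tp: int  # plasmid contigs predicted as plasmid
    fp: int  # chromosomal contigs predicted as plasmid
    tn: int  # chromosomal contigs predicted as chromosome
    fn: int  # plasmid contigs predicted as chromosome


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a class is absent from the dataset.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: Confusion) -> dict:
    """Standard definitions of recall, specificity, precision and accuracy."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "recall": _ratio(c.tp, c.tp + c.fn),        # sensitivity
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = Confusion(tp=80, fp=5, tn=900, fn=15)
for name, value in classification_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

Because chromosomal contigs usually far outnumber plasmid contigs in an assembly, accuracy and specificity can stay high even when recall is modest, which is why all four values are worth reading together.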