Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA.
Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
Nat Commun. 2023 Nov 24;14(1):7690. doi: 10.1038/s41467-023-43549-9.
Surveillance programs for managing antimicrobial resistance (AMR) have yielded thousands of genomes suited for data-driven mechanism discovery. We present a workflow integrating pangenomics, gene annotation, and machine learning to identify AMR genes at scale. When applied to 12 species, 27,155 genomes, and 69 drugs, we 1) find AMR gene transfer mostly confined within related species, with 925 genes in multiple species but just eight in multiple phylogenetic classes, 2) demonstrate that discovery-oriented support vector machines outperform contemporary methods at recovering known AMR genes, recovering 263 genes compared to 145 by Pyseer, and 3) identify 142 AMR gene candidates. Validation of two candidates in E. coli BW25113 reveals cases of conditional resistance: ΔcycA confers ciprofloxacin resistance in minimal media with D-serine, and frdD V111D confers ampicillin resistance in the presence of ampC by modifying the overlapping promoter. We expect this approach to be adaptable to other species and phenotypes.
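The machine-learning step of such a workflow can be illustrated with a minimal sketch. It assumes, since the abstract does not specify the model details, an L1-penalised linear SVM trained by subgradient descent on a gene presence/absence matrix, with genes ranked by absolute weight. The gene names, the toy genomes, and the hyperparameters (`lam`, `lr`, `epochs`) are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random


def sign(v):
    """Return -1, 0 or +1 (subgradient of |v|, taking 0 at v == 0)."""
    return (v > 0) - (v < 0)


def train_l1_svm(X, y, lam=0.01, lr=0.05, epochs=200, seed=0):
    """Fit a linear SVM (hinge loss + L1 penalty) by stochastic subgradient descent.

    X: list of genomes, each a list of 0/1 gene presence calls.
    y: list of phenotypes, +1 (resistant) or -1 (susceptible).
    Hyperparameter defaults are illustrative assumptions, not tuned values.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    rng = random.Random(seed)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            violated = y[i] * score < 1  # sample is inside the margin
            for j in range(n_features):
                grad = lam * sign(w[j])  # L1 shrinkage term
                if violated:
                    grad -= y[i] * X[i][j]  # hinge-loss term
                w[j] -= lr * grad
            if violated:
                b += lr * y[i]
    return w, b


# Hypothetical pangenome: columns are genes, rows are genomes.
genes = ["gene_causal", "gene_noise_1", "gene_noise_2", "gene_core"]
X = [
    [1, 0, 1, 1], [1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1],  # resistant
    [0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 1], [0, 1, 1, 1],  # susceptible
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_l1_svm(X, y)

# Rank genes by absolute weight; large positive weights suggest
# association with resistance in this toy setting.
ranking = sorted(zip(genes, w), key=lambda gw: abs(gw[1]), reverse=True)
for gene, weight in ranking:
    print(f"{gene}\t{weight:+.2f}")
```

In this toy data only `gene_causal` separates resistant from susceptible genomes, so it should receive the largest weight. Real surveillance data would also call for cross-validation and some correction for population structure, neither of which this sketch attempts.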