Nuffield Department of Medicine, University of Oxford, Oxford, UK; Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
Nuffield Department of Medicine, University of Oxford, Oxford, UK; Oxford University Hospitals NHS Foundation Trust, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, John Radcliffe Hospital, Oxford, UK; NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with UKHSA, Oxford, UK.
Lancet Microbe. 2024 Nov;5(11):100913. doi: 10.1016/S2666-5247(24)00152-6. Epub 2024 Oct 5.
Antimicrobial resistance (AMR) in Escherichia coli is a global problem associated with substantial morbidity and mortality. AMR-associated genes are typically annotated based on similarity to variants in a curated reference database, with the implicit assumption that uncatalogued genetic variation within these genes is phenotypically unimportant. In this study, we evaluated the performance of the AMRFinder tool and, subsequently, the potential for discovering new AMR-associated gene families and characterising variation within existing ones to improve genotype-to-susceptibility phenotype predictions in E coli.
In this cross-sectional study of international genome sequence data, we assembled a global dataset of 9001 E coli sequences from five publicly available data collections, predominantly deriving from human bloodstream infections, from Norway, Oxfordshire (UK), Thailand, the UK, and Sweden. Of these sequences, 8555 had linked antibiotic susceptibility data. Raw reads were assembled using Shovill, and AMR genes (relevant to amoxicillin-clavulanic acid, ampicillin, ceftriaxone, ciprofloxacin, gentamicin, piperacillin-tazobactam, and trimethoprim) were extracted using the National Center for Biotechnology Information AMRFinder tool (using both default and strict [100%] coverage and identity filters). We assessed how well the presence of these genes predicted resistance or susceptibility, judged against US Food and Drug Administration thresholds for major and very major errors. Mash was used to calculate the similarity between extracted genes using Jaccard distances. We empirically reclustered extracted gene sequences into AMR-associated gene families (≥70% match) and antibiotic-resistance genes (ARGs; 100% match) and categorised these according to their frequency in the dataset. Accumulation curves were simulated, and correlations between gene frequency in the Oxfordshire and other datasets were calculated using the Spearman coefficient. Firth regression was used to model the association between the presence of bla variants and amoxicillin-clavulanic acid or piperacillin-tazobactam resistance, adjusted for the presence of other relevant ARGs.
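As an illustration of the error metrics named above, the following minimal Python sketch computes major and very major error rates from paired genotype predictions and phenotypes. It assumes the conventional definitions (a major error is a phenotypically susceptible isolate predicted resistant; a very major error is a phenotypically resistant isolate predicted susceptible). The function names and the cut-offs of 3% and 1·5% are assumptions made for this example and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class ErrorRates:
    major_error: float       # susceptible isolates predicted resistant / all susceptible
    very_major_error: float  # resistant isolates predicted susceptible / all resistant


def error_rates(predicted_resistant, phenotype_resistant):
    """Compute major and very major error rates from paired boolean sequences."""
    pairs = list(zip(predicted_resistant, phenotype_resistant))
    n_resistant = sum(1 for _, pheno in pairs if pheno)
    n_susceptible = len(pairs) - n_resistant
    if n_resistant == 0 or n_susceptible == 0:
        raise ValueError("need at least one resistant and one susceptible isolate")
    false_susceptible = sum(1 for pred, pheno in pairs if pheno and not pred)
    false_resistant = sum(1 for pred, pheno in pairs if pred and not pheno)
    return ErrorRates(
        major_error=false_resistant / n_susceptible,
        very_major_error=false_susceptible / n_resistant,
    )


def meets_thresholds(rates, max_major=0.03, max_very_major=0.015):
    """Check rates against cut-offs; the default values are assumed, not from the source."""
    return rates.major_error <= max_major and rates.very_major_error <= max_very_major
```

In this sketch a prediction would be "resistant" whenever at least one relevant gene is detected for the antibiotic in question; how detections are mapped to predictions in the study itself is not specified here.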
The performance of the AMRFinder database for genotype-to-phenotype predictions using strict 100% identity and coverage thresholds did not meet US Food and Drug Administration thresholds for any of the seven antibiotics evaluated. Relaxing filters to default settings improved sensitivity at the cost of specificity. For all antibiotics, most explainable resistance was associated with the presence of a small number of genes. A proportion of resistance could not be explained by known ARGs, ranging from 75·1% for amoxicillin-clavulanic acid to 3·4% for ciprofloxacin. Only 18 199 (51·5%) of the 35 343 ARGs detected had a 100% identity and coverage match in the AMRFinder database. After empirically reclassifying genes at 100% nucleotide sequence identity, we identified 1042 unique ARGs, of which 126 (12·1%) were present ten times or more, 313 (30·0%) were present between two and nine times, and 603 (57·9%) were present only once. Simulated accumulation curves revealed that discovery of new (100% match) ARGs present more than once in the dataset plateaued relatively quickly, whereas new singleton ARGs were discovered even after many thousands of isolates had been included. We identified a strong correlation (Spearman coefficient 0·76 [95% CI 0·73-0·80], p<0·0001) between the number of times an ARG was observed in Oxfordshire and the number of times it was seen internationally, with ARGs that were observed six times in Oxfordshire always being found elsewhere. Finally, using the example of bla, we showed that uncatalogued variation, including synonymous variation, is associated with potentially important phenotypic differences; for example, two common, uncatalogued bla alleles with only synonymous mutations compared with the known reference were associated with reduced resistance to amoxicillin-clavulanic acid (adjusted odds ratio 0·58 [95% CI 0·35-0·95], p=0·031) and piperacillin-tazobactam (0·50 [95% CI 0·29-0·82], p=0·005).
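The frequency categories and accumulation curves described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's implementation: each isolate is represented as a list of exact-match allele identifiers, an allele is counted once per isolate, and the function names and the single random ordering per curve are choices made for this example.

```python
import random
from collections import Counter


def allele_counts(isolates):
    """Count how many isolates carry each allele (one count per isolate)."""
    return Counter(allele for alleles in isolates for allele in set(alleles))


def frequency_categories(counts):
    """Bin alleles into singleton, 2-9, and >=10 observation categories."""
    categories = {"singleton": 0, "2-9": 0, ">=10": 0}
    for n in counts.values():
        if n == 1:
            categories["singleton"] += 1
        elif n < 10:
            categories["2-9"] += 1
        else:
            categories[">=10"] += 1
    return categories


def accumulation_curve(isolates, tracked, seed=0):
    """Cumulative number of distinct tracked alleles seen as isolates are
    added in one random order (seeded so the run is reproducible)."""
    order = list(isolates)
    random.Random(seed).shuffle(order)
    seen, curve = set(), []
    for alleles in order:
        seen.update(a for a in alleles if a in tracked)
        curve.append(len(seen))
    return curve
```

Calling `accumulation_curve` once with the set of alleles seen more than once and once with the set of singletons gives the two kinds of curve contrasted above; averaging over many seeds would smooth each curve.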
We highlight substantial uncatalogued genetic variation with respect to known ARGs, although a relatively small proportion of these alleles are repeatedly observed in a large international dataset, suggesting strong selection pressures. The current approach of using fuzzy matching for ARG detection, ignoring the unknown effects of uncatalogued variation, is unlikely to be acceptable for future clinical deployment. The association of synonymous mutations with potentially important phenotypic differences suggests that relying solely on amino acid-based gene detection to predict resistance is unlikely to be sufficient. Finally, the inability to explain all resistance using existing knowledge highlights the importance of new target gene discovery.
This study was funded by the National Institute for Health and Care Research, Wellcome, and the UK Medical Research Council.