AMR-meta: a k-mer and metafeature approach to classify antimicrobial resistance from high-throughput short-read metagenomics data.

Author information

Department of Computer and Information Science and Engineering, University of Florida, 2004 Mowry Road, Gainesville, FL 32610, USA.

Department of Computer and Information Science and Engineering, University of Florida, 432 Newell Dr, Gainesville, FL 32611, USA.

Publication information

Gigascience. 2022 May 18;11. doi: 10.1093/gigascience/giac029.

DOI:10.1093/gigascience/giac029

PMID:35583675

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9116207/

Abstract

BACKGROUND

Antimicrobial resistance (AMR) is a global health concern. High-throughput metagenomic sequencing of microbial samples enables profiling of AMR genes through comparison with curated AMR databases. However, the performance of current methods is often hampered by database incompleteness and the presence of homology/homoplasy with other non-AMR genes in sequenced samples.

RESULTS

We present AMR-meta, a database-free and alignment-free approach, based on k-mers, which combines algebraic matrix factorization into metafeatures with regularized regression. Metafeatures capture multi-level gene diversity across the main antibiotic classes. AMR-meta takes in reads from metagenomic shotgun sequencing and outputs predictions about whether those reads contribute to resistance against specific classes of antibiotics. In addition, AMR-meta uses an augmented training strategy that joins an AMR gene database with non-AMR genes (used as negative examples). We compare AMR-meta with AMRPlusPlus, DeepARG, and Meta-MARC, further testing their ensemble via a voting system. In cross-validation, AMR-meta has a median F-score of 0.7 (interquartile range, 0.2-0.9). On semi-synthetic metagenomic data (the external test), AMR-meta yields on average a 1.3-fold increase in hit rate over existing methods. In terms of run time, AMR-meta is 3 times faster than DeepARG, 30 times faster than Meta-MARC, and as fast as AMRPlusPlus. Finally, we note that differences in AMR ontologies and observed variance of all tools in classification outputs call for further development on standardization of benchmarking data and protocols.
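To make the pipeline described above more concrete, the following is a minimal sketch, not the AMR-meta implementation. It assumes a toy k-mer length of 3, uses a truncated SVD as a stand-in for the matrix factorization into metafeatures, and uses closed-form ridge regression as a stand-in for the regularized regression. The reads, labels, function names, and parameter values are all hypothetical and chosen only for illustration.

```python
from itertools import product

import numpy as np

K = 3  # toy k-mer length (assumption; not taken from the paper)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_counts(read: str) -> np.ndarray:
    """Count overlapping k-mers in one read; k-mers with non-ACGT symbols are skipped."""
    vec = np.zeros(len(KMERS))
    for i in range(len(read) - K + 1):
        idx = KMER_INDEX.get(read[i:i + K])
        if idx is not None:
            vec[idx] += 1
    return vec


def fit_metafeatures(X: np.ndarray, n_components: int):
    """Truncated SVD of the centered count matrix (stand-in factorization)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def to_metafeatures(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project k-mer counts onto the learned low-dimensional metafeatures."""
    return (X - mean) @ components.T


def fit_ridge(Z: np.ndarray, y: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression with an unpenalized intercept."""
    A = np.hstack([np.ones((Z.shape[0], 1)), Z])
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)


def predict(Z: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Return one score per read; higher means more AMR-like."""
    A = np.hstack([np.ones((Z.shape[0], 1)), Z])
    return A @ w


# Synthetic toy reads: "AMR" positives and non-AMR negatives (augmented training set).
positives = ["ACGTACGTACGTACGT", "ACGTACGTACGAACGT", "CGTACGTACGTACGTA"]
negatives = ["TTTTGGGGTTTTGGGG", "TTTGGGGTTTTGGGGT", "GGGGTTTTGGGGTTTT"]

X_train = np.array([kmer_counts(r) for r in positives + negatives])
y_train = np.array([1.0] * len(positives) + [0.0] * len(negatives))

mean, components = fit_metafeatures(X_train, n_components=2)
w = fit_ridge(to_metafeatures(X_train, mean, components), y_train, lam=1.0)

# Score two unseen reads: one resembling the positives, one the negatives.
X_new = np.array([kmer_counts(r) for r in ["ACGTACGTACGTACGA", "TTTTGGGGTTTTGGGA"]])
scores = predict(to_metafeatures(X_new, mean, components), w)
print(scores)
```

Including the non-AMR reads in the training matrix is what lets the regression learn a boundary rather than only a similarity to known AMR sequences. In this sketch a threshold of 0.5 on the score would serve as the decision rule, which is again an assumption rather than the tool's documented behavior.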

CONCLUSIONS

AMR-meta is a fast, accurate classifier that exploits non-AMR negative sets to improve sensitivity and specificity. The differences in AMR ontologies and the high variance of all tools in classification outputs call for the deployment of standard benchmarking data and protocols, to fairly compare AMR prediction tools.
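For reference, the evaluation measures named in the abstract can be computed from confusion-matrix counts as sketched below. This assumes that "F-score" means the balanced F1 score, which the abstract does not state explicitly. The counts are made-up numbers for illustration only, not results from the paper, and the function name is hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, precision, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall / true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Illustrative counts only (not taken from the paper).
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
```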
