School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.
School of Ecological and Environmental Sciences, East China Normal University, Shanghai, China.
mSystems. 2022 Jun 28;7(3):e0128121. doi: 10.1128/msystems.01281-21. Epub 2022 May 31.
Identification of genes encoding β-lactamases (BLs) from short-read sequences remains challenging because proteins encoded by BL genes and by related non-BL genes frequently share functional amino acid domains and motifs. Divergent BL homologs are often missed during similarity searches, which has important practical consequences for monitoring antibiotic resistance. To address this limitation, we built ROCker models that targeted broad classes (e.g., class A, B, C, and D) and individual families (e.g., TEM) of BLs and challenged them with mock 150-bp- and 250-bp-read data sets of known composition. ROCker identifies the most discriminant bit score thresholds in sliding windows along the target protein sequence and hence can account for nondiscriminative domains shared by unrelated proteins. BL ROCker models showed a 0% false-positive rate (FPR), a 0% to 4% false-negative rate (FNR), and an up-to-50-fold-higher F1 score [2 × precision × recall/(precision + recall)] compared to alternative methods, such as similarity searches using BLASTx with various e-value thresholds and BL hidden Markov models, or tools like DeepARG, ShortBRED, and AMRFinder. The ROCker models and the underlying protein sequence reference data sets and phylogenetic trees for read placement are freely available through http://enve-omics.ce.gatech.edu/data/rocker-bla. Compared to current practice, application of these BL ROCker models to metagenomics, metatranscriptomics, and high-throughput PCR gene amplicon data should facilitate the reliable detection and quantification of BL variants encoded by environmental or clinical isolates and microbiomes and a more accurate assessment of the associated public health risk.
Genes encoding β-lactamases (BLs) confer resistance to β-lactams, a widely prescribed class of antibiotics. Therefore, it is important to assess the prevalence of BL genes in clinical or environmental samples in order to monitor the spread of these genes into pathogens and to estimate public health risk. However, detecting BLs in short-read sequence data is technically challenging. Our ROCker model-based bioinformatics approach showcases the reliable detection and typing of BLs in complex data sets and thus contributes toward solving an important problem in antibiotic resistance surveillance. The ROCker models developed here substantially expand the toolbox for monitoring antibiotic resistance in clinical or environmental settings.
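The idea of position-specific bit score thresholds can be illustrated with a minimal Python sketch. This is not the ROCker implementation: the `Hit` record, the function names, the use of non-overlapping 20-residue windows (a simplification of sliding windows), and the choice of F1 as the criterion for picking each cutoff are all assumptions made for illustration. Only the F1 formula is taken from the abstract.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Hit:
    """One simulated read aligned to the reference protein (hypothetical record)."""
    position: int     # alignment midpoint on the reference protein (amino acids)
    bit_score: float  # bit score of the read-vs-reference alignment
    is_target: bool   # true label, known because the mock data set is simulated


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(hits: list[Hit]) -> float:
    """Return the bit score cutoff that maximizes F1 for the hits of one window.

    Assumption: F1 is used as the selection criterion here only for simplicity.
    """
    if not any(h.is_target for h in hits):
        return inf  # no target reads map here, so reject every hit in this window
    best_cut, best_f1 = inf, -1.0
    for cut in sorted({h.bit_score for h in hits}):
        tp = sum(1 for h in hits if h.is_target and h.bit_score >= cut)
        fp = sum(1 for h in hits if not h.is_target and h.bit_score >= cut)
        fn = sum(1 for h in hits if h.is_target and h.bit_score < cut)
        score = f1_score(tp, fp, fn)
        if score > best_f1:
            best_cut, best_f1 = cut, score
    return best_cut


def window_thresholds(hits: list[Hit], window: int = 20) -> dict[int, float]:
    """Map each window start to its own cutoff (non-overlapping windows)."""
    by_window: dict[int, list[Hit]] = {}
    for h in hits:
        by_window.setdefault(h.position // window * window, []).append(h)
    return {start: best_threshold(group) for start, group in sorted(by_window.items())}


def passes(hit: Hit, thresholds: dict[int, float], window: int = 20) -> bool:
    """Keep a hit only if it reaches the cutoff of the window it falls in."""
    start = hit.position // window * window
    return hit.bit_score >= thresholds.get(start, inf)


# Toy data: the first window mimics a domain shared with non-BL proteins
# (non-target reads score high there); the second window is more specific.
mock_hits = [
    Hit(5, 62, True), Hit(8, 58, True), Hit(6, 50, False), Hit(9, 45, False),
    Hit(25, 35, True), Hit(30, 38, True), Hit(27, 20, False), Hit(33, 22, False),
]
thresholds = window_thresholds(mock_hits)
kept = [h for h in mock_hits if passes(h, thresholds)]
```

On this toy data no single global cutoff separates the two groups: a cutoff low enough to keep the target reads in the second window also admits the non-target reads from the shared domain, whereas a cutoff high enough to exclude those loses the second-window targets. Per-window cutoffs avoid that trade-off, which is the intuition behind accounting for nondiscriminative domains.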