Prediction of antibiotic resistance mechanisms using a protein language model.

Affiliations

Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan.

Center for Exploratory Research, Research and Development Group, Hitachi, Ltd, Tokyo 185-8601, Japan.

Publication information

Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae550.

PMID: 39254573

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11464418/

Abstract

MOTIVATION

Antibiotic resistance has emerged as a major global health threat, with an increasing number of bacterial infections becoming difficult to treat. Predicting the underlying resistance mechanisms of antibiotic resistance genes (ARGs) is crucial for understanding and combating this problem. However, existing methods struggle to predict resistance mechanisms accurately for ARGs with low similarity to known sequences, and their prediction models offer limited interpretability.

RESULTS

In this study, we present a novel approach for predicting ARG resistance mechanisms using ProteinBERT, a protein language model (pLM) based on deep learning. Our method outperforms state-of-the-art techniques on diverse ARG datasets, including those with low homology to the training data, highlighting its potential for predicting the resistance mechanisms of unknown ARGs. Attention analysis of the model reveals that it considers biologically relevant features, such as conserved amino acid residues and antibiotic target binding sites, when making predictions. These findings provide valuable insights into the molecular basis of antibiotic resistance and demonstrate the interpretability of pLMs, offering a new perspective on their application in bioinformatics.
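The attention analysis described above can be illustrated with a minimal sketch. It assumes that per-residue attention weights for one protein have already been extracted from the fine-tuned model as a nested list indexed `[layer][head][position]`. That layout and the helper names `residue_attention_scores`, `top_residues`, and `site_hit_rate` are hypothetical and are not taken from the ARG-BERT or ProteinBERT code.

The sketch does three things:
- It averages the weights over layers and heads.
- It ranks residue positions by that average.
- It measures what fraction of the top-ranked positions fall on annotated sites, such as conserved residues or antibiotic target binding sites.

```python
from typing import List, Sequence, Set, Tuple


def residue_attention_scores(attn: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Average attention weights over all layers and heads.

    attn[layer][head][pos] is assumed (hypothetical layout) to be the weight
    a head assigns to residue position `pos` of a single protein sequence.
    """
    if not attn or not attn[0]:
        raise ValueError("need at least one layer with one head")
    seq_len = len(attn[0][0])
    scores = [0.0] * seq_len
    n_heads_seen = 0
    for layer in attn:
        for head in layer:
            if len(head) != seq_len:
                raise ValueError("every head must cover the same sequence length")
            for pos, weight in enumerate(head):
                scores[pos] += weight
            n_heads_seen += 1
    return [s / n_heads_seen for s in scores]


def top_residues(sequence: str, scores: Sequence[float], k: int = 3) -> List[Tuple[int, str, float]]:
    """Return the k highest-scoring positions as (1-based position, residue, score)."""
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i + 1, sequence[i], scores[i]) for i in ranked[:k]]


def site_hit_rate(positions: Sequence[int], annotated_sites: Set[int]) -> float:
    """Fraction of the given 1-based positions that fall on annotated sites."""
    if not positions:
        return 0.0
    return sum(1 for p in positions if p in annotated_sites) / len(positions)


if __name__ == "__main__":
    # Toy example: 2 layers x 2 heads over a 5-residue sequence (made-up weights).
    seq = "MKTSG"
    attn = [
        [[0.1, 0.5, 0.1, 0.2, 0.1], [0.1, 0.3, 0.1, 0.4, 0.1]],
        [[0.2, 0.4, 0.1, 0.2, 0.1], [0.0, 0.6, 0.0, 0.3, 0.1]],
    ]
    top = top_residues(seq, residue_attention_scores(attn), k=2)
    print(top)
    # Hypothetical annotation: positions 2 and 7 are treated as known functional sites.
    print(site_hit_rate([pos for pos, _, _ in top], {2, 7}))
```

Averaging over all heads is only one aggregation choice. When individual heads specialize, inspecting scores per head or per layer may reveal more.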

AVAILABILITY AND IMPLEMENTATION

The source code is freely available at https://github.com/hmdlab/ARG-BERT. The model's output results are available at https://waseda.box.com/v/ARG-BERT-suppl.
