DeepSEA: an alignment-free explainable approach to annotate antimicrobial resistance proteins.

Authors

Borelli Tiago Cabral, Paschoal Alexandre Rossi, da Silva Ricardo Roberto

Affiliations

Computational Chemical Biology Laboratory, Department of BioMolecular Sciences, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, 14040-900, Brazil.

NPPNS, Department of BioMolecular Sciences, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, 14040-900, Brazil.

Publication information

BMC Bioinformatics. 2025 Sep 1;26(1):224. doi: 10.1186/s12859-025-06256-4.

Abstract

Antimicrobial resistance (AMR) is one of the most concerning modern threats, as it places a greater burden on health systems than HIV and malaria combined. Current AMR surveillance strategies rely on genomic comparisons and depend on sequence alignment with strict similarity cutoffs of greater than 95%. As a result, these methods have high false-negative rates, because the available reference sequences do not representatively cover the diversity of AMR proteins. Deep learning has been used as an alternative to sequence alignment, as artificial neural networks can extract abstract features from data, thereby limiting the need for sequence comparisons. Here, a convolutional neural network (CNN) was trained to differentiate between antimicrobial resistance proteins and non-resistance proteins, and to annotate the resistance proteins into nine resistance classes. Our model demonstrated higher recall values (> 0.9) than the alignment-based approach for all protein classes tested. Additionally, our CNN architecture allowed us to investigate internal states and to explain the model's classifications in terms of the importance of protein domain features related to antimicrobial molecule inactivation. Finally, we built an open-source bioinformatic tool (https://github.com/computational-chemical-biology/DeepSEA-project) that can be used to annotate antimicrobial resistance proteins and provide information on protein domains without sequence alignment.
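The abstract compares methods by recall, which is directly tied to the false-negative rate it mentions: for a given class, recall = TP / (TP + FN), so every resistance protein that a method misses lowers recall for that class. The following is a minimal sketch of a per-class recall calculation. It is not taken from the DeepSEA code base; the function name, class labels, and toy predictions are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_recall(true_labels, predicted_labels):
    """Return {class: TP / (TP + FN)} for every class present in true_labels.

    A false negative for a class is any sequence of that class that was
    predicted as something else.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true_labels and predicted_labels must have equal length")
    # TP + FN for each class is simply how often the class truly occurs.
    totals = Counter(true_labels)
    # TP for each class: the true label and the prediction agree.
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy data: these labels are illustrative, not results from the paper.
true_labels = [
    "beta_lactam", "beta_lactam", "beta_lactam", "beta_lactam",
    "tetracycline", "tetracycline",
    "non_resistant", "non_resistant",
]
predicted_labels = [
    "beta_lactam", "beta_lactam", "beta_lactam", "non_resistant",
    "tetracycline", "tetracycline",
    "non_resistant", "beta_lactam",
]

recall = per_class_recall(true_labels, predicted_labels)
for label, value in recall.items():
    print(f"{label}: recall = {value:.2f}")
```

In this toy example, one of the four beta-lactam sequences is missed (a false negative), so recall for that class is 3/4. The same calculation applies whether the predictions come from an alignment-based search with a similarity cutoff or from a neural network classifier.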

