College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, 310027, Hangzhou, Zhejiang, P.R. China.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad336.
Antimicrobial peptides (AMPs) are promising candidates for the development of new antibiotics due to their broad-spectrum activity against a range of pathogens. However, identifying AMPs among a large pool of candidates is challenging because of their complex structures and diverse sequences. In this study, we propose SenseXAMP, a cross-modal framework that leverages both semantic embeddings and protein descriptors (PDs) of input sequences to improve AMP identification performance. SenseXAMP includes a multi-input alignment module and a cross-representation fusion module to explore the hidden information between the two input features and make better use of the fused feature. To better address the AMP identification task, we collect the latest annotated AMP data to form more comprehensive benchmark datasets. Additionally, we expand the existing AMP identification task settings by adding an AMP regression task to meet more specific requirements such as antimicrobial activity prediction. The experimental results indicate that SenseXAMP outperforms existing state-of-the-art models on multiple AMP-related datasets, including commonly used AMP classification datasets and our proposed benchmark datasets. Furthermore, we conducted a series of experiments to demonstrate the complementary nature of traditional PDs and protein pre-training models in AMP tasks. Our experiments reveal that SenseXAMP can effectively combine the advantages of PDs to improve the performance of protein pre-training models in AMP tasks.
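To make the idea of combining two input representations concrete, the following is a minimal sketch under stated assumptions. It is not SenseXAMP's alignment or fusion module, whose details the abstract does not give. The function names (`protein_descriptors`, `l2_normalize`, `fuse`), the choice of descriptors (amino acid composition, a crude net charge, and a hydrophobic fraction), and the normalize-then-concatenate fusion are all illustrative assumptions. The embedding would in practice come from a protein pre-training model; here it is a placeholder vector.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: one common grouping of hydrophobic residues, chosen for illustration.
HYDROPHOBIC = set("AILMFVW")


def protein_descriptors(seq: str) -> List[float]:
    """Toy hand-crafted descriptors (hypothetical choice, not the paper's PD set).

    Returns 22 values: 20 amino acid composition fractions, a crude net
    charge (K + R - D - E counts), and the hydrophobic residue fraction.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    n = len(seq)
    composition = [counts[a] / n for a in AMINO_ACIDS]
    net_charge = float(counts["K"] + counts["R"] - counts["D"] - counts["E"])
    hydrophobic_fraction = sum(counts[a] for a in HYDROPHOBIC) / n
    return composition + [net_charge, hydrophobic_fraction]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(embedding: List[float], descriptors: List[float]) -> List[float]:
    """Naive fusion stand-in: put both modalities on the same scale, then concatenate."""
    return l2_normalize(embedding) + l2_normalize(descriptors)


if __name__ == "__main__":
    pds = protein_descriptors("KKLLKK")
    placeholder_embedding = [0.1, -0.3, 0.5, 0.2]  # stands in for a learned embedding
    fused = fuse(placeholder_embedding, pds)
    print(len(pds), len(fused))
```

Normalizing each modality before concatenation keeps the larger-magnitude feature set from dominating the fused vector. A learned alignment and fusion module, as the abstract describes, would replace this fixed rule with trainable projections.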