Chen Zixin, Ji Chengming, Xu Wenwen, Gao Jianfeng, Huang Ji, Xu Huanliang, Qian Guoliang, Huang Junxian
College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China.
StarHelix Inc, Jiangmiao Road, Nanjing, 210000, Jiangsu, China.
BMC Bioinformatics. 2025 Jan 11;26(1):10. doi: 10.1186/s12859-025-06033-3.
Antimicrobial peptides (AMPs) have been widely recognized as a promising solution to antimicrobial resistance in microorganisms, a problem driven by the increasing abuse of antibiotics in medicine and agriculture around the globe. In this study, we propose UniAMP, a systematic prediction framework for discovering AMPs. We observe that the feature vectors used in many existing studies, which are constructed from peptide information such as sequence, composition, and structure, can be augmented or even replaced by information inferred by deep learning models. Specifically, using a feature vector of 2924 values inferred by two deep learning models, UniRep and ProtT5, together with our proposed deep neural network composed of fully connected layers and transformer encoders for predicting the antibacterial activity of peptides, we demonstrate that such inferred information suffices for the task. Evaluation results show that our model outperforms existing methods on both balanced benchmark datasets and imbalanced test datasets. We then analyze the relationships among peptide sequences, manually extracted features, and the information automatically inferred by deep learning models, and observe that the inferred information is more comprehensive and non-redundant for the task of predicting AMPs. Moreover, this approach alleviates the impact of scarce positive data and shows great potential for future research and applications.
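The abstract gives only the feature size (2924) and the model's building blocks. The following is a hypothetical sketch of the data flow, not the authors' UniAMP implementation. It assumes the 2924 values are a 1900-dimensional averaged UniRep embedding concatenated with a 1024-dimensional mean-pooled ProtT5 embedding (1900 + 1024 = 2924). The toy sizes (`N_TOKENS`, `D_MODEL`), the random untrained weights, and the helper names `build_features` and `predict_amp_probability` are all invented for illustration. The sketch concatenates the embeddings, projects them with a fully connected layer, splits the result into tokens for one single-head transformer encoder block, pools, and outputs a probability of antibacterial activity.

```python
import math
import random

# Assumed split of the 2924 features (hypothetical, not stated in the abstract):
UNIREP_DIM = 1900   # UniRep mLSTM hidden size, averaged over residues
PROTT5_DIM = 1024   # ProtT5 per-residue embedding size, mean-pooled over residues
FEATURE_DIM = UNIREP_DIM + PROTT5_DIM  # 2924

# Hypothetical toy hyperparameters, chosen only for illustration.
N_TOKENS = 4   # the projected vector is split into this many tokens
D_MODEL = 8    # width of each token

_rng = random.Random(0)  # fixed seed so the untrained sketch is deterministic


def rand_matrix(rows, cols):
    """Gaussian-initialised weight matrix (untrained)."""
    scale = 1.0 / math.sqrt(cols)
    return [[_rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(v, eps=1e-5):
    """Layer normalisation without learnable gain/bias (simplified)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


W_IN = rand_matrix(N_TOKENS * D_MODEL, FEATURE_DIM)  # fully connected input layer
W_Q, W_K, W_V = (rand_matrix(D_MODEL, D_MODEL) for _ in range(3))
W_FF = rand_matrix(D_MODEL, D_MODEL)  # simplified one-layer feed-forward
W_OUT = rand_matrix(1, D_MODEL)       # fully connected output layer


def build_features(unirep_vec, prott5_vec):
    """Concatenate the two pooled embeddings into one 2924-value vector."""
    if len(unirep_vec) != UNIREP_DIM or len(prott5_vec) != PROTT5_DIM:
        raise ValueError("unexpected embedding size")
    return list(unirep_vec) + list(prott5_vec)


def encoder_block(tokens):
    """One single-head transformer encoder layer (post-norm, no output projection)."""
    qs = [matvec(W_Q, t) for t in tokens]
    ks = [matvec(W_K, t) for t in tokens]
    vs = [matvec(W_V, t) for t in tokens]
    out = []
    for q, tok in zip(qs, tokens):
        # Scaled dot-product attention of this token over all tokens.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D_MODEL) for k in ks]
        weights = softmax(scores)
        attended = [sum(w * v[d] for w, v in zip(weights, vs)) for d in range(D_MODEL)]
        h = layer_norm([a + b for a, b in zip(tok, attended)])  # residual + norm
        ff = relu(matvec(W_FF, h))
        out.append(layer_norm([a + b for a, b in zip(h, ff)]))  # residual + norm
    return out


def predict_amp_probability(features):
    """Return an (untrained, meaningless) probability that the peptide is an AMP."""
    hidden = relu(matvec(W_IN, features))
    tokens = [hidden[i * D_MODEL:(i + 1) * D_MODEL] for i in range(N_TOKENS)]
    encoded = encoder_block(tokens)
    pooled = [sum(t[d] for t in encoded) / N_TOKENS for d in range(D_MODEL)]
    logit = matvec(W_OUT, pooled)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


if __name__ == "__main__":
    # Placeholder inputs; real ones would come from pretrained UniRep and ProtT5.
    demo_rng = random.Random(1)
    unirep = [demo_rng.uniform(-1, 1) for _ in range(UNIREP_DIM)]
    prott5 = [demo_rng.uniform(-1, 1) for _ in range(PROTT5_DIM)]
    x = build_features(unirep, prott5)
    print(len(x), round(predict_amp_probability(x), 4))
```

Splitting one pooled vector into a short token sequence is one way to give a transformer encoder something to attend over when each peptide is represented by a single fixed-length vector; how UniAMP actually arranges its fully connected and encoder layers is not stated in the abstract.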