Xu Jing, Ruan Xiaoli, Yang Jing, Hu Bingqi, Li Shaobo, Hu Jianjun
State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China.
Comput Biol Chem. 2024 Apr;109:108033. doi: 10.1016/j.compbiolchem.2024.108033. Epub 2024 Feb 20.
As a promising alternative to conventional antibiotic drugs in the biomedical field, functional peptides have been widely used in disease treatment owing to their low toxicity, high absorption rate, and biological activity. Recently, several machine learning methods have been developed for functional peptide prediction. However, most existing work relies heavily on statistical features, and few studies consider the identification of multifunctional peptides. We therefore propose SME-MFP, a novel predictor for imbalanced multi-label functional peptide datasets. First, we use physicochemical and evolutionary information to represent the initial features of the peptide sequence from multiple perspectives. Second, the features are fused and fed into spatial feature extractors, in which residual connections and a multiscale convolutional neural network extract more discriminative features from peptide sequences of different lengths. In addition, we design AFT-based temporal feature extractors to fully capture the global interactions within the sequences. Finally, we devise a new loss function to replace the traditional cross-entropy loss and address the class imbalance problem. The results show that our framework not only enhances the model's ability to capture sequence features effectively, but also improves accuracy by 3.89% over existing methods on public peptide datasets.
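The abstract does not specify the form of the new loss. As a hedged illustration of how a loss can replace plain cross-entropy to handle class imbalance in a multi-label setting, the sketch below uses the well-known focal loss, which is an assumption for illustration and not necessarily the loss used in SME-MFP. The function names and the `gamma` and `alpha` defaults are likewise illustrative choices.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Standard binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one (probability, label) pair.

    Illustrative stand-in only: the abstract does not describe its loss.
    The factor (1 - p_t) ** gamma shrinks the contribution of easy,
    well-classified examples so rare labels are not swamped by common ones.
    """
    p = min(max(p, eps), 1 - eps)
    p_t = p if y == 1 else 1 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Mean focal loss over every (sample, label) pair of a multi-label batch."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            total += focal_loss(p, y, gamma, alpha)
            count += 1
    return total / count


# Two peptides, three hypothetical function labels each.
probs = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]]
labels = [[1, 0, 1], [0, 1, 1]]
print(multilabel_focal_loss(probs, labels))
```

With `gamma=0` and `alpha=0.5` the focal loss reduces to half the ordinary cross-entropy, which makes it easy to check against a baseline. Larger `gamma` values put more weight on hard, misclassified labels.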