School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China.
J Proteome Res. 2023 Mar 3;22(3):718-728. doi: 10.1021/acs.jproteome.2c00363. Epub 2023 Feb 7.
Neuropeptides play pivotal roles in many physiological processes and are associated with a range of diseases. Identifying neuropeptides is therefore valuable for studying the mechanisms of these processes and for treating neurological disorders. Several state-of-the-art neuropeptide predictors have been developed using a two-layer stacking ensemble algorithm. Although two-layer stacking can improve feature representation, the resulting models are complex and less efficient than models based on a single classifier. In this study, we propose a new model, NeuroPpred-SVM, which predicts neuropeptides from Bidirectional Encoder Representations from Transformers (BERT) embeddings and other sequence-based features using a support vector machine (SVM). The experimental results show that our model achieved a cross-validation area under the receiver operating characteristic curve (AUROC) of 0.969 on the training data set and an AUROC of 0.966 on the independent test set. Compared on the independent test set with four other state-of-the-art models (NeuroPIpred, PredNeuroP, NeuroPpred-Fuse, and NeuroPpred-FRL), our model achieved the highest AUROC, Matthews correlation coefficient, accuracy, and specificity, indicating that it outperforms the existing models. We believe that NeuroPpred-SVM can be a useful tool for identifying neuropeptides with high accuracy and low cost. The data sets and Python code are available at https://github.com/liuyf-a/NeuroPpred-SVM.
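The four evaluation metrics named in the abstract (AUROC, Matthews correlation coefficient, accuracy, and specificity) can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names `auroc` and `threshold_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the values they produce are unrelated to the results reported above.

```python
from math import sqrt


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, specificity, and MCC after thresholding the scores.

    The 0.5 default threshold is an assumption for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(labels)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "specificity": specificity, "mcc": mcc}


# Hypothetical toy data (1 = neuropeptide, 0 = non-neuropeptide); not from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

toy_auroc = auroc(toy_labels, toy_scores)
toy_metrics = threshold_metrics(toy_labels, toy_scores)
print(f"AUROC={toy_auroc:.3f}", toy_metrics)
```

AUROC is computed from the raw scores and does not depend on a threshold, whereas accuracy, specificity, and MCC depend on the chosen cutoff, which is one reason a comparison between predictors typically reports both kinds of metric.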