State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab310.
Neuropeptides act as signaling molecules in the nervous systems of various animals and play crucial roles in a wide range of physiological functions and in hormone regulation. They offer many opportunities for discovering new drugs and targets for the treatment of neurological diseases. In recent years, several data-driven computational predictors have been developed for various types of bioactive peptides, but little work has so far addressed neuropeptides. In this work, we developed an interpretable stacking model, named NeuroPpred-Fuse, that predicts neuropeptides by fusing a variety of sequence-derived features and feature selection methods. Specifically, we encoded the peptide sequences with six types of sequence-derived features and then combined them. In the first layer, we ensembled three base classifiers and four feature selection algorithms, which complementarily select non-redundant, important features. In the second layer, the outputs of the first layer were merged and fed into a logistic regression (LR) classifier to train the model. Moreover, we analyzed the selected features and discussed why they are plausible. Experimental results show that our model achieved 90.6% accuracy and 95.8% AUC on the independent test set, outperforming the state-of-the-art models. In addition, we showed the distribution of the features selected by the tree models and compared the results on the training set with those on the test set. These results indicate that our model has reasonable generalization ability. Therefore, we expect that our model will advance the discovery of neuropeptides as new drugs for the treatment of neurological diseases.
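The two-layer design described in the abstract can be illustrated with a minimal scikit-learn sketch. The abstract does not name the six feature types, the three base classifiers, or the four feature selection algorithms, so every concrete choice below is an assumption made for illustration only: two encodings (amino acid and dipeptide composition) stand in for the six, three tree ensembles stand in for the base classifiers, and ANOVA, chi-squared, and two importance-based selectors stand in for the selection algorithms. The helper names `encode`, `toy_peptides`, and `build_stack` and the synthetic peptides are hypothetical, not the authors' code or data.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_selection import SelectFromModel, SelectKBest, chi2, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def encode(seq):
    """Concatenate amino acid composition (20) and dipeptide composition (400).

    Assumption: these two encodings stand in for the six feature types.
    Expects a sequence of at least two standard residues.
    """
    aac = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    dpc = [pairs.count(d) / len(pairs) for d in DIPEPTIDES]
    return np.array(aac + dpc)


def toy_peptides(n, rng):
    """Hypothetical synthetic data: positives are enriched in G, R and K."""
    seqs, labels = [], []
    for i in range(n):
        label = i % 2
        weights = np.ones(len(AMINO_ACIDS))
        if label == 1:
            weights[[AMINO_ACIDS.index(a) for a in "GRK"]] = 4.0
        length = rng.integers(10, 40)
        residues = rng.choice(list(AMINO_ACIDS), size=length, p=weights / weights.sum())
        seqs.append("".join(residues))
        labels.append(label)
    return seqs, np.array(labels)


def build_stack(k=30, seed=0):
    """First layer: every (selector, classifier) pair; second layer: LR."""
    selectors = {
        "anova": lambda: SelectKBest(f_classif, k=k),
        "chi2": lambda: SelectKBest(chi2, k=k),
        # threshold=-inf with max_features=k keeps exactly the k most important features
        "rf_imp": lambda: SelectFromModel(
            RandomForestClassifier(n_estimators=50, random_state=seed),
            max_features=k, threshold=-np.inf),
        "et_imp": lambda: SelectFromModel(
            ExtraTreesClassifier(n_estimators=50, random_state=seed),
            max_features=k, threshold=-np.inf),
    }
    classifiers = {
        "rf": lambda: RandomForestClassifier(n_estimators=50, random_state=seed),
        "et": lambda: ExtraTreesClassifier(n_estimators=50, random_state=seed),
        "gb": lambda: GradientBoostingClassifier(n_estimators=50, random_state=seed),
    }
    estimators = [
        (f"{s_name}_{c_name}", make_pipeline(make_selector(), make_classifier()))
        for (s_name, make_selector), (c_name, make_classifier)
        in product(selectors.items(), classifiers.items())
    ]
    # Out-of-fold class probabilities from the 12 pipelines feed the LR meta-learner
    return StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=5,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seqs, y = toy_peptides(200, rng)
    X = np.vstack([encode(s) for s in seqs])
    model = build_stack().fit(X[:150], y[:150])
    print("held-out accuracy on toy data:", model.score(X[150:], y[150:]))
```

Pairing each selector with each classifier gives 12 first-layer pipelines, so the logistic regression sees one positive-class probability per pipeline. Whether NeuroPpred-Fuse pairs selectors and classifiers in exactly this way is not stated in the abstract.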