NeuroPred-FRL: an interpretable prediction model for identifying neuropeptide using feature representation learning.

Affiliations

Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.

Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan.

Publication information

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab167.

PMID:33975333

Abstract

Neuropeptides (NPs) are the most versatile neurotransmitters in the immune system and regulate various hormones of the central nervous system. An efficient bioinformatics tool for rapid, accurate, large-scale identification of NPs is critical in immunoinformatics and indispensable for basic research and drug development. Although a few NP prediction tools have been developed, their prediction performance still needs to be improved. In this study, we developed a machine learning-based meta-predictor, NeuroPred-FRL, using a feature representation learning approach. First, we generated 66 optimal baseline models by combining 11 different encodings with six different classifiers (11 × 6 = 66) and applying a two-step feature selection approach. The NP probability scores predicted by these 66 baseline models were concatenated into a 66-dimensional (66-D) probability feature vector. Second, to enhance the representation ability of this vector, we applied the same two-step feature selection approach to optimize it and fed the optimal feature subset into a random forest classifier to construct the final meta-model, NeuroPred-FRL. Benchmarking experiments based on both cross-validation and independent tests indicate that NeuroPred-FRL achieves better NP prediction performance than other state-of-the-art predictors. We believe that NeuroPred-FRL can serve as a powerful tool for large-scale identification of NPs, facilitating the characterization of their functional mechanisms and expediting their applications in clinical therapy. Moreover, we interpreted some of NeuroPred-FRL's decision mechanisms using the SHapley Additive exPlanation (SHAP) algorithm.
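The feature representation learning step described above can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' code: the encoding and classifier names are hypothetical placeholders (the abstract gives only the counts, 11 and 6), each trained baseline model is modeled as any callable returning a probability, and the two selection criteria (a class-mean-gap ranking followed by greedy forward selection) are assumed stand-ins, since the abstract does not say which criteria the two-step approach uses.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical placeholder names: the abstract gives only the counts
# (11 encodings x 6 classifiers = 66 baseline models), not the actual methods.
ENCODINGS = [f"encoding_{i}" for i in range(1, 12)]
CLASSIFIERS = [f"classifier_{j}" for j in range(1, 7)]

# A trained baseline model: maps a peptide sequence to P(sequence is an NP).
BaselineModel = Callable[[str], float]


def meta_feature_vector(sequence: str,
                        models: Dict[Tuple[str, str], BaselineModel]) -> List[float]:
    """Concatenate the 66 baseline probabilities in a fixed (encoding, classifier) order."""
    return [models[(enc, clf)](sequence) for enc, clf in product(ENCODINGS, CLASSIFIERS)]


def rank_features(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Step 1 (assumed criterion): rank columns by the gap between class means.

    Both classes (1 = NP, 0 = non-NP) must be present in y.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def mean_gap(col: int) -> float:
        pos_mean = sum(row[col] for row in pos) / len(pos)
        neg_mean = sum(row[col] for row in neg) / len(neg)
        return abs(pos_mean - neg_mean)

    # Largest gap first; ties keep their original column order (sort is stable).
    return sorted(range(len(X[0])), key=mean_gap, reverse=True)


def forward_select(ranked: Sequence[int],
                   evaluate: Callable[[List[int]], float]) -> List[int]:
    """Step 2 (assumed criterion): scan the ranking and keep a column only if
    adding it strictly improves the score returned by `evaluate`."""
    selected: List[int] = []
    best = float("-inf")
    for col in ranked:
        candidate = selected + [col]
        score = evaluate(candidate)
        if score > best:
            selected, best = candidate, score
    return selected
```

In the pipeline the abstract describes, `evaluate` would presumably score a random forest trained on the candidate columns (for example, by cross-validation), and the final random forest meta-model would then be fit on the selected columns; both are left out here to keep the sketch dependency-free. Stacking pipelines also commonly build training meta-features from cross-validated (out-of-fold) probabilities so the meta-model does not learn from overfit scores; the abstract does not state whether NeuroPred-FRL does this.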