Li Jinjin, Xiong Shuwen, Shi Hua, Cui Feifei, Zhang Zilong, Wei Leyi
Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China.
School of Optoelectronic and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China.
J Chem Inf Model. 2025 May 12;65(9):4740-4750. doi: 10.1021/acs.jcim.5c00444. Epub 2025 Apr 21.
Neuropeptides are key signaling molecules that regulate fundamental physiological processes ranging from metabolism to cognitive function. Accurate identification of neuropeptides is critical for advancing neurological disease therapeutics and peptide-based drug design, yet it remains highly challenging because of sequence heterogeneity, obscured functional motifs, and the limited amount of experimentally validated data. Existing neuropeptide identification methods combine handcrafted features with traditional machine learning algorithms, which struggle to capture deep sequence patterns. To address these limitations, we propose NeuroPred-AIMP (adaptive integrated multimodal predictor), an interpretable model that combines the global semantic representations of a protein language model (ESM) with the multiscale structural features extracted by a temporal convolutional network (TCN). The model introduces a residual-enhanced adaptive feature fusion mechanism that dynamically recalibrates the contribution of each feature type, achieving robust integration of evolutionary and local sequence information. On the independent test set, the proposed model achieved strong overall performance, with an accuracy of 92.3% and an AUROC of 0.974. The model was also well balanced between positive and negative samples, with a sensitivity of 92.6% and a specificity of 92.1%, a gap of only about 0.5 percentage points. These results confirm the effectiveness of the multimodal feature strategy for neuropeptide identification.
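The abstract attributes the multiscale structural features to a temporal convolutional network but does not give its configuration. As a rough illustration only, the pure-Python sketch below shows the core TCN operation, a dilated causal 1-D convolution, and how larger dilation rates widen the span of residues each output position sees. The kernel, dilation rates, and function names are hypothetical; a real TCN would operate on multichannel per-residue embeddings, stack layers with residual connections, and learn its kernels during training.

```python
# Hypothetical illustration of the TCN building block; not the NeuroPred-AIMP code.
from typing import List


def dilated_causal_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Single-channel dilated causal convolution with implicit left zero-padding.

    Output position t combines x[t], x[t - d], x[t - 2d], ... so it never
    looks ahead in the sequence.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation  # step backwards by the dilation rate
            if j >= 0:  # positions before the start act as zero padding
                acc += w * x[j]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of dilated causal convolutions stacked one after another."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def multiscale_features(x: List[float], kernel: List[float],
                        dilations: List[int]) -> List[List[float]]:
    """One feature map per dilation rate (a hypothetical parallel multiscale view)."""
    return [dilated_causal_conv1d(x, kernel, d) for d in dilations]


if __name__ == "__main__":
    toy = [1.0, 2.0, 3.0, 4.0, 5.0]  # stand-in for one embedding channel
    for d, fmap in zip([1, 2], multiscale_features(toy, [1.0, 1.0], [1, 2])):
        print(f"dilation={d}: {fmap}")
    print("receptive field (k=2, d=1,2,4):", receptive_field(2, [1, 2, 4]))
```

With the kernel `[1.0, 1.0]`, dilation 1 adds each position to its immediate predecessor, while dilation 2 skips one position; three stacked layers with dilations 1, 2, and 4 already cover 8 consecutive positions, which is why TCNs typically grow dilation exponentially.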
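The abstract names a residual-enhanced adaptive feature fusion mechanism but does not state its equations. One common way to realize such a mechanism is a sigmoid gate that blends two feature vectors, plus a residual path that preserves the original inputs. The sketch below implements that idea in pure Python under explicit assumptions: the ESM and TCN vectors are already projected to the same length, the gate is per-dimension rather than a full linear layer, and the residual term (the mean of the two inputs) is a hypothetical choice. The weights would normally be learned; here they are passed in by hand.

```python
# Hypothetical gated fusion with a residual path; not the authors' implementation.
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def adaptive_residual_fusion(esm: List[float], tcn: List[float],
                             w_esm: List[float], w_tcn: List[float],
                             bias: List[float]) -> Tuple[List[float], List[float]]:
    """Fuse two equal-length feature vectors dimension by dimension.

    g_i   = sigmoid(w_esm_i * esm_i + w_tcn_i * tcn_i + bias_i)   # adaptive gate
    mix_i = g_i * esm_i + (1 - g_i) * tcn_i                       # recalibrated blend
    out_i = mix_i + 0.5 * (esm_i + tcn_i)                         # assumed residual term
    Returns the fused vector and the gate values.
    """
    n = len(esm)
    if not (len(tcn) == len(w_esm) == len(w_tcn) == len(bias) == n):
        raise ValueError("all inputs must have the same length")
    out: List[float] = []
    gates: List[float] = []
    for e, t, we, wt, b in zip(esm, tcn, w_esm, w_tcn, bias):
        g = sigmoid(we * e + wt * t + b)
        mix = g * e + (1.0 - g) * t
        out.append(mix + 0.5 * (e + t))
        gates.append(g)
    return out, gates


if __name__ == "__main__":
    fused, g = adaptive_residual_fusion(
        esm=[1.0, 2.0], tcn=[3.0, 4.0],         # toy feature vectors
        w_esm=[0.5, -0.5], w_tcn=[0.1, 0.2],    # hypothetical gate weights
        bias=[0.0, 0.0],
    )
    print("fused:", fused)
    print("gates:", g)
```

Returning the gates is a deliberate design choice: a gate near 1 means that dimension is dominated by the ESM feature and a gate near 0 means it is dominated by the TCN feature, which gives a simple, inspectable signal of how the two modalities are being weighted.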
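The reported figures follow standard definitions: sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), accuracy is the fraction of correctly classified samples, and AUROC equals the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. The sketch below computes these metrics from hypothetical toy counts and scores (not the paper's test set) and also reports the sensitivity-specificity gap in percentage points, the quantity behind the balance claim.

```python
# Standard binary-classification metrics; all example numbers are made up.
from typing import Dict, Sequence


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    sensitivity = tp / (tp + fn)  # true positive rate on neuropeptides
    specificity = tn / (tn + fp)  # true negative rate on non-neuropeptides
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "gap_pp": abs(sensitivity - specificity) * 100,  # gap in percentage points
    }


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of AUROC: P(score_pos > score_neg), ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    m = confusion_metrics(tp=45, fn=5, tn=40, fp=10)  # hypothetical counts
    print({k: round(v, 3) for k, v in m.items()})
    print("AUROC:", round(auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]), 3))
```

With these toy counts, sensitivity is 0.9 and specificity 0.8, a 10-point gap. Applied to the abstract's reported rates, 92.6% and 92.1% differ by 0.5 percentage points. The pairwise AUROC loop is quadratic in the number of samples, which is fine for illustration; a sort-based rank computation scales better on large test sets.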