NLPEI: A Novel Self-Interacting Protein Prediction Model Based on Natural Language Processing and Evolutionary Information.

Author information

Jia Li-Na, Yan Xin, You Zhu-Hong, Zhou Xi, Li Li-Ping, Wang Lei, Song Ke-Jian

Affiliations

College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China.

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China.

Publication information

Evol Bioinform Online. 2020 Dec 26;16:1176934320984171. doi: 10.1177/1176934320984171. eCollection 2020.

DOI:10.1177/1176934320984171

PMID:33488064

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7768313/

Abstract

Studying self-interacting proteins (SIPs) not only reveals protein function at the molecular level but is also crucial for understanding activities such as growth, development, differentiation, and apoptosis, and it provides an important theoretical basis for exploring the mechanisms of major diseases. With rapid advances in biotechnology, a large number of SIPs have been discovered. However, because biological experiments are inherently slow and costly, the gap between the identification of SIPs and the accumulation of data is growing. Fast and accurate computational methods are therefore needed to predict SIPs. In this study, we designed a new method, NLPEI, for predicting SIPs based on natural language understanding theory and evolutionary information. Specifically, we first treat the protein sequence as natural language and use natural language processing algorithms to extract its features. We then use the Position-Specific Scoring Matrix (PSSM) to represent the evolutionary information of the protein and extract its features with the Stacked Auto-Encoder (SAE), a deep learning algorithm. Finally, we fuse the natural language features of the proteins with the evolutionary features and make predictions with an Extreme Learning Machine (ELM) classifier. On the human and yeast gold-standard SIP data sets, NLPEI achieved prediction accuracies of 94.19% and 91.29%, respectively. In comparisons with different classifier models, different feature models, and other existing methods, NLPEI obtained the best results. These experimental results indicate that NLPEI is an effective tool for predicting SIPs and can provide reliable candidates for biological experiments.
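The pipeline described in the abstract (sequence-as-language features, evolutionary features, fusion, ELM classification) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the specific NLP algorithm, so normalized k-mer counts are used here as a simple stand-in for learned embeddings. The SAE-encoded PSSM features are represented by a placeholder vector, and all function names, the class name `ELM`, the hidden-layer size, and the toy sequences and labels are assumptions made for illustration only.

```python
import numpy as np
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_words(seq, k=3):
    """Split a protein sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_frequency_features(seq, k=2):
    """Normalized k-mer counts: an assumed stand-in for NLP-derived features."""
    vocab = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    vec = np.zeros(len(vocab))
    for word in kmer_words(seq, k):
        if word in vocab:  # skip k-mers containing non-standard residues
            vec[vocab[word]] += 1
    total = vec.sum()
    return vec / total if total > 0 else vec


def fuse_features(nlp_feat, evo_feat):
    """Fuse the two feature views by simple concatenation (an assumption)."""
    return np.concatenate([nlp_feat, evo_feat])


class ELM:
    """Basic extreme learning machine: random fixed hidden layer,
    output weights solved in closed form by least squares."""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Moore-Penrose pseudoinverse gives the least-squares output weights.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)


if __name__ == "__main__":
    # Toy sequences and labels, invented purely for illustration.
    seqs = ["MKVLAAGIVG", "ACDEFGHIKL", "MKVLMKVLMK", "WYWYWYACAC"]
    labels = np.array([1.0, 0.0, 1.0, 0.0])
    # Placeholder for SAE-encoded PSSM features (not computed here).
    evo_placeholder = np.zeros(8)
    X = np.stack([fuse_features(kmer_frequency_features(s), evo_placeholder)
                  for s in seqs])
    model = ELM(n_hidden=32, seed=0).fit(X, labels)
    print(model.predict(X))
```

In a real setting, the placeholder vector would be replaced by features learned from each protein's PSSM, and accuracy would be estimated on held-out data rather than on the training set.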

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48d7/7768313/18f1dbbd9c61/10.1177_1176934320984171-fig1.jpg

Similar articles

Improving Prediction of Self-interacting Proteins Using Stacked Sparse Auto-Encoder with PSSM profiles.

Int J Biol Sci. 2018 May 23;14(8):983-991. doi: 10.7150/ijbs.23817. eCollection 2018.

Computational methods using weighed-extreme learning machine to predict protein self-interactions with protein evolutionary information.

J Cheminform. 2017 Aug 18;9(1):47. doi: 10.1186/s13321-017-0233-z.

Robust and accurate prediction of self-interacting proteins from protein sequence information by exploiting weighted sparse representation based classifier.

BMC Bioinformatics. 2022 Dec 1;23(Suppl 7):518. doi: 10.1186/s12859-022-04880-y.

Identification of self-interacting proteins by integrating random projection classifier and finite impulse response filter.

BMC Genomics. 2019 Dec 27;20(Suppl 13):928. doi: 10.1186/s12864-019-6301-1.

Prediction of protein self-interactions using stacked long short-term memory from protein sequences information.

BMC Syst Biol. 2018 Dec 21;12(Suppl 8):129. doi: 10.1186/s12918-018-0647-x.

Predicting Self-Interacting Proteins Using a Recurrent Neural Network and Protein Evolutionary Information.

Evol Bioinform Online. 2020 May 28;16:1176934320924674. doi: 10.1177/1176934320924674. eCollection 2020.

PSPEL: In Silico Prediction of Self-Interacting Proteins from Amino Acids Sequences Using Ensemble Learning.

IEEE/ACM Trans Comput Biol Bioinform. 2017 Sep-Oct;14(5):1165-1172. doi: 10.1109/TCBB.2017.2649529. Epub 2017 Jan 10.

Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model.

Genes (Basel). 2019 Nov 12;10(11):924. doi: 10.3390/genes10110924.

An Improved Deep Forest Model for Predicting Self-Interacting Proteins From Protein Sequence Using Wavelet Transformation.

Front Genet. 2019 Mar 1;10:90. doi: 10.3389/fgene.2019.00090. eCollection 2019.

Cited by

An Effective Computational Method for Predicting Self-Interacting Proteins Based on VGGNet Convolutional Neural Network and Gray-Level Co-occurrence Matrix.

Evol Bioinform Online. 2024 Oct 21;20:11769343241292224. doi: 10.1177/11769343241292224. eCollection 2024.

The Importance of Weakly Co-Evolving Residue Networks in Proteins is Revealed by Visual Analytics.

Front Bioinform. 2022 Apr 5;2:836526. doi: 10.3389/fbinf.2022.836526. eCollection 2022.

Prediction of sgRNA Off-Target Activity in CRISPR/Cas9 Gene Editing Using Graph Convolution Network.

Entropy (Basel). 2021 May 14;23(5):608. doi: 10.3390/e23050608.
