DeepEP: a deep learning framework for identifying essential proteins.

Author affiliations

School of Computer Science and Engineering, Central South University, Changsha, 410083, People's Republic of China.

Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada.

Publication information

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):506. doi: 10.1186/s12859-019-3076-y.

DOI:10.1186/s12859-019-3076-y

PMID:31787076

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6886168/

Abstract

BACKGROUND

Essential proteins are crucial for cellular life, so identifying them is both an important topic and a challenging problem for researchers. Many computational approaches have recently been proposed for this task. However, traditional centrality methods cannot fully represent the topological features of biological networks. In addition, identifying essential proteins is an imbalanced learning problem, yet few current shallow machine learning-based methods are designed to handle the class imbalance.

RESULTS

We develop DeepEP, a deep learning framework that uses the node2vec technique, multi-scale convolutional neural networks and a sampling technique to identify essential proteins. In DeepEP, node2vec is applied to automatically learn topological and semantic features for each protein in the protein-protein interaction (PPI) network. Gene expression profiles are treated as images, and multi-scale convolutional neural networks are applied to extract their patterns. In addition, DeepEP uses a sampling method to alleviate the class imbalance: it draws the same number of majority and minority samples in each training epoch, so training is not biased toward either class. The experimental results show that DeepEP outperforms both traditional centrality methods and shallow machine learning-based methods. Detailed analyses show that the dense vectors generated by node2vec contribute substantially to the improved performance, indicating that node2vec effectively captures the topological and semantic properties of the PPI network. The sampling method also improves the performance of identifying essential proteins.
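To make the node2vec step more concrete, the following is a minimal sketch of the second-order biased random walk that node2vec uses to explore a graph. It is not DeepEP's implementation: the toy graph, the protein names `P1` to `P5`, the function name `node2vec_walk`, and the values of the return parameter `p` and in-out parameter `q` are all assumptions chosen for illustration. In node2vec, walks like these are subsequently fed to a skip-gram model to obtain one dense vector per node; that training step is omitted here.

```python
import random

def node2vec_walk(adj, start, length, p, q, rng):
    """Generate one node2vec-style biased random walk (illustrative sketch).

    adj    : dict mapping each node to the set of its neighbours
    p      : return parameter (large p makes stepping back less likely)
    q      : in-out parameter (small q favours moving away from the previous node)
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted for reproducibility
        if not nbrs:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # move further away
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Hypothetical toy PPI network (undirected), for illustration only.
ppi = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3", "P4"},
    "P3": {"P1", "P2"},
    "P4": {"P2", "P5"},
    "P5": {"P4"},
}

rng = random.Random(42)
walks = [node2vec_walk(ppi, node, length=6, p=1.0, q=0.5, rng=rng) for node in ppi]
for w in walks:
    print(" -> ".join(w))
```

With `q` below 1 the walk tends to move outward through the network, while `q` above 1 keeps it near the starting protein; this trade-off is how node2vec balances local and global structure.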

CONCLUSION

We demonstrate that DeepEP improves prediction performance by integrating multiple deep learning techniques and a sampling method. DeepEP is more effective than existing methods.
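The sampling method described in the Results, which draws the same number of majority and minority samples in each training epoch, can be sketched as follows. The abstract does not state how the samples are drawn, so this sketch assumes that every minority sample is kept and that an equally sized subset of the majority class is redrawn without replacement each epoch. The function name `balanced_epoch` and the toy label list are hypothetical.

```python
import random

def balanced_epoch(labels, rng):
    """Return shuffled sample indices with equal counts of both classes.

    Assumption: labels are 0/1; all samples of the smaller class are kept and
    an equally sized random subset of the larger class is drawn.
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if len(minority) > len(majority):
        minority, majority = majority, minority
    # A fresh subset is drawn on every call, so over many epochs the model
    # still sees most of the majority class.
    drawn = rng.sample(majority, len(minority))
    epoch = minority + drawn
    rng.shuffle(epoch)
    return epoch

# Toy data: 3 essential proteins (label 1) and 9 non-essential ones (label 0).
labels = [1] * 3 + [0] * 9
rng = random.Random(0)
for epoch in range(3):
    idx = balanced_epoch(labels, rng)
    n_pos = sum(labels[i] for i in idx)
    print(f"epoch {epoch}: {len(idx)} samples, {n_pos} essential, {len(idx) - n_pos} non-essential")
```

Because each epoch contains both classes in equal proportion, the loss is not dominated by the majority class, which matches the abstract's statement that training is not biased toward either class.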

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c75/6886168/8f152733caef/12859_2019_3076_Fig1_HTML.jpg
