

A Deep Learning Framework for Identifying Essential Proteins by Integrating Multiple Types of Biological Information.

Authors

Zeng Min, Li Min, Fei Zhihui, Wu Fang-Xiang, Li Yaohang, Pan Yi, Wang Jianxin

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2021 Jan-Feb;18(1):296-305. doi: 10.1109/TCBB.2019.2897679. Epub 2021 Feb 3.

DOI:10.1109/TCBB.2019.2897679
PMID:30736002
Abstract

Computational methods, including centrality-based and machine learning-based methods, have been proposed to identify essential proteins, which helps in understanding the minimum requirements for the survival and evolution of a cell. In centrality methods, researchers must design a score function based on prior knowledge, which is usually not sufficient to capture the complexity of biological information. In machine learning-based methods, the selected biological features cannot represent the complete properties of biological information, because these methods lack a computational framework for selecting features automatically. To tackle these problems, we propose a deep learning framework that automatically learns biological features without prior knowledge. We use the node2vec technique to automatically learn a richer representation of protein-protein interaction (PPI) network topology than a score function can provide. Bidirectional long short-term memory (BiLSTM) cells are applied to capture non-local relationships in gene expression data. For subcellular localization information, we use a high-dimensional indicator vector to characterize each protein's localization. To evaluate the performance of our method, we tested it on the PPI network of S. cerevisiae. Our experimental results demonstrate that our method outperforms both traditional centrality methods and existing machine learning-based methods. To determine which of the three types of biological information contributes most, we conducted an ablation study that removes each component in turn. Our results show that the PPI network embedding contributes most to the improvement. In addition, gene expression profiles and subcellular localization information also help improve the identification of essential proteins.
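The abstract states that node2vec is used to embed the PPI network but gives no parameter settings. As a rough illustration of how node2vec explores a network, the Python sketch below generates second-order biased random walks on a toy, unweighted graph. The protein IDs P1 to P5, the edges, and the values of `p`, `q`, walk length and walk count are all hypothetical, and the helper names are illustrative only. In full node2vec these walks are then fed to a skip-gram (word2vec) model to produce the embedding vectors; that training step is omitted here.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def node2vec_walk(adj, start, length, p, q, rng):
    """One second-order biased random walk in the style of node2vec.

    p (return parameter): larger p makes stepping back to the previous node less likely.
    q (in-out parameter): larger q keeps the walk near the previous node (BFS-like);
    smaller q pushes it outward (DFS-like).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)   # move to distance 2 from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, length, p, q, rng))
    return walks


# Hypothetical toy PPI network; these IDs and edges are not data from the paper.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P3", "P4"), ("P4", "P5")]
adj = build_adjacency(edges)
walks = generate_walks(adj, num_walks=2, length=6, p=1.0, q=0.5, seed=42)
print(len(walks))  # → 10 (5 proteins x 2 walks each)
```

Setting `q` below 1, as in this toy call, biases walks outward so they capture broader network structure; a `q` above 1 keeps them local. Which regime suits essential-protein prediction is not stated in the abstract.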

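The abstract describes encoding subcellular localization as a high-dimensional indicator vector but does not list the compartments used. A minimal sketch of such a multi-hot encoding, assuming a hypothetical 11-compartment vocabulary and made-up protein annotations (the function names are illustrative, not the authors' code), might look like this:

```python
# Hypothetical compartment vocabulary; the paper's actual list of
# localizations and its dimensionality are not given in the abstract.
COMPARTMENTS = ["cytoplasm", "cytoskeleton", "endoplasmic reticulum",
                "endosome", "extracellular", "Golgi", "mitochondrion",
                "nucleus", "peroxisome", "plasma membrane", "vacuole"]


def localization_vector(locations, vocabulary=COMPARTMENTS):
    """Encode a protein's set of subcellular localizations as a 0/1 indicator vector."""
    index = {name: i for i, name in enumerate(vocabulary)}
    vec = [0] * len(vocabulary)
    for loc in locations:
        if loc in index:  # compartments outside the vocabulary are ignored
            vec[index[loc]] = 1
    return vec


def localization_matrix(protein_locations, vocabulary=COMPARTMENTS):
    """Map each protein ID to its indicator vector."""
    return {prot: localization_vector(locs, vocabulary)
            for prot, locs in protein_locations.items()}


# Made-up annotations for illustration only.
example = {"P1": ["nucleus", "cytoplasm"], "P2": ["mitochondrion"], "P3": []}
features = localization_matrix(example)
print(features["P1"])  # → [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

A protein annotated to several compartments gets a 1 in each corresponding position, so the vector records membership in every known location rather than a single label; a protein with no annotation becomes an all-zero vector.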

Similar articles

1
A Deep Learning Framework for Identifying Essential Proteins by Integrating Multiple Types of Biological Information.
IEEE/ACM Trans Comput Biol Bioinform. 2021 Jan-Feb;18(1):296-305. doi: 10.1109/TCBB.2019.2897679. Epub 2021 Feb 3.
2
A deep learning framework for identifying essential proteins based on multiple biological information.
BMC Bioinformatics. 2022 Aug 4;23(1):318. doi: 10.1186/s12859-022-04868-8.
3
DeepEP: a deep learning framework for identifying essential proteins.
BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):506. doi: 10.1186/s12859-019-3076-y.
4
DeepHE: Accurately predicting human essential genes based on deep learning.
PLoS Comput Biol. 2020 Sep 16;16(9):e1008229. doi: 10.1371/journal.pcbi.1008229. eCollection 2020 Sep.
5
Predicting Essential Proteins by Integrating Network Topology, Subcellular Localization Information, Gene Expression Profile and GO Annotation Data.
IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):2053-2061. doi: 10.1109/TCBB.2019.2916038. Epub 2020 Dec 8.
6
A Topology Potential-Based Method for Identifying Essential Proteins from PPI Networks.
IEEE/ACM Trans Comput Biol Bioinform. 2015 Mar-Apr;12(2):372-83. doi: 10.1109/TCBB.2014.2361350.
7
Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information.
J Theor Biol. 2018 Jun 14;447:65-73. doi: 10.1016/j.jtbi.2018.03.029. Epub 2018 Mar 21.
8
ACDMBI: A deep learning model based on community division and multi-source biological information fusion predicts essential proteins.
Comput Biol Chem. 2024 Oct;112:108115. doi: 10.1016/j.compbiolchem.2024.108115. Epub 2024 Jun 6.
9
Temporal Protein Complex Identification Based on Dynamic Heterogeneous Protein Information Network Representation Learning.
IEEE/ACM Trans Comput Biol Bioinform. 2024 Sep-Oct;21(5):1154-1164. doi: 10.1109/TCBB.2024.3351078. Epub 2024 Oct 9.
10
A new method for the discovery of essential proteins.
PLoS One. 2013;8(3):e58763. doi: 10.1371/journal.pone.0058763. Epub 2013 Mar 21.

Cited by

1
A deep ensemble framework for human essential gene prediction by integrating multi-omics data.
Sci Rep. 2025 Jul 21;15(1):26407. doi: 10.1038/s41598-025-99164-9.
2
Target identification of natural products in cancer with chemical proteomics and artificial intelligence approaches.
Cancer Biol Med. 2025 Jul 9;22(6):549-97. doi: 10.20892/j.issn.2095-3941.2025.0145.
3
Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models.
Database (Oxford). 2025 May 30;2025. doi: 10.1093/database/baaf027.
4
AttentionEP: Predicting essential proteins via fusion of multiscale features by attention mechanisms.
Comput Struct Biotechnol J. 2024 Nov 29;23:4315-4323. doi: 10.1016/j.csbj.2024.11.039. eCollection 2024 Dec.
5
Recent advances in the characterization of essential genes and development of a database of essential genes.
Imeta. 2024 Jan 2;3(1):e157. doi: 10.1002/imt2.157. eCollection 2024 Feb.
6
Improving prediction of maternal health risks using PCA features and TreeNet model.
PeerJ Comput Sci. 2024 Apr 15;10:e1982. doi: 10.7717/peerj-cs.1982. eCollection 2024.
7
ECDEP: identifying essential proteins based on evolutionary community discovery and subcellular localization.
BMC Genomics. 2024 Jan 26;25(1):117. doi: 10.1186/s12864-024-10019-5.
8
Untangling the Context-Specificity of Essential Genes by Means of Machine Learning: A Constructive Experience.
Biomolecules. 2023 Dec 22;14(1):18. doi: 10.3390/biom14010018.
9
'Bingo'-a large language model- and graph neural network-based workflow for the prediction of essential genes from protein data.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad472.
10
A seed expansion-based method to identify essential proteins by integrating protein-protein interaction sub-networks and multiple biological characteristics.
BMC Bioinformatics. 2023 Nov 30;24(1):452. doi: 10.1186/s12859-023-05583-8.