Essential gene prediction in using machine learning approaches based on sequence and functional features.

Authors

Aromolaran Olufemi, Beder Thomas, Oswald Marcus, Oyelade Jelili, Adebiyi Ezekiel, Koenig Rainer

Affiliations

Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria.

Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany.

Publication information

Comput Struct Biotechnol J. 2020 Mar 10;18:612-621. doi: 10.1016/j.csbj.2020.02.022. eCollection 2020.

DOI:10.1016/j.csbj.2020.02.022
PMID:32257045
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7096750/
Abstract

Genes are termed essential if their loss of function compromises viability or results in profound loss of fitness. On the genome scale, these genes can be determined experimentally employing RNAi or knockout screens, but this is very resource-intensive. Computational methods for essential gene prediction can overcome this drawback, particularly when intrinsic (e.g. from the protein sequence) as well as extrinsic features (e.g. from transcription profiles) are considered. In this work, we employed machine learning to predict essential genes in . A total of 27,340 features were generated based on a large variety of different aspects comprising nucleotide and protein sequences, gene networks, protein-protein interactions, evolutionary conservation and functional annotations. Employing cross-validation, we obtained an excellent prediction performance. The best model achieved in . a ROC-AUC of 0.90, a PR-AUC of 0.30 and an F1 score of 0.34. Our approach considerably outperformed a benchmark method in which only features derived from the protein sequences were used (P < 0.001). Investigating which features contributed to this success, we found all categories of features, most prominently network topological, functional and sequence-based features. To evaluate our approach, we performed the same workflow for essential gene prediction in human and achieved ROC-AUC = 0.97, PR-AUC = 0.73, and F1 = 0.64. In summary, this study shows that our well-elaborated assembly of features, covering a broad range of intrinsic and extrinsic gene and protein features, enabled intelligent systems to predict the essentiality of genes in an organism well.
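
The abstract reports three metrics that behave differently when essential genes are a small minority of all genes. ROC-AUC does not depend on the class ratio, whereas PR-AUC and F1 do, which is one way a high ROC-AUC can sit alongside a much lower PR-AUC. The sketch below computes all three in plain Python. The function names are hypothetical, the toy labels and scores are invented for illustration and are not data from the study, and PR-AUC is approximated here by average precision; the abstract does not say how the authors computed their metrics.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate PR-AUC: mean of the precision values at each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float],
                    threshold: float) -> float:
    """F1 score when every gene with score >= threshold is called essential."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy example: 2 "essential" genes (label 1) among 10.
toy_labels = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(f"ROC-AUC: {roc_auc(toy_labels, toy_scores):.2f}")            # 0.75
print(f"PR-AUC (AP): {average_precision(toy_labels, toy_scores):.2f}")  # 0.45
print(f"F1 @ 0.5: {f1_at_threshold(toy_labels, toy_scores, 0.5):.2f}")  # 0.57
```

In this toy ranking, a few negatives placed above the positives cost little in ROC-AUC because there are many negatives in total, but they reduce precision sharply, so the average precision and F1 come out noticeably lower.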

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b3e/7096750/c06ccf4c3e5a/ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b3e/7096750/22e938b57ac9/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b3e/7096750/144234a935d1/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b3e/7096750/c37cbd41ad63/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b3e/7096750/73009ed8985f/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b3e/7096750/db7ae6085142/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5b3e/7096750/c88b866dddfc/gr6.jpg

Similar articles

1
Essential gene prediction in using machine learning approaches based on sequence and functional features.
Comput Struct Biotechnol J. 2020 Mar 10;18:612-621. doi: 10.1016/j.csbj.2020.02.022. eCollection 2020.
2
Machine learning approach to gene essentiality prediction: a review.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab128.
3
Predicting gene essentiality in by feature engineering and machine-learning.
Comput Struct Biotechnol J. 2020 May 15;18:1093-1102. doi: 10.1016/j.csbj.2020.05.008. eCollection 2020.
4
An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features.
Comput Struct Biotechnol J. 2019 Jun 8;17:785-796. doi: 10.1016/j.csbj.2019.05.008. eCollection 2019.
5
Identifying essential genes in bacterial metabolic networks with machine learning methods.
BMC Syst Biol. 2010 May 3;4:56. doi: 10.1186/1752-0509-4-56.
6
'Bingo'-a large language model- and graph neural network-based workflow for the prediction of essential genes from protein data.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad472.
7
Combined use of feature engineering and machine-learning to predict essential genes in .
NAR Genom Bioinform. 2020 Jul 22;2(3):lqaa051. doi: 10.1093/nargab/lqaa051. eCollection 2020 Sep.
8
Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster.
PLoS One. 2023 Aug 9;18(8):e0288023. doi: 10.1371/journal.pone.0288023. eCollection 2023.
9
Predicting host dependency factors of pathogens in using machine learning.
Comput Struct Biotechnol J. 2021 Aug 9;19:4581-4592. doi: 10.1016/j.csbj.2021.08.010. eCollection 2021.
10
An integrated machine-learning model to predict prokaryotic essential genes.
Methods Mol Biol. 2015;1279:137-51. doi: 10.1007/978-1-4939-2398-4_9.

Cited by

1
A hybrid machine learning model with attention mechanism and multidimensional multivariate feature coding for essential gene prediction.
BMC Biol. 2025 Apr 24;23(1):108. doi: 10.1186/s12915-025-02209-8.
2
Machine learning methods for predicting essential metabolic genes from Plasmodium falciparum genome-scale metabolic network.
PLoS One. 2024 Dec 23;19(12):e0315530. doi: 10.1371/journal.pone.0315530. eCollection 2024.
3
Artificial intelligence and machine learning applications for cultured meat.
Front Artif Intell. 2024 Sep 24;7:1424012. doi: 10.3389/frai.2024.1424012. eCollection 2024.
4
Inference of essential genes in and by machine learning and the implications for discovering new interventions.
Comput Struct Biotechnol J. 2024 Aug 2;23:3081-3089. doi: 10.1016/j.csbj.2024.07.025. eCollection 2024 Dec.
5
Differentially used codons among essential genes in bacteria identified by machine learning-based analysis.
Mol Genet Genomics. 2024 Jul 27;299(1):72. doi: 10.1007/s00438-024-02163-0.
6
Inference of Essential Genes of the Parasite via Machine Learning.
Int J Mol Sci. 2024 Jun 27;25(13):7015. doi: 10.3390/ijms25137015.
7
Untangling the Context-Specificity of Essential Genes by Means of Machine Learning: A Constructive Experience.
Biomolecules. 2023 Dec 22;14(1):18. doi: 10.3390/biom14010018.
8
Essential genes identification model based on sequence feature map and graph convolutional neural network.
BMC Genomics. 2024 Jan 10;25(1):47. doi: 10.1186/s12864-024-09958-w.
9
'Bingo'-a large language model- and graph neural network-based workflow for the prediction of essential genes from protein data.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad472.
10
Maize Feature Store: A centralized resource to manage and analyze curated maize multi-omics features for machine learning applications.
Database (Oxford). 2023 Nov 6;2023. doi: 10.1093/database/baad078.

References

1
New insights on human essential genes based on integrated analysis and the construction of the HEGIAP web-based platform.
Brief Bioinform. 2020 Jul 15;21(4):1397-1410. doi: 10.1093/bib/bbz072.
2
An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features.
Comput Struct Biotechnol J. 2019 Jun 8;17:785-796. doi: 10.1016/j.csbj.2019.05.008. eCollection 2019.
3
g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update).
Nucleic Acids Res. 2019 Jul 2;47(W1):W191-W198. doi: 10.1093/nar/gkz369.
4
Analysis of Topological Parameters of Complex Disease Genes Reveals the Importance of Location in a Biomolecular Network.
Genes (Basel). 2019 Feb 14;10(2):143. doi: 10.3390/genes10020143.
5
Identifying mouse developmental essential genes using machine learning.
Dis Model Mech. 2018 Dec 13;11(12):dmm034546. doi: 10.1242/dmm.034546.
6
Network-based features enable prediction of essential genes across diverse organisms.
PLoS One. 2018 Dec 13;13(12):e0208722. doi: 10.1371/journal.pone.0208722. eCollection 2018.
7
STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.
Nucleic Acids Res. 2019 Jan 8;47(D1):D607-D613. doi: 10.1093/nar/gky1131.
8
The BioGRID interaction database: 2019 update.
Nucleic Acids Res. 2019 Jan 8;47(D1):D529-D541. doi: 10.1093/nar/gky1079.
9
FlyBase 2.0: the next generation.
Nucleic Acids Res. 2019 Jan 8;47(D1):D759-D765. doi: 10.1093/nar/gky1003.
10
Vector Control and Insecticidal Resistance in the African Malaria Mosquito Anopheles gambiae.
Chem Res Toxicol. 2018 Jul 16;31(7):534-547. doi: 10.1021/acs.chemrestox.7b00285. Epub 2018 Jun 15.