Machine learning approach to gene essentiality prediction: a review.

Affiliations

Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria.

Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria.

Publication information

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab128.

DOI: 10.1093/bib/bbab128
PMID: 33842944
Abstract

Essential genes are critical for the growth and survival of any organism. Machine learning complements experimental methods and reduces the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that significantly discriminate essential genes, to improve the generalizability of prediction models across organisms, and to construct a robust gold standard as the class label for the training data to enhance prediction. Findings also show that a significant limitation of the machine learning approach is predicting conditionally essential genes, since the essentiality status of a gene can change with the specific condition of the organism. This review examines the methods applied to the essential gene prediction task, their strengths and limitations, and the factors responsible for effective computational prediction of essential genes. We discuss categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely gene sequence, protein sequence, network topology, homology and gene ontology-based features, were generated for Caenorhabditis elegans to perform a comparative analysis of their essentiality prediction capacity. The gene ontology-based feature category outperformed the other categories, mainly because of its high correlation with the genes' biological functions. However, the topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major factor limiting machine learning prediction of conditional gene essentiality is the unavailability of labeled data, for the conditions of interest, with which to train a classifier. Therefore, cooperative machine learning could further exploit models that can perform well in conditional essentiality predictions.
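As a rough illustration of what comparing the "discriminatory power" of feature categories can look like, the sketch below ranks two features by the area under the ROC curve (AUC), computed by pairwise comparison. This is only a minimal sketch under stated assumptions: the abstract does not say which metric or classifier the review used, the feature names (`network_degree`, `gc_content`) are hypothetical stand-ins for the topology and gene sequence categories, and all numbers are invented toy values, not data from the paper.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Returns the probability that a randomly chosen essential gene (label 1)
    scores higher than a randomly chosen non-essential gene (label 0),
    counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one gene of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels: 1 = essential, 0 = non-essential.
labels = [1, 1, 1, 0, 0, 0, 0, 0]

# One illustrative feature per category; all values are made up.
features = {
    "network_degree": [9, 7, 8, 3, 2, 5, 1, 4],
    "gc_content": [0.41, 0.38, 0.45, 0.40, 0.44, 0.36, 0.39, 0.43],
}

# Rank features from most to least discriminative on this toy data.
ranking = sorted(
    ((auc(values, labels), name) for name, values in features.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{name}: AUC = {score:.2f}")
```

An AUC of 1.0 means the feature separates the two classes perfectly on this data, and 0.5 means it does no better than chance. A real comparison would use many features per category, a trained classifier, and cross-validation rather than single raw features.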

SHORT ABSTRACT

Identification of essential genes is imperative because it provides an understanding of core structure and function and accelerates the discovery of drug targets, among other benefits. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors limit the performance of machine learning approaches. This review aims to present the standard procedure and resources available for predicting essential genes in organisms, and to highlight the factors responsible for the current limitations in using machine learning for conditional gene essentiality prediction. The choice of features and ML technique was identified as an important factor in predicting essential genes effectively.

Similar articles

1. Machine learning approach to gene essentiality prediction: a review.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab128.
2. Essential gene prediction using limited gene essentiality information-An integrative semi-supervised machine learning strategy.
PLoS One. 2020 Nov 30;15(11):e0242943. doi: 10.1371/journal.pone.0242943. eCollection 2020.
3. DeepHE: Accurately predicting human essential genes based on deep learning.
PLoS Comput Biol. 2020 Sep 16;16(9):e1008229. doi: 10.1371/journal.pcbi.1008229. eCollection 2020 Sep.
4. Prediction of essential genes in prokaryote based on artificial neural network.
Genes Genomics. 2020 Jan;42(1):97-106. doi: 10.1007/s13258-019-00884-w. Epub 2019 Nov 17.
5. Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information.
BMC Bioinformatics. 2009 Sep 16;10:290. doi: 10.1186/1471-2105-10-290.
6. Cross-Predicting Essential Genes between Two Model Eukaryotic Species Using Machine Learning.
Int J Mol Sci. 2021 May 11;22(10):5056. doi: 10.3390/ijms22105056.
7. EPGAT: Gene Essentiality Prediction With Graph Attention Networks.
IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1615-1626. doi: 10.1109/TCBB.2021.3054738. Epub 2022 Jun 3.
8. Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster.
PLoS One. 2023 Aug 9;18(8):e0288023. doi: 10.1371/journal.pone.0288023. eCollection 2023.
9. Identifying mouse developmental essential genes using machine learning.
Dis Model Mech. 2018 Dec 13;11(12):dmm034546. doi: 10.1242/dmm.034546.
10. Harnessing model organism genomics to underpin the machine learning-based prediction of essential genes in eukaryotes - Biotechnological implications.
Biotechnol Adv. 2022 Jan-Feb;54:107822. doi: 10.1016/j.biotechadv.2021.107822. Epub 2021 Aug 27.

Cited by

1. Bacterial Species in Engineered Living Materials: Strategies and Future Directions.
Microb Biotechnol. 2025 May;18(5):e70164. doi: 10.1111/1751-7915.70164.
2. Properties of "Stable" Mosquito Cytochrome P450 Enzymes.
Insects. 2025 Feb 8;16(2):184. doi: 10.3390/insects16020184.
3. An optimized deep-forest algorithm using a modified differential evolution optimization algorithm: A case of host-pathogen protein-protein interaction prediction.
Comput Struct Biotechnol J. 2025 Jan 26;27:595-611. doi: 10.1016/j.csbj.2025.01.020. eCollection 2025.
4. Predicting the risk of gastroparesis in critically ill patients after CME using an interpretable machine learning algorithm - a 10-year multicenter retrospective study.
Front Med (Lausanne). 2025 Jan 6;11:1467565. doi: 10.3389/fmed.2024.1467565. eCollection 2024.
5. Machine learning methods for predicting essential metabolic genes from Plasmodium falciparum genome-scale metabolic network.
PLoS One. 2024 Dec 23;19(12):e0315530. doi: 10.1371/journal.pone.0315530. eCollection 2024.
6. Construction of a risk prediction model for postoperative deep vein thrombosis in colorectal cancer patients based on machine learning algorithms.
Front Oncol. 2024 Nov 27;14:1499794. doi: 10.3389/fonc.2024.1499794. eCollection 2024.
7. Prediction of in-hospital mortality risk for patients with acute ST-elevation myocardial infarction after primary PCI based on predictors selected by GRACE score and two feature selection methods.
Front Cardiovasc Med. 2024 Oct 22;11:1419551. doi: 10.3389/fcvm.2024.1419551. eCollection 2024.
8. HELP: A computational framework for labelling and predicting human common and context-specific essential genes.
PLoS Comput Biol. 2024 Sep 27;20(9):e1012076. doi: 10.1371/journal.pcbi.1012076. eCollection 2024 Sep.
9. Machine learning-based prediction of gastroparesis risk following complete mesocolic excision.
Discov Oncol. 2024 Sep 27;15(1):483. doi: 10.1007/s12672-024-01355-9.
10. Identification of biomarkers and potential drug targets in osteoarthritis based on bioinformatics analysis and mendelian randomization.
Front Pharmacol. 2024 Aug 29;15:1439289. doi: 10.3389/fphar.2024.1439289. eCollection 2024.