Identifying essential genes across eukaryotes by machine learning.

Authors

Beder Thomas, Aromolaran Olufemi, Dönitz Jürgen, Tapanelli Sofia, Adedeji Eunice O, Adebiyi Ezekiel, Bucher Gregor, Koenig Rainer

Affiliations

Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany.

Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria.

Publication information

NAR Genom Bioinform. 2021 Nov 30;3(4):lqab110. doi: 10.1093/nargab/lqab110. eCollection 2021 Dec.

DOI: 10.1093/nargab/lqab110

PMID: 34859210

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8634067/

Abstract

Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less studied organisms, essentiality might be predicted by gene homology. However, this approach cannot be applied to non-conserved genes. Additionally, divergent essentiality information is obtained from studying single cells or whole, multicellular organisms, particularly when derived from human cell line screens and human population studies. We employed machine learning across six model eukaryotes and 60,381 genes, using 41,635 features derived from the sequence, gene function information and network topology. Within a leave-one-organism-out cross-validation, the classifiers showed high generalizability, with an average accuracy close to 80% in the left-out species. As a case study, we applied the method to [species name missing in source] and [species name missing in source] and validated predictions experimentally, yielding similar performances. Finally, using the classifier based on the studied model organisms enabled linking the essentiality information of human cell line screens and population studies.
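The leave-one-organism-out cross-validation named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the organism labels (`org_A`, `org_B`, `org_C`) are hypothetical, and the nearest-centroid classifier is a toy stand-in, not the classifiers or features used in the paper. The sketch only shows the splitting scheme: each fold trains on all organisms but one and scores on the organism that was held out.

```python
import random
from statistics import mean


def leave_one_organism_out(organisms):
    """Yield (held_out, train_idx, test_idx), holding out one organism per fold."""
    for held_out in sorted(set(organisms)):
        train_idx = [i for i, o in enumerate(organisms) if o != held_out]
        test_idx = [i for i, o in enumerate(organisms) if o == held_out]
        yield held_out, train_idx, test_idx


def fit_centroids(X, y):
    """Toy stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Synthetic data (assumption): 3 hypothetical organisms, 40 genes each,
# 3 numeric features per gene, label 1 = essential, 0 = non-essential.
rng = random.Random(0)
X, y, organisms = [], [], []
for name in ["org_A", "org_B", "org_C"]:
    for _ in range(40):
        label = rng.randint(0, 1)
        X.append([rng.gauss(label * 2.0, 1.0) for _ in range(3)])
        y.append(label)
        organisms.append(name)

# Train on all organisms except one, then score on the held-out organism.
accuracy_by_organism = {}
for held_out, train_idx, test_idx in leave_one_organism_out(organisms):
    model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
    hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
    accuracy_by_organism[held_out] = hits / len(test_idx)

average_accuracy = mean(list(accuracy_by_organism.values()))
```

Because no gene from the held-out organism is seen during training, the per-organism accuracy estimates how well a model transfers to a species it was never trained on, which is the sense of generalizability the abstract reports.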
