An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features.

Author information

Campos Tulio L, Korhonen Pasi K, Gasser Robin B, Young Neil D

Affiliations

Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.

Bioinformatics Core Facility, Instituto Aggeu Magalhães, Fundação Oswaldo Cruz (IAM-Fiocruz), Recife, Pernambuco, Brazil.

Publication information

Comput Struct Biotechnol J. 2019 Jun 8;17:785-796. doi: 10.1016/j.csbj.2019.05.008. eCollection 2019.

DOI: 10.1016/j.csbj.2019.05.008

PMID: 31312416

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6607062/

Abstract

The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for an understanding of the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected species of eukaryotes, and then employed a systematic machine-learning approach, using protein sequence-derived features and selection procedures, to investigate essential gene predictions within and among species. We showed that essential gene orthologs comprise only a small fraction of the total number of orthologs among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our gene ortholog analysis, the prediction of essential genes among multiple (including distantly related) species is possible, yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for the expansion of genome-wide essentiality investigations in eukaryotes using machine-learning approaches.
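The abstract does not state which sequence-derived features or evaluation metrics were used. As a minimal sketch, assuming amino acid composition as one example of a protein sequence-derived feature and ROC-AUC as one way to compare a model against random guessing (which scores about 0.5), the idea might look like the following. The function names, the toy sequences, and the toy scores are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Assumption: composition is only one illustrative sequence-derived
    feature; the paper's actual feature set is not given in the abstract.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def roc_auc(labels, scores):
    """ROC-AUC computed as the probability that a positive example is
    scored above a negative one, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = essential, 0 = non-essential.
    toy_sequences = ["MKKLLAVG", "MSTNPKPQ", "MAAAGGLL", "MWYFHCDE"]
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.8, 0.4, 0.6, 0.3]  # stand-in for model outputs

    features = [aa_composition(s) for s in toy_sequences]
    print("feature vector length:", len(features[0]))
    print("ROC-AUC on toy scores:", roc_auc(toy_labels, toy_scores))
```

In practice the feature vectors would be passed to a classifier trained on genes with known essentiality labels, and an AUC meaningfully above 0.5 on held-out genes would indicate performance better than random guessing.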

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45c5/6607062/fe53c1b0551a/ga1.jpg

Similar articles

Predicting gene essentiality in by feature engineering and machine-learning.

Comput Struct Biotechnol J. 2020 May 15;18:1093-1102. doi: 10.1016/j.csbj.2020.05.008. eCollection 2020.

Machine learning approach to gene essentiality prediction: a review.

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab128.

Sequence-based information-theoretic features for gene essentiality prediction.

BMC Bioinformatics. 2017 Nov 9;18(1):473. doi: 10.1186/s12859-017-1884-5.

Essential gene prediction using limited gene essentiality information-An integrative semi-supervised machine learning strategy.

PLoS One. 2020 Nov 30;15(11):e0242943. doi: 10.1371/journal.pone.0242943. eCollection 2020.

Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method.

Comput Struct Biotechnol J. 2019 Dec 26;18:153-161. doi: 10.1016/j.csbj.2019.12.005. eCollection 2020.

DeepHE: Accurately predicting human essential genes based on deep learning.

PLoS Comput Biol. 2020 Sep 16;16(9):e1008229. doi: 10.1371/journal.pcbi.1008229. eCollection 2020 Sep.

Identifying essential genes across eukaryotes by machine learning.

NAR Genom Bioinform. 2021 Nov 30;3(4):lqab110. doi: 10.1093/nargab/lqab110. eCollection 2021 Dec.

Essential gene prediction in using machine learning approaches based on sequence and functional features.

Comput Struct Biotechnol J. 2020 Mar 10;18:612-621. doi: 10.1016/j.csbj.2020.02.022. eCollection 2020.

Fangorn Forest (F2): a machine learning approach to classify genes and genera in the family Geminiviridae.

BMC Bioinformatics. 2017 Sep 30;18(1):431. doi: 10.1186/s12859-017-1839-x.

Cited by

Differentially used codons among essential genes in bacteria identified by machine learning-based analysis.

Mol Genet Genomics. 2024 Jul 27;299(1):72. doi: 10.1007/s00438-024-02163-0.

Integration of graph neural networks and genome-scale metabolic models for predicting gene essentiality.

NPJ Syst Biol Appl. 2024 Mar 6;10(1):24. doi: 10.1038/s41540-024-00348-2.

'Bingo'-a large language model- and graph neural network-based workflow for the prediction of essential genes from protein data.

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad472.

Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster.

PLoS One. 2023 Aug 9;18(8):e0288023. doi: 10.1371/journal.pone.0288023. eCollection 2023.

Predicting and explaining the impact of genetic disruptions and interactions on organismal viability.

Bioinformatics. 2022 Sep 2;38(17):4088-4099. doi: 10.1093/bioinformatics/btac519.

XGEM: Predicting Essential miRNAs by the Ensembles of Various Sequence-Based Classifiers With XGBoost Algorithm.

Front Genet. 2022 Mar 28;13:877409. doi: 10.3389/fgene.2022.877409. eCollection 2022.

SGII: Systematic Identification of Essential lncRNAs in Mouse and Human Genome With lncRNA-Protein-Protein Heterogeneous Interaction Network.

Front Genet. 2022 Mar 21;13:864564. doi: 10.3389/fgene.2022.864564. eCollection 2022.

Identifying essential genes across eukaryotes by machine learning.

NAR Genom Bioinform. 2021 Nov 30;3(4):lqab110. doi: 10.1093/nargab/lqab110. eCollection 2021 Dec.

Cross-Predicting Essential Genes between Two Model Eukaryotic Species Using Machine Learning.

Int J Mol Sci. 2021 May 11;22(10):5056. doi: 10.3390/ijms22105056.

Combined use of feature engineering and machine-learning to predict essential genes in .

NAR Genom Bioinform. 2020 Jul 22;2(3):lqaa051. doi: 10.1093/nargab/lqaa051. eCollection 2020 Sep.
