A new computational strategy for predicting essential genes.

Affiliation

College of Life Science, State Key Laboratory of Crop Stress Biology for Arid Areas, Northwest A&F University, Yangling, Shaanxi, China.

Publication information

BMC Genomics. 2013 Dec 21;14:910. doi: 10.1186/1471-2164-14-910.

DOI:10.1186/1471-2164-14-910

PMID:24359534

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3880044/

Abstract

BACKGROUND

Determining the minimum gene set for cellular life is one of the central goals in biology. Genome-wide identification of essential genes has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models that integrate gene features have recently been developed as an alternative way to transfer gene essentiality annotations between organisms.

RESULTS

We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) multicollinearity among gene features and (ii) diverse and even contrasting correlations between gene features and gene essentiality, both within and among species. To address these issues, we developed a novel model called the feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and a genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. We compared the performance of FWM with that of other popular models, namely support vector machine, Naïve Bayes, and logistic regression models, by reciprocally predicting essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction.
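The abstract does not give the FWM formulation, so the following is only a minimal sketch of the general idea of a feature-weighted Naïve Bayes classifier, not the authors' implementation. It assumes Gaussian feature likelihoods and scales each feature's log-likelihood by a weight; in FWM the weights would presumably come from the logistic regression and genetic algorithm steps, whereas here they are set by hand. The function names, the synthetic data, and the weight values are all hypothetical.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class log prior, feature means, and feature variances."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "log_prior": np.log(len(Xc) / len(X)),
            "mean": Xc.mean(axis=0),
            "var": Xc.var(axis=0) + 1e-9,  # small floor avoids division by zero
        }
    return params

def weighted_log_posterior(params, X, weights):
    """Unnormalized log posterior; each feature's log-likelihood is scaled by its weight."""
    classes = sorted(params)
    scores = []
    for c in classes:
        p = params[c]
        # Per-feature Gaussian log-likelihood, shape (n_samples, n_features).
        log_lik = -0.5 * (np.log(2 * np.pi * p["var"]) + (X - p["mean"]) ** 2 / p["var"])
        scores.append(p["log_prior"] + log_lik @ weights)
    return np.array(classes), np.column_stack(scores)

def predict(params, X, weights):
    """Return the class with the highest weighted log posterior for each row."""
    classes, scores = weighted_log_posterior(params, X, weights)
    return classes[scores.argmax(axis=1)]

# Hypothetical toy data: feature 1 is an exact copy of feature 0 (perfect
# collinearity), and feature 2 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
x0 = rng.normal(loc=y * 2.0, scale=1.0)
X = np.column_stack([x0, x0, rng.normal(size=100)])

params = fit_gaussian_nb(X, y)
plain = predict(params, X, np.ones(3))                      # ordinary Naive Bayes
weighted = predict(params, X, np.array([1.0, 0.0, 1.0]))    # duplicate feature switched off
print("training accuracy, unweighted:", (plain == y).mean())
print("training accuracy, weighted:  ", (weighted == y).mean())
```

With all weights equal to 1 this reduces to ordinary Gaussian Naïve Bayes, which counts the duplicated feature twice because it treats features as independent. A weight of 0 removes a feature's contribution entirely, and intermediate weights damp it, which is one simple way a weighting scheme can limit the effect of collinear features.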

CONCLUSIONS

FWM can markedly improve the accuracy of essential gene prediction and may serve as an alternative method for other classification tasks. This method can contribute substantially to knowledge of the minimum gene sets required by living organisms and to the discovery of new drug targets.
