
A new computational strategy for predicting essential genes.

Affiliation

College of Life Science, State Key Laboratory of Crop Stress Biology for Arid Areas, Northwest A&F University, Yangling, Shaanxi, China.

Publication information

BMC Genomics. 2013 Dec 21;14:910. doi: 10.1186/1471-2164-14-910.

DOI: 10.1186/1471-2164-14-910
PMID: 24359534
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3880044/

Abstract

BACKGROUND

Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms.

RESULTS

We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction.
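
The first issue named above, multicollinearity among gene features, can be illustrated with a simple pairwise screen. The sketch below is not the authors' procedure (the abstract only mentions a stepwise regression model); it flags feature pairs whose absolute Pearson correlation exceeds a cutoff. The feature names, the toy values, and the 0.9 threshold are all hypothetical, chosen only for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)


def collinear_pairs(features, threshold=0.9):
    """Return (name_i, name_j, r) for every pair with |r| >= threshold."""
    names = list(features)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(features[names[i]], features[names[j]])
            if abs(r) >= threshold:
                flagged.append((names[i], names[j], round(r, 3)))
    return flagged


# Hypothetical toy features for five genes (not data from the paper).
toy_features = {
    "gene_length": [900, 1500, 300, 2100, 1200],
    "protein_length": [300, 500, 100, 700, 400],  # exactly gene_length / 3
    "expression": [5.0, 1.0, 4.0, 2.0, 3.0],
}

flagged = collinear_pairs(toy_features)
print(flagged)
```

A pairwise screen like this only catches redundancy between two features at a time; redundancy that involves three or more features would need a regression-based diagnostic such as the variance inflation factor.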

CONCLUSIONS

FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
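
As a rough illustration of the general idea behind a feature-weighted Naïve Bayes classifier, the sketch below scales each feature's log-likelihood by a weight before summing. This is one common formulation and an assumption here: the abstract does not state how FWM applies its weights, and in the paper the weights are derived with logistic regression and a genetic algorithm, whereas the weights below are set by hand. All function names and the toy data are hypothetical.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature (mean, variance) for each label."""
    model = {}
    n = len(y)
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / n, stats)
    return model


def log_gaussian(v, mean, var):
    """Log density of a normal distribution at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def predict_weighted(model, x, weights):
    """Pick the label maximizing log prior + sum_j w_j * log p(x_j | label)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var), w in zip(x, stats, weights):
            score += w * log_gaussian(v, mean, var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0], [3.0, 1.2], [3.2, 5.1], [2.8, 2.9]]
y = [1, 1, 1, 0, 0, 0]  # 1 = essential, 0 = non-essential

model = fit_gaussian_nb(X, y)
weights = [1.0, 0.0]  # hand-set: trust feature 0, ignore feature 1
print(predict_weighted(model, [1.1, 3.0], weights))
print(predict_weighted(model, [2.9, 3.0], weights))
```

Setting a weight to zero removes that feature from the decision entirely, and a weight of one for every feature recovers the standard Naïve Bayes rule. Intermediate weights let a redundant or inconsistent feature contribute less without being discarded.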
