
Sequence-based information-theoretic features for gene essentiality prediction.

Authors

Nigatu Dawit, Sobetzko Patrick, Yousef Malik, Henkel Werner

Affiliations

Transmission Systems Group, Jacobs University Bremen, Campus Ring 1, Bremen, D-28759, Germany.

Philipps-Universität Marburg, LOEWE-Zentrum für Synthetische Mikrobiologie, Hans-Meerwein-Straße, Mehrzweckgebäude, Marburg, 35043, Germany.

Publication information

BMC Bioinformatics. 2017 Nov 9;18(1):473. doi: 10.1186/s12859-017-1884-5.

DOI:10.1186/s12859-017-1884-5
PMID:29121868
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5679510/
Abstract

BACKGROUND

Identification of essential genes is not only useful for our understanding of the minimal gene set required for cellular life but also aids the identification of novel drug targets in pathogens. In this work, we present a simple and effective gene essentiality prediction method using information-theoretic features that are derived exclusively from the gene sequences.
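The abstract does not list the exact features, so the following is only a minimal sketch of what "information-theoretic features derived exclusively from the gene sequence" could look like. It assumes two illustrative quantities, the Shannon entropy of the k-mer distribution and the mutual information between adjacent nucleotides; the function names and the choice of features are assumptions made for illustration, not the authors' feature set.

```python
import math
from collections import Counter


def shannon_entropy(seq, k=1):
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def adjacent_mutual_information(seq):
    """Mutual information (in bits) between each nucleotide and its right neighbour."""
    pairs = [(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / total
        mi += p_ab * math.log2(p_ab / ((left[a] / total) * (right[b] / total)))
    return mi


def sequence_features(seq):
    """Hypothetical feature vector computed from the gene sequence alone."""
    seq = seq.upper()
    return {
        "entropy_1mer": shannon_entropy(seq, 1),
        "entropy_3mer": shannon_entropy(seq, 3),
        "mutual_information": adjacent_mutual_information(seq),
    }
```

Calling `sequence_features` on each coding sequence would give one row of a feature matrix that a classifier could consume; no ortholog lookup or expression data is involved at any step.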

RESULTS

We developed a Random Forest classifier and performed an extensive model performance evaluation among and within 15 selected bacteria. In intra-organism predictions, where training and testing sets are taken from the same organism, AUC (Area Under the Curve) scores ranging from 0.73 to 0.90, 0.84 on average, were obtained. Cross-organism predictions using 5-fold cross-validation, pairwise, leave-one-species-out, leave-one-taxon-out, and cross-taxon yielded average AUC scores of 0.88, 0.75, 0.80, 0.82, and 0.78, respectively. To further show the applicability of our method in other domains of life, we predicted the essential genes of the yeast Schizosaccharomyces pombe and obtained a similar accuracy (AUC 0.84).
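As a rough illustration of the evaluation protocol described above, the sketch below computes an AUC from classifier scores and generates leave-one-species-out splits. It uses only the standard library and does not train a Random Forest; the helper names and the species labels in the usage example are hypothetical, and the label convention (1 = essential, 0 = non-essential) is an assumption.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (essential, non-essential) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one gene of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def leave_one_species_out(species):
    """Yield (held_out, train_indices, test_indices) for each distinct species."""
    for held_out in sorted(set(species)):
        train = [i for i, s in enumerate(species) if s != held_out]
        test = [i for i, s in enumerate(species) if s == held_out]
        yield held_out, train, test
```

In use, each split would train a model on the genes of all other species and score the held-out species, for example `for name, train, test in leave_one_species_out(["A", "A", "B", "C"]): ...`, and the per-species AUC values would then be averaged.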

CONCLUSIONS

The proposed method enables simple and reliable identification of essential genes without searching databases for orthologs or requiring further experimental data such as network topology and gene expression.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dbc/5679510/f678e5bfc497/12859_2017_1884_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dbc/5679510/d0db6c5e0057/12859_2017_1884_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dbc/5679510/339170b6518d/12859_2017_1884_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dbc/5679510/a8742e554053/12859_2017_1884_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dbc/5679510/4ca20a96c00e/12859_2017_1884_Fig5_HTML.jpg
