Evaluation of machine learning classifiers for predicting essential genes in strains.

Author information

Mukul Das Monish, Sarkar Keka

Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, Nadia - 741235.

Department of Microbiology, University of Kalyani, Kalyani, Nadia - 741235.

Publication information

Bioinformation. 2022 Dec 31;18(12):1126-1130. doi: 10.6026/973206300181126. eCollection 2022.

Abstract

Accurate investigation and prediction of essential genes from bacterial genomes is very important, because these genes may serve as effective targets for antimicrobial drugs and help explain the biological mechanisms of a cell. A subset of key features was derived from 14 genome sequence-based features of 20 bacterial strains, whose essential gene information was downloaded from the ePath and NCBI databases and mapped and matched using a genome extraction program. Key features were selected with a Genetic Algorithm. For each of the three classifiers, 80%, 10% and 10% of the key feature subset were used for training, validation and testing, respectively. Experimental results (10-fold cross-validation) showed that the proposed deep neural network (DNN), a decision tree (DT) and a support vector machine (SVM) achieved areas under the curve (AUC) of 0.98, 0.88 and 0.82, respectively, so the proposed DNN outperformed DT and SVM. The higher prediction accuracy of the classifiers was attributed to using only the key features, which also indicates better generalizability of the classifiers and the effectiveness of the key features related to gene essentiality. In addition, the proposed DNN showed the best prediction performance when compared with other predictors used in previous studies. The genome extraction program was developed for mapping and matching essential genes between the ePath and NCBI databases.
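
The abstract does not state the Genetic Algorithm settings, its fitness function, or the classifier architectures, so the following is only a minimal Python sketch of the general workflow under stated assumptions: boolean masks over the 14 features are evolved with truncation selection, one-point crossover and bit-flip mutation; each mask is scored by its AUC on a validation split; and the chosen subset is checked once on a held-out test split from an 80/10/10 split. The helper names (roc_auc, split_80_10_10, ga_select, subset_auc), the synthetic data, and the summed-feature scorer that stands in for the DNN, DT and SVM are all hypothetical and are not the authors' method.

```python
import random
from typing import Callable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive (label 1) scores above a
    random negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_80_10_10(n: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle row indices and cut them into train/validation/test (80/10/10)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def ga_select(n_features: int, fitness: Callable[[List[bool]], float],
              pop_size: int = 20, generations: int = 30,
              p_mut: float = 0.05, seed: int = 0) -> List[bool]:
    """Toy genetic algorithm over boolean feature masks (hypothetical settings):
    keep the better half, refill with one-point crossover plus bit-flip mutation.
    Assumes n_features >= 2 and pop_size >= 4."""
    rng = random.Random(seed)

    def score(mask: List[bool]) -> float:
        # An empty feature subset is treated as invalid.
        return fitness(mask) if any(mask) else float("-inf")

    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)
            child = [(not g) if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = elite + children
    return max(pop, key=score)


if __name__ == "__main__":
    # Hypothetical synthetic data: 14 features, only features 0 and 3 carry signal.
    rng = random.Random(42)
    n_rows, n_feats = 500, 14
    y = [rng.randint(0, 1) for _ in range(n_rows)]
    X = [[rng.gauss(1.5 * yi if j in (0, 3) else 0.0, 1.0) for j in range(n_feats)]
         for yi in y]
    # The training split would fit a real classifier; the toy scorer has no parameters.
    _train, val, test = split_80_10_10(n_rows)

    def subset_auc(mask: List[bool], rows: List[int]) -> float:
        # Hypothetical scorer: sum of the selected features per row.
        scores = [sum(x for x, keep in zip(X[i], mask) if keep) for i in rows]
        return roc_auc([y[i] for i in rows], scores)

    best = ga_select(n_feats, lambda m: subset_auc(m, val))
    print("selected features:", [j for j, keep in enumerate(best) if keep])
    print("test AUC:", round(subset_auc(best, test), 3))
```

Scoring candidate subsets on the validation split and reporting AUC only on the untouched test split keeps the feature search from seeing the test data; with a real classifier, the training split would be used to fit the model for each candidate mask.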

References

1
DeeplyEssential: a deep neural network for predicting essential genes in microbes.
BMC Bioinformatics. 2020 Sep 30;21(Suppl 14):367. doi: 10.1186/s12859-020-03688-y.
2
Prediction of essential genes in prokaryote based on artificial neural network.
Genes Genomics. 2020 Jan;42(1):97-106. doi: 10.1007/s13258-019-00884-w. Epub 2019 Nov 17.
3
ePath: an online database towards comprehensive essential gene annotation for prokaryotes.
Sci Rep. 2019 Sep 10;9(1):12949. doi: 10.1038/s41598-019-49098-w.
4
Network-based features enable prediction of essential genes across diverse organisms.
PLoS One. 2018 Dec 13;13(12):e0208722. doi: 10.1371/journal.pone.0208722. eCollection 2018.
5
Sequence-based information-theoretic features for gene essentiality prediction.
BMC Bioinformatics. 2017 Nov 9;18(1):473. doi: 10.1186/s12859-017-1884-5.
6
Maximum entropy methods for extracting the learned features of deep neural networks.
PLoS Comput Biol. 2017 Oct 30;13(10):e1005836. doi: 10.1371/journal.pcbi.1005836. eCollection 2017 Oct.
7
Selection of key sequence-based features for prediction of essential genes in 31 diverse bacterial species.
PLoS One. 2017 Mar 30;12(3):e0174638. doi: 10.1371/journal.pone.0174638. eCollection 2017.
8
Sequence comparison and essential gene identification with new inter-nucleotide distance sequences.
J Theor Biol. 2017 Apr 7;418:84-93. doi: 10.1016/j.jtbi.2017.01.031. Epub 2017 Jan 27.
9
Feature selection for outcome prediction in oesophageal cancer using genetic algorithm and random forest classifier.
Comput Med Imaging Graph. 2017 Sep;60:42-49. doi: 10.1016/j.compmedimag.2016.12.002. Epub 2016 Dec 28.
10
Predicting bacterial essential genes using only sequence composition information.
Genet Mol Res. 2014 Jun 17;13(2):4564-72. doi: 10.4238/2014.June.17.8.
