

Evaluation of machine learning classifiers for predicting essential genes in strains.

Authors

Mukul Das Monish, Sarkar Keka

Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, Nadia - 741235.

Department of Microbiology, University of Kalyani, Kalyani, Nadia - 741235.

Publication information

Bioinformation. 2022 Dec 31;18(12):1126-1130. doi: 10.6026/973206300181126. eCollection 2022.

Abstract

Accurately investigating and predicting essential genes in bacterial genomes is important because such genes may serve as effective targets for antimicrobial drugs and help explain the biological mechanisms of a cell. A subset of key features was obtained from 14 genome sequence-based features of 20 bacterial strains. Essential gene information for these strains was downloaded from the ePath and NCBI databases, and a genome extraction program was developed to map and match essential genes between the two databases. Key features were selected using a Genetic Algorithm. For each of the three classifiers, 80%, 10% and 10% of the key-feature subset were used for training, validation and testing, respectively. Experimental results (10-fold cross-validation) showed that the proposed deep neural network (DNN), decision tree (DT) and support vector machine (SVM) achieved AUCs of 0.98, 0.88 and 0.82, respectively, so the proposed DNN outperformed DT and SVM. The higher prediction accuracy of the classifiers is attributed to the use of only key features, which is also taken to support the better generalizability of the classifiers and the relevance of the key features to gene essentiality. The proposed DNN also showed the best prediction performance when compared with predictors used in previous studies.
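The abstract does not give implementation details, so the following is only a minimal standard-library sketch of the evaluation steps it names: an 80%/10%/10% split, 10-fold cross-validation index generation, and AUC computed as the probability that a positive example outscores a negative one. The function names (`split_80_10_10`, `k_folds`, `roc_auc`) and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
import random


def split_80_10_10(rows, seed=0):
    """Shuffle rows and split them into 80% train, 10% validation, 10% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_train = len(rows) * 8 // 10
    n_val = len(rows) // 10
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # whatever remains after train and validation
    return train, val, test


def k_folds(n_items, k=10):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    for fold in range(k):
        test_idx = indices[fold::k]  # every k-th item, starting at `fold`
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def roc_auc(labels, scores):
    """AUC: chance that a random positive scores above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, val, test = split_80_10_10(range(100))
    print(len(train), len(val), len(test))  # 80 10 10
    # Toy labels (1 = essential gene) and classifier scores, made up here.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

In the toy call, three of the four positive-negative score pairs are ordered correctly, which gives an AUC of 0.75. A reported AUC of 0.98 would mean that almost every essential gene is scored above almost every non-essential one.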


Similar articles

Predicting drug-target interaction network using deep learning model.
Comput Biol Chem. 2019 Jun;80:90-101. doi: 10.1016/j.compbiolchem.2019.03.016. Epub 2019 Mar 25.

Deep learning-based landslide susceptibility mapping.
Sci Rep. 2021 Dec 16;11(1):24112. doi: 10.1038/s41598-021-03585-1.

