Mukul Das Monish, Sarkar Keka
Department of Computer Science and Engineering, University of Kalyani, Kalyani, Nadia - 741235.
Department of Microbiology, University of Kalyani, Kalyani, Nadia - 741235.
Bioinformation. 2022 Dec 31;18(12):1126-1130. doi: 10.6026/973206300181126. eCollection 2022.
Accurate identification and prediction of essential genes in bacterial genomes is important, because such genes can be explored as effective targets for antimicrobial drugs and help in understanding the biological mechanisms of a cell. Essential gene information for 20 bacterial strains was downloaded from the ePath and NCBI databases, and a genome extraction program, developed for this purpose, was used to map and match the essential genes between the two databases. A subset of key features was then derived from 14 genome sequence-based features of these strains; the key features were selected using a Genetic Algorithm. For each of three classifiers, 80%, 10% and 10% of the key-feature data were used for training, validation and testing, respectively. Experimental results with 10-fold cross-validation (10-f-cv) showed that the proposed deep neural network (DNN), a decision tree (DT) and a support vector machine (SVM) achieved AUCs of 0.98, 0.88 and 0.82, respectively, so the proposed DNN outperformed DT and SVM. The high prediction accuracy of the classifiers was attributed to using only the key features, which also indicates better generalizability of the classifiers and the relevance of these key features to gene essentiality. In addition, the proposed DNN showed the best prediction performance when compared with predictors used in previous studies.
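The abstract does not specify how the Genetic Algorithm encodes candidate feature subsets or what it optimizes. The following is a minimal sketch, assuming each chromosome is a binary mask over the 14 features, with tournament selection, uniform crossover, bit-flip mutation and elitism. The function `ga_select_features` and the toy fitness `toy_fitness` are hypothetical; in practice the fitness would be something like a cross-validated classifier score computed on the selected columns.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def ga_select_features(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 30,
    generations: int = 50,
    crossover_rate: float = 0.9,
    mutation_rate: Optional[float] = None,
    seed: int = 0,
) -> Tuple[Mask, List[float]]:
    """Search binary feature masks with a simple generational GA (hypothetical sketch).

    Returns the best mask found and the best fitness recorded at each generation.
    """
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_features  # on average one bit flip per child

    def random_mask() -> Mask:
        return tuple(rng.randint(0, 1) for _ in range(n_features))

    def tournament(pop: Sequence[Mask], scores: Sequence[float], k: int = 3) -> Mask:
        # Draw k distinct individuals and keep the fittest one.
        contenders = rng.sample(range(len(pop)), k)
        return pop[max(contenders, key=lambda i: scores[i])]

    population = [random_mask() for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        scores = [fitness(m) for m in population]
        best_i = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best_i])
        next_pop = [population[best_i]]  # elitism: the best mask always survives
        while len(next_pop) < pop_size:
            a, b = tournament(population, scores), tournament(population, scores)
            if rng.random() < crossover_rate:
                # Uniform crossover: each gene is taken from either parent.
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = list(a)
            # Bit-flip mutation toggles each gene with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(tuple(child))
        population = next_pop
    scores = [fitness(m) for m in population]
    best_i = max(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best_i])
    return population[best_i], history


# Hypothetical toy fitness: pretend features 0, 3, 5 and 9 carry signal and
# every other selected feature costs a small penalty.
INFORMATIVE = {0, 3, 5, 9}


def toy_fitness(mask: Mask) -> float:
    gain = sum(1 for i, g in enumerate(mask) if g and i in INFORMATIVE)
    cost = sum(1 for i, g in enumerate(mask) if g and i not in INFORMATIVE)
    return gain - 0.2 * cost


best_mask, history = ga_select_features(14, toy_fitness, seed=42)
print([i for i, g in enumerate(best_mask) if g], history[-1])
```

Because the best mask is copied unchanged into each new generation, the recorded best fitness can never decrease from one generation to the next.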
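The evaluation combines an 80/10/10 train/validation/test split, 10-fold cross-validation and AUC. The following sketch shows one way to compute these, assuming plain random (not stratified) splitting, which the abstract does not specify. AUC is computed as the probability that a randomly chosen essential gene (label 1) receives a higher classifier score than a randomly chosen non-essential gene (label 0), with ties counted as one half. The helper names `split_80_10_10`, `k_fold_indices` and `roc_auc` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_80_10_10(n: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them into train/validation/test (80/10/10)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = (8 * n) // 10  # integer arithmetic avoids float rounding
    n_val = n // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; every index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 = essential gene, 0 = non-essential gene. Ties count as 0.5.
    This O(P*N) pairwise version is fine for a sketch; rank-based formulas scale better.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


train, val, test = split_80_10_10(100, seed=1)
print(len(train), len(val), len(test))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In the last call, three of the four positive/negative pairs are ordered correctly (0.9 > 0.6, 0.9 > 0.1, 0.4 > 0.1), which gives an AUC of 0.75.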