Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria.
Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria.
PLoS One. 2023 Aug 9;18(8):e0288023. doi: 10.1371/journal.pone.0288023. eCollection 2023.
Computational prediction of absolutely essential genes using machine learning has gained wide attention in recent years. However, most essential genes are conditionally rather than absolutely essential. Experimental techniques provide a reliable approach to identifying conditionally essential genes, but they are laborious, time-consuming, and resource-intensive, so computational techniques have been used to complement them. Computational techniques such as supervised machine learning and flux balance analysis are severely limited by the unavailability of the data required to train the model or to simulate the conditions for gene essentiality. This study developed a heuristic-enabled active machine learning method based on a light gradient boosting model to predict essential immune response and embryonic developmental genes in Drosophila melanogaster. We proposed a new sampling selection technique and introduced a heuristic function that replaces the human component in traditional active learning models. The heuristic function dynamically selects the unlabelled samples to improve the performance of the classifier in the next iteration. When tested on four benchmark datasets, the proposed model outperformed traditional active learning models (random sampling and uncertainty sampling). Applying the model to identify conditionally essential genes yielded four novel essential immune response genes and 48 novel genes that are essential under the embryonic developmental condition. We performed functional enrichment analysis of the predicted genes to elucidate their biological processes, and the results support our predictions: immune response and embryonic development related processes were significantly enriched in the essential immune response and embryonic developmental genes, respectively.
Finally, we propose the predicted essential genes as candidates for future experimental studies, and we recommend the developed tool, accessible at http://heal.covenantuniversity.edu.ng, for conditional essentiality predictions.
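The idea of an active learning loop in which a heuristic function, rather than a human annotator, supplies labels for selected unlabelled samples can be illustrated with a minimal sketch. The abstract does not specify the heuristic or the sampling rule, so everything below is an assumption made for illustration: a nearest-centroid scorer stands in for the light gradient boosting model, and the hypothetical `heuristic_select` function simply picks the pool samples the current model is most confident about and labels them with the model's own prediction (a self-training style rule, not the authors' method). All function names and the synthetic two-cluster data are likewise hypothetical.

```python
import random

def centroid(points):
    """Mean of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fit(labelled):
    """Stand-in classifier: one centroid per class (labels 0 and 1)."""
    return {c: centroid([x for x, y in labelled if y == c]) for c in (0, 1)}

def prob_positive(model, x):
    """Score in [0, 1]; higher means closer to the class-1 centroid."""
    d0, d1 = dist(x, model[0]), dist(x, model[1])
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

def heuristic_select(model, pool, k):
    """Hypothetical heuristic: indices of the k most confidently scored pool samples."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: abs(prob_positive(model, pool[i]) - 0.5),
                    reverse=True)
    return ranked[:k]

def active_learn(seed, pool, rounds=5, k=4):
    """Grow the labelled set without a human oracle; seed must contain both classes."""
    labelled, pool = list(seed), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = fit(labelled)
        chosen = heuristic_select(model, pool, k)
        # The heuristic, not a person, assigns the label (the model's own prediction).
        labelled += [(pool[i], int(prob_positive(model, pool[i]) >= 0.5)) for i in chosen]
        taken = set(chosen)
        pool = [x for i, x in enumerate(pool) if i not in taken]
    return fit(labelled), labelled, pool

def accuracy(model, data):
    """Fraction of (features, label) pairs classified correctly."""
    hits = sum(int(prob_positive(model, x) >= 0.5) == y for x, y in data)
    return hits / len(data)

def make_clusters(rng, n_per_class, centres=((0.0, 0.0), (4.0, 4.0)), spread=0.5):
    """Synthetic two-class data: one Gaussian blob per class."""
    return [((rng.gauss(cx, spread), rng.gauss(cy, spread)), label)
            for label, (cx, cy) in enumerate(centres)
            for _ in range(n_per_class)]

rng = random.Random(0)
seed = make_clusters(rng, 2)                      # small labelled seed set
pool = [x for x, _ in make_clusters(rng, 30)]     # unlabelled pool (labels discarded)
model, labelled, remaining = active_learn(seed, pool)
print(len(labelled), len(remaining), accuracy(model, make_clusters(rng, 50)))
```

In a traditional active learning setup, the line that assigns labels would instead query a human expert, and the selection rule would typically be random sampling or uncertainty sampling, the two baselines named in the abstract. Swapping `fit` and `prob_positive` for a gradient boosting classifier would bring the sketch closer to the described model, although the actual heuristic is defined only in the full paper.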