Emmanuel Jerry, Isewon Itunuoluwa, Oyelade Jelili
Department of Computer and Information Sciences, Covenant University, Ota, Nigeria.
Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Nigeria.
Comput Struct Biotechnol J. 2025 Jan 26;27:595-611. doi: 10.1016/j.csbj.2025.01.020. eCollection 2025.
Deep Forest employs forest structures and leverages a deep architecture to learn feature vector information adaptively. However, deep forest-based models have limitations, including manual hyperparameter optimization and inefficient time and memory usage. Bayesian optimization is a widely used model-based hyperparameter optimization method. Evolutionary algorithms such as Differential Evolution (DE) have recently been introduced to improve the acquisition function of Bayesian optimization. Despite its effectiveness, DE has a significant drawback: it constructs donor vectors by randomly selecting indices from the population of target vectors in its search for optimal solutions. This randomness is inefficient, because suboptimal or redundant indices may be selected. Therefore, in this research we developed a modified DE acquisition function for improved host-pathogen protein-protein interaction prediction. The modified DE introduces a weighted and adaptive donor vector technique that selects the best-fitted donor vectors rather than choosing them at random. This modified optimization approach was implemented in a deep forest model for automatic hyperparameter optimization. The performance of the optimized deep forest model was evaluated on human- protein sequence datasets using 10-fold cross-validation. The results were compared with standard optimization methods, namely traditional Bayesian optimization, genetic algorithms, and evolutionary strategies, and with other machine learning models. The optimized model achieved an accuracy of 89.3%, a sensitivity of 85.4%, and a precision of 91.6%, outperforming the other models across all metrics. Additionally, the optimized model predicted seven novel host-pathogen interactions. Finally, the model was implemented as a web application, accessible at http://dfh3pi.covenantuniversity.edu.ng.
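To make the contrast between random and fitness-guided donor construction concrete, the following minimal Python sketch places the classic DE/rand/1 donor step next to one possible weighted alternative. It is an illustration under stated assumptions, not the method used in this work: the abstract does not specify the weighting or adaptation rule, so the function names (`random_donor`, `weighted_indices`, `weighted_donor`), the scale factor `f`, and the choice of weights proportional to shifted fitness are all hypothetical, and higher fitness is assumed to be better.

```python
import random


def random_donor(population, i, f=0.5, rng=random):
    """Classic DE/rand/1 donor: three distinct indices drawn uniformly, excluding i."""
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    a, b, c = population[r1], population[r2], population[r3]
    # donor = a + f * (b - c), computed per dimension
    return [a[d] + f * (b[d] - c[d]) for d in range(len(a))]


def weighted_indices(fitness, i, rng=random):
    """Hypothetical rule: draw three distinct indices, favouring fitter vectors.

    Assumes higher fitness is better. Weights are fitness values shifted to be
    strictly positive; this is an illustrative choice, not the paper's rule.
    """
    candidates = [k for k in range(len(fitness)) if k != i]
    lowest = min(fitness[k] for k in candidates)
    weights = {k: fitness[k] - lowest + 1e-9 for k in candidates}
    chosen = []
    for _ in range(3):
        # Sample without replacement by shrinking the pool after each pick
        pool = [k for k in candidates if k not in chosen]
        pick = rng.choices(pool, weights=[weights[k] for k in pool], k=1)[0]
        chosen.append(pick)
    return chosen


def weighted_donor(population, fitness, i, f=0.5, rng=random):
    """Donor built from fitness-weighted indices instead of uniform ones."""
    r1, r2, r3 = weighted_indices(fitness, i, rng)
    a, b, c = population[r1], population[r2], population[r3]
    return [a[d] + f * (b[d] - c[d]) for d in range(len(a))]
```

In a hyperparameter search, each vector in `population` would encode one candidate configuration (for example, a number of trees and a maximum depth), and `fitness` would hold the corresponding acquisition or validation scores. Both functions need a population of at least four vectors so that three distinct indices other than `i` exist.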