Department of Natural and Applied Sciences, Duke Kunshan University, Kunshan, China.
Sci Rep. 2023 May 22;13(1):8219. doi: 10.1038/s41598-023-35132-5.
The present study investigates the use of algorithm selection for automatically choosing an algorithm for any given protein-ligand docking task. In the drug discovery and design process, conceptualizing protein-ligand binding is a major problem. Targeting this problem with computational methods can substantially reduce the resources and time required for the overall drug development process. One way of addressing protein-ligand docking is to model it as a search and optimization problem, and a variety of algorithmic solutions have been proposed in this respect. However, no single algorithm tackles this problem efficiently in terms of both docking quality and speed. This motivates devising new algorithms tailored to particular protein-ligand docking scenarios. To this end, this paper reports a machine learning-based approach for improved and robust docking performance. The proposed set-up is fully automated and requires no expert opinion or involvement on either the problem side or the algorithm side. As a case study, an empirical analysis was performed on a well-known protein, Human Angiotensin-Converting Enzyme (ACE), with 1428 ligands. For general applicability, AutoDock 4.2 was used as the docking platform, and the candidate algorithms were also taken from AutoDock 4.2: twenty-eight distinctly configured variants of the Lamarckian Genetic Algorithm (LGA) form the algorithm set. ALORS, a recommender system-based algorithm selection system, was used to automate the selection among these LGA variants on a per-instance basis. To realize this automation, molecular descriptors and substructure fingerprints were employed as the features characterizing each target protein-ligand docking instance. The computational results revealed that algorithm selection outperforms every one of the candidate algorithms.
Further assessment is reported on the algorithm space, discussing the contributions of the LGA parameters. For protein-ligand docking specifically, the contributions of the aforementioned features are examined, shedding light on the critical features that affect docking performance.
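The per-instance selection described above can be illustrated with a minimal sketch in the spirit of ALORS, which treats algorithm selection as a recommender problem: an instance-by-algorithm performance matrix is factorized into latent factors, and a model maps instance features to those factors so that unseen instances can be scored. Everything here is an assumption for illustration, not the authors' set-up: the names `LatentSelector`, `synthetic_docking`, and `compare` are hypothetical, ridge regression is swapped in for the feature-to-latent model that ALORS uses, and the toy data (one feature, three algorithms, energy-like scores where lower is better) stands in for the ACE ligands, the 28 LGA variants, and the molecular descriptors and fingerprints.

```python
import numpy as np


def rank_rows(perf: np.ndarray) -> np.ndarray:
    """Rank algorithms within each instance (row): 1 = best, i.e. lowest score."""
    # argsort of argsort yields each entry's 0-based rank within its row
    return perf.argsort(axis=1).argsort(axis=1).astype(float) + 1.0


class LatentSelector:
    """Hypothetical ALORS-style per-instance algorithm selector (sketch only)."""

    def __init__(self, k: int = 2, ridge: float = 1e-3):
        self.k = k          # number of latent factors (assumed value)
        self.ridge = ridge  # L2 penalty for the feature-to-latent regression

    @staticmethod
    def _design(features: np.ndarray) -> np.ndarray:
        # Append a bias column to the instance features
        return np.hstack([features, np.ones((features.shape[0], 1))])

    def fit(self, features: np.ndarray, perf: np.ndarray) -> "LatentSelector":
        ranks = rank_rows(perf)
        # Truncated SVD of the rank matrix: ranks ~= (U_k S_k) V_k^T
        u, s, vt = np.linalg.svd(ranks, full_matrices=False)
        self.inst_latent = u[:, : self.k] * s[: self.k]  # instance factors
        self.alg_latent = vt[: self.k].T                  # algorithm factors
        # Ridge regression (swapped in for ALORS's own feature mapping)
        # from instance features to instance latent factors
        x = self._design(features)
        a = x.T @ x + self.ridge * np.eye(x.shape[1])
        self.weights = np.linalg.solve(a, x.T @ self.inst_latent)
        return self

    def select(self, features: np.ndarray) -> np.ndarray:
        """Return the index of the predicted best algorithm for each instance."""
        latent = self._design(features) @ self.weights
        predicted_ranks = latent @ self.alg_latent.T
        return predicted_ranks.argmin(axis=1)


def synthetic_docking(n: int, rng: np.random.Generator):
    """Hypothetical toy data: one feature f in [0, 1], three algorithms.
    Scores mimic docking energies (lower is better): algorithm 0 wins for
    small f, algorithm 1 for large f, and algorithm 2 is never best."""
    f = rng.uniform(0.0, 1.0, size=(n, 1))
    perf = np.hstack([10.0 * f, 10.0 * (1.0 - f), np.full((n, 1), 6.0)])
    return f, perf


def compare(seed: int = 0):
    """Mean test score of the selector, the single best solver, and the oracle."""
    rng = np.random.default_rng(seed)
    x_train, p_train = synthetic_docking(200, rng)
    x_test, p_test = synthetic_docking(200, rng)
    chosen = LatentSelector(k=2).fit(x_train, p_train).select(x_test)
    selector = p_test[np.arange(len(chosen)), chosen].mean()
    # Single best solver: the one algorithm with the lowest mean training score
    sbs = p_test[:, p_train.mean(axis=0).argmin()].mean()
    # Oracle: per-instance best, the best any selector could achieve
    oracle = p_test.min(axis=1).mean()
    return selector, sbs, oracle
```

Calling `compare()` returns the three mean scores. On this toy data the selector should land between the oracle and the single best solver, which mirrors only qualitatively the abstract's finding that selection beats every individual candidate. Factorizing ranks rather than raw scores keeps instances with very different energy scales comparable. This is a common choice in recommender-style selectors, but it is an assumption here.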