Pitakbut Thanet, Munkert Jennifer, Xi Wenhui, Wei Yanjie, Fuhrmann Gregor
Department of Biology, Pharmaceutical Biology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Staudtstr. 5, 91058, Erlangen, Germany.
Shenzhen Key Laboratory of Intelligent Bioinformatics and Center for High-Performance Computing, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
BMC Chem. 2024 Dec 20;18(1):249. doi: 10.1186/s13065-024-01324-x.
In virtual drug screening, consensus docking is a standard in silico approach that combines the results of at least two optimized docking experiments. Because of its mathematical nature, consensus docking has an unavoidable limitation: its success rate is lower than that of the best individual docking method. This study aims to overcome this drawback with random forest, an ensemble machine learning model. First, in vitro beta-lactamase inhibitory screening was performed using an in-house chemical library, and the in vitro results were later used for validation. Next, we optimized docking protocols for the AutoDock Vina and DOCK6 programs. With an appropriate scoring function, we found that DOCK6 identified up to 70% of all active molecules, double the rate obtained with an inappropriate scoring function. Subsequent consensus analysis reduced the success rate to 50% but also lowered the false positive rate to 16%, which is experimentally favorable for a drug search. Finally, we trained two quantitative structure-activity relationship (QSAR) models, using logistic regression as a reference model and random forest as a test model. After combining the consensus docking results, the random forest-based QSAR model outperformed logistic regression, restoring the success rate to 70% while maintaining a low false positive rate of around 21%. In conclusion, as a proof of concept, this study demonstrated the benefit of using a random forest (machine learning)-based QSAR model to overcome a limitation of standard consensus docking in the search for beta-lactamase inhibitors.
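Why consensus docking cannot beat its best member can be made concrete with a toy sketch. It assumes that "consensus" means keeping only the molecules that every docking program calls a hit (a set intersection; the abstract does not spell out the exact combination rule). Under that assumption the consensus hit list is a subset of each program's list, so it can recover no more actives than the best program, although it also tends to discard false positives. The function names, molecule IDs, and hit lists below are hypothetical toy data, not results from the study.

```python
from __future__ import annotations


def success_rate(predicted: set[str], actives: set[str]) -> float:
    """Fraction of known actives that the screen recovers (true positive rate)."""
    return len(predicted & actives) / len(actives)


def false_positive_rate(predicted: set[str], inactives: set[str]) -> float:
    """Fraction of known inactives that the screen wrongly flags as hits."""
    return len(predicted & inactives) / len(inactives)


def consensus(*hit_lists: set[str]) -> set[str]:
    """Assumed consensus rule: keep a molecule only if every program calls it a hit."""
    if not hit_lists:
        raise ValueError("need at least one hit list")
    result = set(hit_lists[0])
    for hits in hit_lists[1:]:
        result &= hits  # intersection can only shrink the hit list
    return result


# Toy labels standing in for in vitro results (hypothetical IDs, not study data).
actives = {f"A{i}" for i in range(10)}
inactives = {f"I{i}" for i in range(10)}

# Hypothetical hit lists from two docking programs after a score cutoff.
dock_a = {"A0", "A1", "A2", "A3", "A4", "A5", "I0", "I1", "I2"}
dock_b = {"A0", "A1", "A2", "A3", "A6", "I2", "I3"}
both = consensus(dock_a, dock_b)

for name, hits in [("program A", dock_a), ("program B", dock_b), ("consensus", both)]:
    print(f"{name}: success rate {success_rate(hits, actives):.0%}, "
          f"FPR {false_positive_rate(hits, inactives):.0%}")
```

On this toy data the success rate falls from 60% and 50% for the individual programs to 40% for the consensus, while the false positive rate falls from 30% and 20% to 10%. Because the consensus list is a subset of every member list, this drop in success rate is guaranteed rather than incidental; it is the gap the study addresses by combining the consensus docking results with a random forest QSAR model.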