Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States.
Department of Computer Science, Brigham Young University, Provo, Utah 84602, United States.
J Chem Inf Model. 2022 Nov 28;62(22):5342-5350. doi: 10.1021/acs.jcim.2c00705. Epub 2022 Nov 7.
Molecular docking tools are regularly used in virtual screening to computationally identify new molecules for drug discovery. However, docking tools suffer from inaccurate scoring functions whose performance varies widely across proteins. To enable more accurate ranking of active over inactive ligands in virtual screening, we created a machine learning consensus docking tool, MILCDock, that combines the outputs of five traditional molecular docking tools to predict the probability that a ligand binds to a protein. MILCDock was trained and tested on data from both the DUD-E and LIT-PCBA docking datasets and shows improved performance over traditional molecular docking tools and other consensus docking methods on the DUD-E dataset. LIT-PCBA targets proved difficult for all methods tested. We also find that DUD-E data, although biased, can be effective for training machine learning tools if care is taken to avoid DUD-E's biases during training.
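The consensus idea in the abstract, taking the scores of five docking tools for one ligand and mapping them to a single binding probability, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than DUD-E or LIT-PCBA, the five scores are anonymous stand-ins for real docking outputs, and a plain logistic regression replaces whatever model MILCDock actually uses. The function names (`make_synthetic_ligands`, `train_consensus`, `predict_proba`, `roc_auc`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_ligands(n, seed=0):
    # ASSUMPTION: synthetic data only. Each ligand gets five made-up docking
    # scores (more negative = better); actives are shifted down by one unit.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        active = 1 if rng.random() < 0.3 else 0
        shift = -1.0 if active else 0.0
        scores = [rng.gauss(shift, 1.0) for _ in range(5)]
        rows.append((scores, active))
    return rows


def train_consensus(rows, epochs=200, lr=0.1):
    # ASSUMPTION: logistic regression fitted by batch gradient descent,
    # used as a simple stand-in for a learned consensus model.
    w = [0.0] * 5
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * 5
        grad_b = 0.0
        for x, y in rows:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            for i in range(5):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, scores):
    # Probability that a ligand is active, given its five docking scores.
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, scores)) + b)


def roc_auc(labels, probs):
    # Fraction of (active, inactive) pairs in which the active ranks higher;
    # ties count as half.
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in pos for d in neg)
    return wins / (len(pos) * len(neg))


# Train on one synthetic set, then rank a separate held-out set.
model = train_consensus(make_synthetic_ligands(400, seed=1))
test_rows = make_synthetic_ligands(200, seed=2)
probs = [predict_proba(model, x) for x, _ in test_rows]
labels = [y for _, y in test_rows]
print("held-out ROC AUC:", round(roc_auc(labels, probs), 3))
```

In a real screen the held-out ligands would come from protein targets not seen during training, which is one way to act on the abstract's caution about dataset bias; the random split above does not attempt that.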