Department of Pharmacology, University of California, Davis, California 95616, United States.
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, United States.
J Chem Inf Model. 2022 May 23;62(10):2301-2315. doi: 10.1021/acs.jcim.1c01510. Epub 2022 Apr 21.
Identifying promising lead compounds with pharmacological activity toward a biological target is essential in early-stage drug discovery. With the recent growth of small-molecule databases, virtual high-throughput screening using physics-based molecular docking has become an essential tool for fast, cost-efficient lead discovery and optimization. However, the best-scored docking poses are often suboptimal, which leads to incorrect screening results and chemical property calculations. We address this pose classification problem with data-driven machine learning approaches that identify correct docking poses from AutoDock Vina and Glide screens. To classify docking poses effectively, we present two convolutional neural network approaches, both trained on the PDBbind set: a three-dimensional convolutional neural network (3D-CNN) and an attention-based point cloud network (PCN). We demonstrate the effectiveness of the proposed classifiers on multiple evaluation data sets, including the standard PDBbind CASF-2016 benchmark and several compound libraries for structurally different protein targets, namely an ion channel data set extracted from the Protein Data Bank (PDB) and an in-house KCa3.1 inhibitor data set. Our experiments show that excluding false-positive docking poses with the proposed classifiers improves the ability of virtual high-throughput screening to identify novel molecules against each target protein, compared with the initial screen based on docking scores alone.
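The screening idea summarized above (drop poses a classifier flags as incorrect, then rank the survivors by docking score) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: `DockedHit`, `rescreen`, `pose_rmsd`, and `is_correct_pose` are hypothetical helpers, `p_correct` stands in for the output of a trained pose classifier such as the 3D-CNN or PCN, and the 2 Å RMSD cutoff for labeling a pose "correct" is a common docking convention assumed here rather than taken from this abstract.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Heavy-atom RMSD in the receptor frame (no superposition, no symmetry handling).

    Assumes atoms in `pose` and `reference` are listed in the same order.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must have the same, non-zero number of atoms")
    sq = sum((a - b) ** 2 for p, r in zip(pose, reference) for a, b in zip(p, r))
    return math.sqrt(sq / len(pose))


def is_correct_pose(pose: Sequence[Coord], reference: Sequence[Coord],
                    cutoff: float = 2.0) -> bool:
    """Label a pose correct if its RMSD to the crystal pose is below `cutoff` (assumed 2 Å)."""
    return pose_rmsd(pose, reference) < cutoff


@dataclass
class DockedHit:
    compound_id: str
    docking_score: float  # lower (more negative) is better, as for Vina/Glide scores
    p_correct: float      # hypothetical classifier probability that the pose is correct


def rescreen(hits: List[DockedHit], threshold: float = 0.5,
             top_k: Optional[int] = None) -> List[DockedHit]:
    """Discard hits whose pose is likely wrong, then rank the rest by docking score."""
    kept = [h for h in hits if h.p_correct >= threshold]
    kept.sort(key=lambda h: h.docking_score)
    return kept if top_k is None else kept[:top_k]
```

For example, with hits `A` (score -9.8, `p_correct` 0.12), `B` (-9.1, 0.91), `C` (-8.7, 0.64), and `D` (-7.5, 0.30), a plain docking-score ranking puts `A` first, whereas `rescreen` with the default threshold returns only `B` and `C`. The threshold trades recall for precision; its value here is an arbitrary placeholder.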