Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824-1226, United States.
J Chem Inf Model. 2018 Jan 22;58(1):119-133. doi: 10.1021/acs.jcim.7b00309. Epub 2017 Dec 20.
Molecular docking, scoring, and virtual screening play an increasingly important role in computer-aided drug discovery. Scoring functions (SFs) are typically employed to predict the binding conformation (docking task), binding affinity (scoring task), and binary activity level (screening task) of ligands against a critical protein target in a disease's pathway. In most molecular docking software packages available today, a generic binding affinity-based (BA-based) SF is invoked for all three tasks to solve three different, but related, prediction problems. The limited predictive accuracy of such SFs in these three tasks has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we develop BT-Score, an ensemble machine-learning (ML) SF of boosted decision trees and thousands of predictive descriptors to estimate BA. BT-Score reproduced BA of out-of-sample test complexes with correlation of 0.825. Even with this high accuracy in the scoring task, we demonstrate that the docking and screening performance of BT-Score and other BA-based SFs is far from ideal. This has motivated us to build two task-specific ML SFs for the docking and screening problems. We propose BT-Dock, a boosted-tree ensemble model trained on a large number of native and computer-generated ligand conformations and optimized to predict binding poses explicitly. This model has shown an average improvement of 25% over its BA-based counterparts in different ligand pose prediction scenarios. Similar improvement has also been obtained by our screening-based SF, BT-Screen, which directly models the ligand activity labeling task as a classification problem. BT-Screen is trained on thousands of active and inactive protein-ligand complexes to optimize it for finding real actives from databases of ligands not seen in its training set.
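The boosted-tree idea behind a BA-based SF can be illustrated with a minimal sketch. It assumes squared-error loss, depth-1 trees (stumps), and a made-up two-descriptor toy data set; the function names, hyperparameters, and data are hypothetical and are not taken from BT-Score, whose actual descriptors and training setup are not given here.

```python
# Toy gradient boosting with regression stumps (squared-error loss).
# Illustrative assumption only: this is NOT the BT-Score implementation.

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def fit_boosted(X, y, n_trees=50, learning_rate=0.1):
    """Fit stumps sequentially, each one on the residuals of the current ensemble."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

# Hypothetical descriptors (two per complex) and made-up affinity values.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [0.5 * a + (1.0 if b > 2 else 0.0) for a, b in X]

model = fit_boosted(X, y)
baseline_mse = mse([model[0]] * len(y), y)           # predict the mean everywhere
fitted_mse = mse([predict(model, row) for row in X], y)
print(baseline_mse, fitted_mse)
```

Each stump is fitted to the residuals of the ensemble so far, which is what makes the ensemble "boosted". A task-specific model in the spirit of BT-Screen would keep the same additive structure but, under the same assumptions, swap the squared-error loss for a classification loss on active/inactive labels.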
In addition to the three task-specific SFs, we propose a novel multi-task deep neural network (MT-Net) that is trained on data from the three tasks to simultaneously predict binding poses, affinities, and activity levels. We show that the performance of MT-Net is superior to conventional SFs and on a par with or better than models based on single-task neural networks.
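One common way to build a multi-task network of this kind is a shared hidden layer feeding one output head per task. The sketch below assumes that layout, with a regression head for affinity and sigmoid heads for pose nativeness and activity; the layer sizes, head definitions, loss terms, and sample data are all hypothetical, since the MT-Net architecture is not described here.

```python
# Forward pass and combined loss of a toy multi-task network.
# Assumed layout (not the published MT-Net): one shared layer, three task heads.
import math
import random

TASKS = ("affinity", "pose", "activity")

def dense(x, W, b):
    """Affine layer: returns W @ x + b for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_out, n_in, rng):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

class MultiTaskNet:
    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_hidden, n_features, rng)   # shared by all tasks
        self.heads = {t: init_layer(1, n_hidden, rng) for t in TASKS}

    def forward(self, x):
        h = relu(dense(x, *self.shared))
        raw = {t: dense(h, *self.heads[t])[0] for t in TASKS}
        return {
            "affinity": raw["affinity"],          # real-valued binding affinity
            "pose": sigmoid(raw["pose"]),         # assumed: P(pose is native-like)
            "activity": sigmoid(raw["activity"]), # assumed: P(ligand is active)
        }

def multitask_loss(net, samples):
    """Average loss over (descriptors, task, label) triples.

    Each sample carries a label for one task only, so only that head
    contributes: squared error for affinity, cross-entropy otherwise.
    """
    total = 0.0
    for x, task, label in samples:
        pred = net.forward(x)[task]
        if task == "affinity":
            total += (pred - label) ** 2
        else:
            p = min(max(pred, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            total -= label * math.log(p) + (1.0 - label) * math.log(1.0 - p)
    return total / len(samples)

# Made-up descriptor vectors, one labelled sample per task.
net = MultiTaskNet(n_features=4, n_hidden=8)
samples = [
    ([0.2, 1.0, 0.0, 0.5], "affinity", 6.3),
    ([0.9, 0.1, 0.4, 0.0], "pose", 1.0),
    ([0.0, 0.3, 0.8, 0.7], "activity", 0.0),
]
outputs = net.forward(samples[0][0])
loss = multitask_loss(net, samples)
print(outputs, loss)
```

Because every sample passes through the same shared layer, gradients from all three tasks would update the same weights during training, which is the usual argument for why data from one task can help the others. Training itself (backpropagation, task weighting) is omitted from this sketch.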