Healthcare & Life Sciences Research, IBM TJ Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, New York 10598, United States.
J Chem Inf Model. 2020 Sep 28;60(9):4170-4179. doi: 10.1021/acs.jcim.9b00927. Epub 2020 Mar 3.
We present a simple, modular graph-based convolutional neural network that takes structural information from protein-ligand complexes as input to generate models for activity and binding mode prediction. Complex structures are generated by a standard docking procedure and fed into a dual-graph architecture that includes separate subnetworks for the ligand bonded topology and the ligand-protein contact map. Recent work has indicated that data set bias drives many past promising results derived from combining deep learning and docking. Our dual-graph network allows contributions from ligand identity that give rise to such biases to be distinguished from effects of protein-ligand interactions on classification. We show that our neural network is capable of learning from protein structural information when, as in the case of binding mode prediction, an unbiased data set is constructed. We next develop a deep learning model for binding mode prediction that uses docking ranking as input in combination with docking structures. This strategy mirrors past consensus models and outperforms a baseline docking program (AutoDock Vina) in a variety of tests, including on cross-docking data sets that mimic real-world docking use cases. Furthermore, the magnitudes of network predictions serve as reliable measures of model confidence.
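The dual-graph idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 4.0 Å contact cutoff, the two-dimensional atom features, the single convolution layer per branch, the sum pooling, and every function name below are assumptions chosen for illustration. One branch convolves over the ligand bonded topology, the other over the ligand-protein contact map, and the two pooled vectors are concatenated before a final linear layer.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def row_normalize(m):
    """Scale each row to sum to 1; rows with no entries stay all zero."""
    out = []
    for row in m:
        s = sum(row)
        out.append([v / s for v in row] if s > 0 else [0.0] * len(row))
    return out

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def bond_adjacency(n_atoms, bonds):
    """Ligand bonded topology with self-loops (ligand atoms x ligand atoms)."""
    adj = [[1.0 if i == j else 0.0 for j in range(n_atoms)] for i in range(n_atoms)]
    for i, j in bonds:
        adj[i][j] = adj[j][i] = 1.0
    return adj

def contact_map(lig_xyz, prot_xyz, cutoff=4.0):
    """Ligand-protein contacts (ligand atoms x protein atoms); cutoff is an assumption."""
    return [[1.0 if math.dist(a, b) < cutoff else 0.0 for b in prot_xyz] for a in lig_xyz]

def graph_conv(adj, feats, weight):
    """One graph convolution: relu(normalized(adj) @ feats @ weight)."""
    return relu(matmul(matmul(row_normalize(adj), feats), weight))

def sum_pool(h):
    """Sum node embeddings over atoms to get one vector per graph."""
    return [sum(col) for col in zip(*h)]

def dual_graph_score(lig_feats, bonds, lig_xyz, prot_feats, prot_xyz, w_lig, w_con, w_out):
    """Return (probability, pooled ligand vector, pooled contact vector)."""
    # Branch 1: depends only on ligand identity (bonded topology).
    lig_vec = sum_pool(graph_conv(bond_adjacency(len(lig_feats), bonds), lig_feats, w_lig))
    # Branch 2: depends on which protein atoms each ligand atom touches.
    con_vec = sum_pool(graph_conv(contact_map(lig_xyz, prot_xyz), prot_feats, w_con))
    logit = sum(p * w for p, w in zip(lig_vec + con_vec, w_out))
    return 1.0 / (1.0 + math.exp(-logit)), lig_vec, con_vec

# Toy complex with untrained random weights (hypothetical data, not from the paper).
rng = random.Random(0)
rand_matrix = lambda r, c: [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]
w_lig, w_con = rand_matrix(2, 2), rand_matrix(2, 2)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(4)]

lig_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
lig_xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
bonds = [(0, 1), (1, 2)]
prot_feats = [[1.0, 0.0], [0.0, 1.0]]
prot_xyz = [(0.0, 3.0, 0.0), (20.0, 20.0, 20.0)]

prob, lig_vec, con_vec = dual_graph_score(
    lig_feats, bonds, lig_xyz, prot_feats, prot_xyz, w_lig, w_con, w_out)
print(prob, lig_vec, con_vec)
```

Because the two branches are pooled separately, the ligand vector is unchanged when the protein is moved out of contact range, while the contact vector collapses to zeros. In this sketch, that separation is what would let ligand-identity contributions be compared against protein-ligand interaction contributions.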