Division of Medicine, University College London, Gower Street, London WC1E 6BT, U.K.
J Chem Inf Model. 2022 Nov 28;62(22):5383-5396. doi: 10.1021/acs.jcim.2c00832. Epub 2022 Nov 7.
The analysis and comparison of protein-binding sites aid various applications in the drug discovery process, e.g., hit finding, drug repurposing, and polypharmacology. Classification of binding sites has been an active research topic for the past 30 years, and many different methods have been published. The rapid development of machine learning algorithms, coupled with the large volume of publicly available protein-ligand 3D structures, makes it possible to apply deep learning techniques to binding site comparison. Our method uses a spherical convolutional neural network based on the DeepSphere architecture to learn global representations of protein-binding sites. The model was trained on the TOUGH-C1 and TOUGH-M1 data and validated with the ProSPECCTs datasets. The results show that the model can (1) perform well in protein-binding site similarity and classification tasks and (2) learn and separate the physicochemical properties of binding sites. Lastly, we tested the model on a set of kinases, where it clustered the different kinase subfamilies effectively. This example demonstrates the method's promise for lead hopping within or outside a protein target, based directly on binding site information.
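As a rough illustration of how learned global representations could be used in a similarity task, the sketch below ranks binding sites by the cosine similarity of their embedding vectors. The abstract does not state which similarity measure the authors use, so cosine similarity is an assumption here, and the function names, site labels, and three-dimensional vectors are hypothetical toy values rather than model outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar to query."""
    scores = [(name, cosine_similarity(query, emb)) for name, emb in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; a real model would produce much longer vectors.
site_embeddings = {
    "site_A": [0.9, 0.1, 0.0],
    "site_B": [0.8, 0.2, 0.1],
    "site_C": [0.0, 0.1, 0.9],
}
query_embedding = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query_embedding, site_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a setting like the kinase example, the same pairwise scores could feed a clustering step, although the clustering procedure itself is not described in the abstract.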