IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA.
J Comput Aided Mol Des. 2022 May;36(5):391-404. doi: 10.1007/s10822-021-00421-6. Epub 2021 Nov 24.
Here we present a streamlined, explainable graph convolutional neural network (gCNN) architecture for predicting small-molecule activity. We first conduct a hyperparameter optimization across nearly 800 protein targets, which yields a simplified gCNN quantitative structure-activity relationship (QSAR) architecture, and we observe that this model can outperform both standard gCNN and random forest (RF) methods on difficult-to-classify test sets. We also discuss how the reduced convolutional layer dimensions may reflect the "anatomical" needs of gCNNs with respect to radial coarse-graining of molecular substructure. We augment this simplified architecture with saliency maps that highlight molecular substructures relevant to activity, and we perform saliency analysis on nearly 100 data-rich protein targets. We show that the resulting substructural clusters are useful visualization tools for understanding substructure-activity relationships. Finally, we highlight connections between our models' saliency predictions and observations in the medicinal chemistry literature, focusing on four case studies of past lead-finding and lead-optimization campaigns.
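The abstract does not specify the layer count, pooling, or saliency formulation, so what follows is only a generic illustration. It is a minimal NumPy sketch of one graph-convolution layer with a gradient-based per-atom saliency score. It assumes symmetric normalization with self-loops, sum pooling, a logistic activity output, and saliency defined as the norm of the gradient of the prediction with respect to each atom's features. All function names, weights, and the three-atom toy molecule are hypothetical and are not the authors' implementation.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Assumed normalization: D^{-1/2} (A + I) D^{-1/2}, i.e. adjacency with self-loops."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def forward(x, a_hat, w, v, b):
    """One graph-convolution layer, sum pooling over atoms, logistic activity score."""
    p = a_hat @ x @ w              # pre-activation, shape (n_atoms, hidden)
    h = np.maximum(p, 0.0)         # ReLU
    g = h.sum(axis=0)              # graph-level embedding, shape (hidden,)
    y = 1.0 / (1.0 + np.exp(-(g @ v + b)))
    return y, p

def atom_saliency(x, a_hat, w, v, b):
    """Analytic gradient of y w.r.t. atom features, reduced to one score per atom."""
    y, p = forward(x, a_hat, w, v, b)
    dz_dp = (p > 0).astype(float) * v[None, :]   # back through pooling and ReLU
    dz_dx = a_hat.T @ dz_dp @ w.T                # back through the convolution
    dy_dx = y * (1.0 - y) * dz_dx                # back through the sigmoid
    return y, dy_dx, np.linalg.norm(dy_dx, axis=1)

# Hypothetical toy molecule: heavy atoms C-C-O, one-hot features [is_C, is_O].
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
rng = np.random.default_rng(0)
w = rng.normal(size=(2, 4))        # untrained, randomly initialized weights
v = rng.normal(size=4)
b = 0.0

a_hat = normalized_adjacency(adj)
y, grad, saliency = atom_saliency(x, a_hat, w, v, b)
print(f"activity score {y:.3f}, per-atom saliency {np.round(saliency, 3)}")
```

Atoms with larger saliency scores are those whose features most affect the prediction. In a trained model, scores like these could be mapped back onto the molecule and aggregated into substructure clusters, though the paper's exact aggregation scheme is not described here.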