Tian Yujia, Yang Zhenwu, Wang Hongzhao, Yan Aixia
Department of Pharmaceutical Engineering, State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing, People's Republic of China.
Chem Biol Drug Des. 2023 Jun;101(6):1307-1321. doi: 10.1111/cbdd.14214. Epub 2023 Feb 20.
There is strong interest in developing microsomal prostaglandin E2 synthase-1 (mPGES-1) inhibitors because of their potential to treat inflammation safely and effectively. Herein, 70 QSAR models were built on a dataset of 735 mPGES-1 inhibitors characterized with RDKit descriptors, using multiple linear regression (MLR), support vector machine (SVM), random forest (RF), deep neural networks (DNN), and eXtreme Gradient Boosting (XGBoost). Three further regression models were built on the same dataset represented by SMILES, using self-attention recurrent neural networks (RNN) and graph convolutional networks (GCN). The best model (Model C2), developed by SVM with RDKit descriptors, achieved a coefficient of determination (R²) of 0.861 and a root mean squared error (RMSE) of 0.235 on the test set, and an R² of 0.692 and RMSE of 0.383 on the external test set. We investigated the applicability domain (AD) of Model C2 with the rivality index (RI): the predictions of Model C2 were reliable for 78.92% of molecules in the test set and 78.33% of molecules in the external test set. After dissecting the RDKit descriptors of Model C2, we identified important physicochemical properties of highly active mPGES-1 inhibitors. In addition, by analyzing the attention weight of each atom of each inhibitor from the attention layer, we found that the benzamide group and the trifluoromethyl cyclohexane group are favorable substructures for mPGES-1 inhibitors.
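For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how R² and RMSE are commonly computed from observed and predicted activities. It assumes the standard definition of the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not state the exact formula the authors used. The function name `r2_and_rmse` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for paired observed and predicted values.

    Assumes the usual definition R^2 = 1 - SS_res / SS_tot; the paper's
    exact formula is not given in the abstract.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared error between observation and prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical activity values, for illustration only (not data from the paper).
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]
r2, rmse = r2_and_rmse(observed, predicted)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

A lower R² on an external test set than on the internal test set, as reported for Model C2, is what this calculation would show whenever the external predictions leave a larger share of the variance unexplained.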