Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand.
Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand.
J Mol Graph Model. 2023 Jul;122:108466. doi: 10.1016/j.jmgm.2023.108466. Epub 2023 Apr 7.
Kirsten rat sarcoma virus G12C (KRAS G12C) is the major protein mutation associated with non-small cell lung cancer (NSCLC) severity. Inhibiting KRAS is therefore one of the key therapeutic strategies for NSCLC patients. In this paper, a cost-effective, data-driven drug design approach based on machine learning quantitative structure-activity relationship (QSAR) analysis was built to predict ligand affinities against the KRAS protein. A curated, non-redundant dataset of 1033 compounds with KRAS inhibitory activity (pIC50) was used to build and test the models. Four fingerprint sets were used to train the models: the PubChem fingerprint, the Substructure fingerprint, the Substructure fingerprint count, and a conjoint fingerprint (the PubChem fingerprint combined with the Substructure fingerprint count). Across comprehensive validation methods and various machine learning algorithms, XGBoost regression (XGBoost) achieved the highest performance in terms of goodness of fit, predictivity, generalizability, and model robustness (R² = 0.81, Q²_CV = 0.60, Q²_Ext = 0.62, R² - Q²_Ext = 0.19; Y-randomization R² = 0.31 ± 0.03 and Q² = -0.09 ± 0.04). The top 13 molecular fingerprints that correlated with the predicted pIC50 values were SubFPC274 (number of aromatic atoms), SubFPC307 (number of chiral centers), PubChemFP37 (≥1 chlorine), SubFPC18 (number of alkylarylethers), SubFPC1 (number of primary carbons), SubFPC300 (number of 1,3-tautomerizables), PubChemFP621 (N-C:C:C:N structure), PubChemFP23 (≥1 fluorine), SubFPC2 (number of secondary carbons), SubFPC295 (number of C-ONS bonds), PubChemFP199 (≥4 6-membered rings), PubChemFP180 (≥1 nitrogen-containing 6-membered ring), and SubFPC180 (number of tertiary amines). These molecular fingerprints were visualized and validated using molecular docking experiments. In conclusion, the conjoint fingerprint and XGBoost-QSAR model proved useful as a high-throughput screening tool for KRAS inhibitor identification and drug design.
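The validation quantities named in the abstract can be illustrated with a minimal sketch in plain Python. It shows how a conjoint fingerprint can be formed by concatenating a binary fingerprint with a count fingerprint, how R² and an external Q² are computed, and how a Y-randomization check works. The function names, the toy fingerprints and pIC50 values, and the single-feature least-squares model standing in for XGBoost are all assumptions made for illustration; none of them come from the paper.

```python
import random

def conjoint_fingerprint(binary_bits, substructure_counts):
    """Concatenate a binary fingerprint with a count fingerprint."""
    return list(binary_bits) + list(substructure_counts)

def r_squared(y_true, y_pred, reference_mean=None):
    """1 - SS_res / SS_tot. Passing the training-set mean as
    reference_mean gives one common form of the external Q2."""
    if reference_mean is None:
        reference_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - reference_mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def single_feature_ols(X, y, column):
    """Hypothetical stand-in model: least squares on one column,
    returning fitted values for the training rows."""
    xs = [row[column] for row in X]
    mean_x, mean_y = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
    slope = sxy / sxx
    return [mean_y + slope * (x - mean_x) for x in xs]

def y_randomization(fit_predict, X, y, n_rounds=10, seed=0):
    """Refit on shuffled activities; a robust model should score
    much worse here than on the real activities."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        y_shuffled = list(y)
        rng.shuffle(y_shuffled)
        scores.append(r_squared(y_shuffled, fit_predict(X, y_shuffled)))
    return scores

# Invented toy data: (binary bits, substructure counts) per molecule.
molecules = [([1, 0], [6, 0]), ([0, 1], [12, 1]), ([1, 1], [10, 2]),
             ([0, 0], [18, 1]), ([1, 0], [16, 3])]
activities = [5.1, 6.0, 5.8, 7.2, 6.9]  # invented pIC50 values

X = [conjoint_fingerprint(bits, counts) for bits, counts in molecules]
fit_predict = lambda rows, targets: single_feature_ols(rows, targets, column=2)

r2_fit = r_squared(activities, fit_predict(X, activities))
r2_random = y_randomization(fit_predict, X, activities)
print(f"R2 (fit) = {r2_fit:.2f}")
print(f"mean R2 (Y-randomized) = {sum(r2_random) / len(r2_random):.2f}")
```

In a real workflow the stand-in model would be replaced by the regressor under study and the metrics computed on held-out folds and an external test set. The reported gap between R² and Q²_Ext (0.81 - 0.62 = 0.19) is the kind of difference this comparison is meant to expose.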