Sun Zhiqi, Huo Donghui, Guo Jiangyu, Yan Aixia
State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, P.O. Box 53, 15 BeiSanHuan East Road, Beijing 100029, China.
ACS Omega. 2025 Mar 14;10(11):11176-11187. doi: 10.1021/acsomega.4c10464. eCollection 2025 Mar 25.
Most fourth-generation EGFR inhibitors targeting the L858R/T790M/C797S mutations are still in clinical trials, so new inhibitors are needed. In this study, we collected an internal data set of 2302 multitarget EGFR inhibitors targeting the wild type (83%) and the L858R (92%), L858R/T790M (96%), and L858R/T790M/C797S (60%) mutations. We built a structure-activity relationship model based on a multitask deep neural network (MT-DNN) to predict the bioactivities of multigeneration EGFR inhibitors. We also built four single-task models on the 1384 L858R/T790M/C797S inhibitors (60% of the data set) using support vector machine (SVM), random forest (RF), XGBoost (XGB), and a single-target neural network (ST-DNN), respectively. On an external data set of 304 fourth-generation EGFR inhibitors, the MT-DNN model significantly outperformed the single-task models. Furthermore, combining MT-DNN with SHAP/delta-SHAP interpretability analysis provides rigorous structural information from a global perspective. With the SHAP/delta-SHAP methods, the MT-DNN model can mine the core scaffolds and important fragments of multigeneration EGFR inhibitors and provide valuable structure-activity relationship information for addressing the resistance mutation problem.
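Because only part of the 2302 compounds has activity data for each of the four targets, a multitask model has to cope with missing labels. The abstract does not say how this was handled; the sketch below shows one common approach, a masked mean squared error that ignores unmeasured entries. It is an assumption for illustration, not the authors' method, and the function name `masked_mse`, the NaN encoding, and the numeric values are all hypothetical.

```python
import numpy as np


def masked_mse(pred, target):
    """Mean squared error over labelled entries only (NaN = not measured).

    Assumption: missing activities are encoded as NaN in `target`.
    """
    mask = ~np.isnan(target)                    # True where a label exists
    n_labelled = mask.sum()
    if n_labelled == 0:
        return 0.0                              # nothing to score
    diff = np.where(mask, pred - target, 0.0)   # unlabelled entries add 0
    return float((diff ** 2).sum() / n_labelled)


# Hypothetical activity values for 3 compounds x 4 tasks.
# Columns: wild type, L858R, L858R/T790M, L858R/T790M/C797S.
target = np.array([
    [6.0,    7.0,    8.0, np.nan],
    [np.nan, 6.5,    7.5, 5.0],
    [5.5,    np.nan, 9.0, 6.0],
])
pred = np.full_like(target, 7.0)                # dummy model output

loss = masked_mse(pred, target)

# Fraction of compounds with a label for each task, analogous to the
# per-target percentages quoted in the abstract.
coverage = (~np.isnan(target)).mean(axis=0)
```

With a mask of this kind, every compound contributes to the shared layers through whichever targets it has been measured against, which is one way a multitask model can use data that single-task models trained on one target never see.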