State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, University of Chemical Technology, Beijing, People's Republic of China.
Mol Divers. 2022 Jun;26(3):1715-1730. doi: 10.1007/s11030-021-10300-9. Epub 2021 Oct 12.
Epidermal growth factor receptor (EGFR) has received widespread attention as an important target for anticancer drug design. Mutations in EGFR, especially the T790M/L858R double mutation, have made cancer treatment more difficult. Here we built structure-activity relationship (SAR) models of small-molecule inhibitors against wild-type and T790M/L858R double-mutant EGFR using a dataset of 379 compounds. For the 2D classification models, we used ECFP4 fingerprints to build support vector machine (SVM) and random forest (RF) models, and used SMILES strings to build self-attention recurrent neural network (RNN) models. All six models (three model types for each of the two EGFR forms) achieved an accuracy above 0.87 and a Matthews correlation coefficient (MCC) above 0.76 on the test set. We conclude that inhibitors containing anilinoquinoline and methoxy- or fluoro-substituted phenyl groups are highly active against wild-type EGFR. Substructures such as anilinopyrimidine, acrylamide, aminophenyl, methoxyphenyl, and thienopyrimidinyl amide appeared more frequently in inhibitors that are highly active against double-mutant EGFR. We also used a self-organizing map (SOM) to cluster the inhibitors into six subsets based on ECFP4 fingerprints and analyzed the activity characteristics of the different scaffolds in each subset. Three of these subsets, based on the pteridine, anilinopyrimidine, and anilinoquinoline scaffolds, were selected to build separate 3D comparative molecular similarity indices analysis (CoMSIA) models. Models with a leave-one-out cross-validated coefficient of determination (q²) above 0.65 were retained, and five descriptor fields (steric, electrostatic, hydrophobic, hydrogen-bond donor, and hydrogen-bond acceptor) were used to study the effects of inhibitor side chains on activity against wild-type and mutant EGFR.
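The test-set accuracy and MCC values quoted above are both computed from a binary confusion matrix. The following is a minimal sketch, assuming binary labels (1 = highly active, 0 = not highly active); the abstract does not state the activity cutoff used to assign these labels, and the function names are hypothetical rather than taken from the authors' code. When the MCC denominator is zero (for example, a predictor that outputs only one class), the sketch returns 0.0, a common convention.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, TN, FP, FN); 1 = 'highly active', 0 = 'not highly active' (assumed labels)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one prediction")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose class is predicted correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; returns 0.0 if any marginal count is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(accuracy(y_true, y_pred), mcc(y_true, y_pred))  # 0.75 0.5
```

On the toy labels above, 6 of 8 predictions are correct (accuracy 0.75) and the MCC is 0.5. Unlike accuracy, MCC uses all four confusion-matrix cells, so it remains informative when active and inactive compounds are imbalanced.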
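The CoMSIA models were screened by their leave-one-out cross-validated q², conventionally defined as q² = 1 − PRESS/SS, where PRESS = Σᵢ (yᵢ − ŷ₍ᵢ₎)² sums the squared errors of predicting each compound from a model fitted without it, and SS = Σᵢ (yᵢ − ȳ)². The sketch below computes this statistic for any fit function. As a loudly stated assumption, it uses a one-descriptor least-squares line as a stand-in for the partial least squares (PLS) regression normally applied to CoMSIA fields, and the names `fit_simple_linear` and `loo_q2` are hypothetical.

```python
from typing import Callable, Sequence

# A fitted model maps one descriptor value to a predicted activity.
Model = Callable[[float], float]


def fit_simple_linear(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares line y = a + b*x (a stand-in for PLS, not the paper's method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # With no descriptor variance, fall back to predicting the mean activity.
    slope = 0.0 if sxx == 0 else sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)
    ) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def loo_q2(
    xs: Sequence[float],
    ys: Sequence[float],
    fit: Callable[[Sequence[float], Sequence[float]], Model] = fit_simple_linear,
) -> float:
    """Leave-one-out q^2 = 1 - PRESS / SS, with SS taken about the mean of all activities."""
    xs, ys = list(xs), list(ys)
    n = len(ys)
    if n != len(xs) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_y = sum(ys) / n
    ss = sum((y - mean_y) ** 2 for y in ys)
    if ss == 0:
        raise ValueError("q^2 is undefined when all activities are equal")
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict the held-out compound.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - model(xs[i])) ** 2
    return 1.0 - press / ss


if __name__ == "__main__":
    print(loo_q2([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # close to 1.0
    print(loo_q2([0, 0, 0, 0, 0], [1, 3, 2, 5, 4]))  # mean-only model: -0.5625
```

A model that can only return the training-set mean scores below zero (here 1 − (5/4)² = −0.5625), so a cutoff such as the q² > 0.65 used for model selection in the abstract requires held-out predictions far better than that baseline.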