Jaradat Nour Jamal, Alshaer Walhan, Hatmal Mamon, Taha Mutasem Omar
Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman 11942, Jordan.
Cell Therapy Center, The University of Jordan, Amman 11942, Jordan.
RSC Adv. 2023 Feb 3;13(7):4623-4640. doi: 10.1039/d2ra07007c. eCollection 2023 Jan 31.
STAT3 belongs to a family of seven vital transcription factors. High levels of STAT3 are detected in several types of cancer. Hence, STAT3 inhibition is considered a promising anti-cancer therapeutic strategy. In this work, we used multiple docked poses of STAT3 inhibitors to augment the training data for machine learning QSAR modeling. Ligand-receptor contact fingerprints and scoring values were used as descriptor variables. Escalating docking-scoring consensus levels were scanned against orthogonal machine learners, and the best learners (Random Forests and XGBoost) were coupled with a genetic algorithm and Shapley additive explanations (SHAP) to identify the critical descriptors that determine anti-STAT3 bioactivity, which were then translated into pharmacophore models. Two successful pharmacophores were deduced and subsequently used to screen the National Cancer Institute (NCI) database. A total of 26 hits were evaluated for their anti-STAT3 bioactivities. Of these, three hits of novel chemotypes showed cytotoxic IC50 values in the nanomolar to low-micromolar range (35 nM to 6.7 μM). However, two of them are potent dihydrofolate reductase (DHFR) inhibitors and are therefore expected to have significant indirect STAT3 inhibitory effects. The third hit (cytotoxic IC50 = 0.44 μM) is a purely direct STAT3 inhibitor (devoid of DHFR activity) and caused, at its cytotoxic IC50, a more than two-fold reduction in the expression of the STAT3 downstream genes c-Myc and Bcl-xL. The presented work indicates that data augmentation using multiple docked poses is a promising strategy for generating valid machine learning models capable of discriminating active from inactive compounds.
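The pose-based augmentation idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Pose` record, the `augment` helper, the placeholder residue labels, and the reading of a "consensus level" as the number of scoring functions that support a pose are all assumptions made for illustration. Each retained docked pose becomes one training row (a binary contact fingerprint) that inherits the activity label of its ligand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """One docked pose of a ligand (hypothetical record layout)."""
    ligand_id: str
    contacts: frozenset  # receptor residues contacted in this pose
    votes: int           # scoring functions supporting this pose (assumed meaning of consensus)


def augment(poses, labels, residues, min_votes):
    """Build one fingerprint row per pose that meets the consensus threshold.

    Returns (X, y, groups): binary contact fingerprints, the label inherited
    from the parent ligand, and the parent ligand id of each row.
    """
    X, y, groups = [], [], []
    for pose in poses:
        if pose.votes < min_votes:
            continue  # drop poses below the requested consensus level
        X.append([1 if r in pose.contacts else 0 for r in residues])
        y.append(labels[pose.ligand_id])
        groups.append(pose.ligand_id)
    return X, y, groups


# Placeholder residue labels and toy data; not taken from the paper.
residues = ["RES_A", "RES_B", "RES_C"]
poses = [
    Pose("lig1", frozenset({"RES_A", "RES_B"}), 3),
    Pose("lig1", frozenset({"RES_A"}), 1),
    Pose("lig2", frozenset({"RES_C"}), 2),
    Pose("lig2", frozenset({"RES_B", "RES_C"}), 3),
]
labels = {"lig1": 1, "lig2": 0}  # 1 = active, 0 = inactive

X, y, groups = augment(poses, labels, residues, min_votes=2)
```

Raising `min_votes` mimics scanning a stricter consensus level: fewer poses survive, so the augmented set shrinks. The `groups` list is returned so that a downstream learner could be evaluated with ligand-wise splits, keeping poses of the same compound out of both training and test sets; whether the authors split their data this way is not stated in the abstract.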