Yagin Fatma Hilal, Gormez Yasin, Colak Cemil, Algarni Abdulmohsen, Al-Hashem Fahaid, Ardigò Luca Paolo
Department of Biostatistics, Faculty of Medicine, Malatya Turgut Ozal University, 44210 Malatya, Turkey.
Department of Management Information Systems, Faculty of Economics and Administrative Sciences, Sivas Cumhuriyet University, 58140 Sivas, Turkey.
Pharmaceuticals (Basel). 2025 Jun 28;18(7):975. doi: 10.3390/ph18070975.
Dysregulated tyrosine kinase signaling is a central driver of tumorigenesis, metastasis, and therapeutic resistance. While tyrosine kinase inhibitors (TKIs) have revolutionized targeted cancer treatment, identifying compounds with optimal bioactivity remains a critical bottleneck. This study presents a robust machine learning framework that leverages deep artificial neural networks (dANNs), convolutional neural networks (CNNs), and structural molecular fingerprints to accurately predict TKI bioactivity, ultimately accelerating the preclinical phase of drug development. A curated dataset of 28,314 small molecules from the ChEMBL database targeting 11 tyrosine kinases was analyzed. Using Morgan fingerprints and physicochemical descriptors (e.g., molecular weight, LogP, hydrogen bonding), ten supervised models, including dANN, support vector machine (SVM), CatBoost, and CNN, were trained and optimized through a randomized hyperparameter search. Model performance was evaluated using F1-score, ROC-AUC, precision-recall curves, and log loss. SVM achieved the highest F1-score (87.9%) and accuracy (85.1%), while dANNs yielded the lowest log loss (0.25096), indicating superior probabilistic reliability. CatBoost excelled in ROC-AUC and precision-recall metrics. The integration of Morgan fingerprints significantly improved bioactivity prediction across all models by enhancing structural feature recognition. This work highlights the transformative role of machine learning, particularly dANNs and SVM, in rational drug discovery. By enabling accurate bioactivity prediction, our model pipeline can effectively reduce experimental burden, optimize compound selection, and support personalized cancer treatment design. The proposed framework advances kinase inhibitor screening pipelines and provides a scalable foundation for translational applications in precision oncology.
By enabling early identification of bioactive compounds with favorable pharmacological profiles, the results of this study may support more efficient candidate selection for clinical drug development, particularly with regard to cancer therapy and kinase-associated disorders.
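The abstract reports both F1-score and log loss, which measure different things: F1 depends only on thresholded class labels, whereas log loss penalizes poorly calibrated probabilities. The following is a minimal sketch of the two standard definitions for a binary active/inactive task, written with the Python standard library only. The function names, the 0.5 decision threshold, and the toy labels and probabilities are illustrative assumptions and are not taken from the study.

```python
import math


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = active): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives, so precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; y_prob holds predicted P(active)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical toy data (1 = active, 0 = inactive); not results from the paper.
y_true = [1, 0, 1, 1, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

print("F1:", f1_score(y_true, y_pred))
print("log loss:", log_loss(y_true, y_prob))
```

Because the two metrics capture different properties, one model can lead on F1 while another leads on log loss, which is consistent with the pattern reported above for SVM and dANNs.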