Talukder Md Alamin, Kazi Mohsin, Alazab Ammar
Department of Computer Science and Engineering, International University of Business Agriculture and Technology, Dhaka, Bangladesh.
Department of Pharmacognosy, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh 11451, Saudi Arabia.
Sci Rep. 2025 Jun 3;15(1):19495. doi: 10.1038/s41598-025-03932-6.
Drug-target interaction (DTI) prediction is a vital task in drug discovery, yet it faces significant challenges such as data imbalance and the complexity of biochemical representations. To address these issues, this study introduces a novel hybrid framework that combines machine learning (ML) and deep learning (DL) techniques. The framework relies on comprehensive feature engineering: MACCS keys encode the structural features of drugs, and amino acid and dipeptide compositions represent the biomolecular properties of targets. This dual feature extraction captures both chemical and biological information about each drug-target pair, which improves predictive accuracy. To counter data imbalance, Generative Adversarial Networks (GANs) generate synthetic samples for the minority class, reducing false negatives and improving the sensitivity of the model. A Random Forest Classifier (RFC), optimized for high-dimensional input, then makes the final DTI predictions. The scalability and robustness of the framework were validated on three datasets: BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50. On BindingDB-Kd, the GAN+RFC model achieved an accuracy of 97.46%, precision of 97.49%, sensitivity of 97.46%, specificity of 98.82%, F1-score of 97.46%, and ROC-AUC of 99.42%. On BindingDB-Ki, it attained an accuracy of 91.69%, precision of 91.74%, sensitivity of 91.69%, specificity of 93.40%, F1-score of 91.69%, and ROC-AUC of 97.32%. On BindingDB-IC50, it achieved an accuracy of 95.40%, precision of 95.41%, sensitivity of 95.40%, specificity of 96.42%, F1-score of 95.39%, and ROC-AUC of 98.97%. These results show that the GAN-based approach captures complex patterns and substantially improves DTI prediction.
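The abstract does not give exact feature definitions. As a minimal sketch, assuming the common definitions (amino acid composition as the fraction of each of the 20 standard residues, dipeptide composition as the fraction of each of the 400 ordered adjacent residue pairs), the target descriptors and their concatenation with a drug fingerprint might look like the code below. The helper names and the 167 + 20 + 400 = 587-dimensional layout are illustrative assumptions, not the authors' code. In practice the 167-bit MACCS vector would typically come from RDKit (`rdkit.Chem.MACCSkeys.GenMACCSKeys`); that call is left out so the example runs with the standard library alone.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each standard residue in the sequence (20 values)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:  # skip non-standard codes such as X or U
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Fraction of each ordered residue pair among adjacent pairs (400 values)."""
    seq = seq.upper()
    counts = {dp: 0 for dp in DIPEPTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i : i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


def pair_features(maccs_bits: list[int], target_seq: str) -> list[float]:
    """Hypothetical drug-target vector: 167 MACCS bits + 20 AAC + 400 DPC."""
    if len(maccs_bits) != 167:
        raise ValueError("expected a 167-bit MACCS fingerprint")
    return (
        [float(b) for b in maccs_bits]
        + amino_acid_composition(target_seq)
        + dipeptide_composition(target_seq)
    )


if __name__ == "__main__":
    seq = "MKTAYIAKQR"
    print(len(amino_acid_composition(seq)), len(dipeptide_composition(seq)))
    placeholder_maccs = [0] * 167  # stand-in for a real RDKit fingerprint
    print(len(pair_features(placeholder_maccs, seq)))
```

Non-standard residue codes are simply skipped here, so compositions are computed over standard residues only; other pipelines may map or penalize them differently.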
In conclusion, by addressing data imbalance and the complexity of biochemical representations in DTI prediction, the proposed GAN-based hybrid framework sets a new benchmark in computational drug discovery. Its robust performance, scalability, and generalizability make it a substantial contribution to therapeutic development and pharmaceutical research.
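On all three datasets the reported sensitivity equals the accuracy (97.46%, 91.69%, 95.40%). The abstract does not state how the metrics were averaged, but this is exactly what support-weighted averaging over both classes produces (as with scikit-learn's `average="weighted"`): the weighted sensitivity is the sum over classes c of (n_c / N) * (TP_c / n_c), which simplifies to (sum of TP_c) / N, i.e. the accuracy, whereas weighted specificity generally differs. The hypothetical pure-Python helpers below illustrate this under that assumption, along with ROC-AUC computed as the probability that a randomly chosen positive is scored above a randomly chosen negative. They are an explanatory sketch, not the authors' evaluation code.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, TN, FN) treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    # Convention used here: an undefined ratio (0/0) counts as 0.0.
    return num / den if den else 0.0


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, sensitivity, specificity, F1."""
    n = len(y_true)
    result = {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "precision": 0.0,
        "sensitivity": 0.0,
        "specificity": 0.0,
        "f1": 0.0,
    }
    for c in sorted(set(y_true)):
        tp, fp, tn, fn = confusion_counts(y_true, y_pred, c)
        weight = (tp + fn) / n  # fraction of samples whose true label is c
        prec = _ratio(tp, tp + fp)
        sens = _ratio(tp, tp + fn)
        spec = _ratio(tn, tn + fp)
        f1 = _ratio(2 * prec * sens, prec + sens)
        result["precision"] += weight * prec
        result["sensitivity"] += weight * sens
        result["specificity"] += weight * spec
        result["f1"] += weight * f1
    return result


def roc_auc(y_true, scores):
    """Binary ROC-AUC: probability a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    m = weighted_metrics(y_true, y_pred)
    print(m["accuracy"], m["sensitivity"], m["specificity"])
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

With the toy labels in the usage block, accuracy and weighted sensitivity are both 0.75, while weighted specificity is 43/60 (about 0.717), mirroring the pattern in the reported results.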