Predicting drug-target interactions using machine learning with improved data balancing and feature engineering.

Author information

Talukder Md Alamin, Kazi Mohsin, Alazab Ammar

Affiliations

Department of Computer Science and Engineering, International University of Business Agriculture and Technology, Dhaka, Bangladesh.

Department of Pharmacognosy, College of Pharmacy, King Saud University, P.O. BOX-2457, Riyadh, 11451, Saudi Arabia.

Publication information

Sci Rep. 2025 Jun 3;15(1):19495. doi: 10.1038/s41598-025-03932-6.

DOI:10.1038/s41598-025-03932-6

PMID:40461636

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12134243/

Abstract

Drug-target interaction (DTI) prediction is a vital task in drug discovery, yet it faces significant challenges such as data imbalance and the complexity of biochemical representations. To address these issues, this study introduces a novel hybrid framework that combines machine learning (ML) and deep learning (DL) techniques. The framework relies on comprehensive feature engineering: MACCS keys extract structural drug features, and amino acid and dipeptide compositions represent target biomolecular properties. This dual feature extraction captures both chemical and biological information, improving predictive accuracy. To address data imbalance, Generative Adversarial Networks (GANs) generate synthetic data for the minority class, reducing false negatives and improving the sensitivity of the predictive model. A Random Forest Classifier (RFC), optimized for high-dimensional data, makes the DTI predictions. The framework's scalability and robustness were validated across diverse datasets, including BindingDB-Kd, BindingDB-Ki, and BindingDB-IC50. On BindingDB-Kd, the GAN+RFC model achieved an accuracy of 97.46%, precision of 97.49%, sensitivity of 97.46%, specificity of 98.82%, F1-score of 97.46%, and ROC-AUC of 99.42%. On BindingDB-Ki, it attained an accuracy of 91.69%, precision of 91.74%, sensitivity of 91.69%, specificity of 93.40%, F1-score of 91.69%, and ROC-AUC of 97.32%. On BindingDB-IC50, it achieved an accuracy of 95.40%, precision of 95.41%, sensitivity of 95.40%, specificity of 96.42%, F1-score of 95.39%, and ROC-AUC of 98.97%. These results demonstrate the efficacy of the GAN-based approach in capturing complex patterns and substantially improving DTI prediction.
In conclusion, the proposed GAN-based hybrid framework sets a new benchmark in computational drug discovery by addressing critical challenges in DTI prediction. Its robust performance, scalability, and generalizability contribute substantially to therapeutic development and pharmaceutical research.
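The abstract represents targets by amino acid and dipeptide compositions. These are standard sequence descriptors: amino acid composition (AAC) is the fraction of each of the 20 standard residues, and dipeptide composition (DPC) is the fraction of each of the 400 ordered pairs of adjacent residues, giving 420 values per protein. The Python sketch below shows one common way to compute them. It is not the authors' code: the function names are hypothetical, and it assumes the 20-letter alphabet with non-standard letters (X, B, Z, U) skipped. In a full pipeline this vector would be concatenated with the drug's MACCS-key fingerprint (for example from RDKit's `rdkit.Chem.MACCSkeys.GenMACCSKeys`, not shown) before training the classifier.

```python
from __future__ import annotations

from itertools import product

# Sketch only: hypothetical helper names, not the paper's implementation.
# Assumption: the standard 20-letter alphabet; other letters are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each standard residue in the sequence (20 values)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Fraction of each ordered adjacent residue pair (400 values)."""
    seq = seq.upper()
    # Adjacent pairs in the original sequence; pairs touching a
    # non-standard letter are dropped rather than bridged.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    total = len(pairs)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    counts = {dp: 0 for dp in DIPEPTIDES}
    for p in pairs:
        counts[p] += 1
    return [counts[dp] / total for dp in DIPEPTIDES]


def target_features(seq: str) -> list[float]:
    """Hypothetical helper: 420-value AAC + DPC vector for one protein."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

For example, `target_features("MKTAYIAK")` returns 420 values; the entry for residue A is 0.25 (2 of 8 residues), and each of the seven observed dipeptides (MK, KT, TA, AY, YI, IA, AK) gets 1/7.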
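The reported accuracy, precision, sensitivity, specificity, and F1-score all derive from the confusion matrix of the binary interaction / no-interaction prediction. The sketch below (hypothetical function names; the counts in the example are illustrative, not from the paper) computes them for the positive class. It also computes support-weighted average recall over both classes, which always reduces to (TP + TN) / total, that is, to accuracy. The abstract does not state whether sensitivity is reported for the positive class or as such a weighted average; if the latter, its identical value to accuracy on all three datasets is expected by construction.

```python
from __future__ import annotations

# Sketch only: hypothetical helper names; positive class = "interacts".


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics for the positive class."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def weighted_recall(tp: int, fp: int, tn: int, fn: int) -> float:
    """Support-weighted mean of per-class recall over both classes.

    Positive recall tp/(tp+fn) has support tp+fn; negative recall
    tn/(tn+fp) has support tn+fp. Weighting by support cancels the
    denominators, leaving (tp+tn)/total, which is exactly accuracy.
    """
    total = tp + fp + tn + fn
    pos, neg = tp + fn, tn + fp
    recall_pos = tp / pos if pos else 0.0
    recall_neg = tn / neg if neg else 0.0
    return (pos * recall_pos + neg * recall_neg) / total
```

With TP = 90, FP = 5, TN = 95, FN = 10, accuracy and weighted recall are both 0.925, while positive-class sensitivity is 0.90 and specificity is 0.95.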

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4263/12134243/e40665662296/41598_2025_3932_Fig1_HTML.jpg

Similar articles

A medical image classification method based on self-regularized adversarial learning.

Med Phys. 2024 Nov;51(11):8232-8246. doi: 10.1002/mp.17320. Epub 2024 Jul 30.

Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.

Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.

A holistic framework for intradialytic hypotension prediction using generative adversarial networks-based data balancing.

BMC Med Inform Decis Mak. 2025 Jul 10;25(1):257. doi: 10.1186/s12911-025-03094-5.

Attention-driven hybrid deep learning and SVM model for early Alzheimer's diagnosis using neuroimaging fusion.

BMC Med Inform Decis Mak. 2025 Jul 1;25(1):219. doi: 10.1186/s12911-025-03073-w.

GAN-enhanced deep learning for improved Alzheimer's disease classification and longitudinal brain change analysis.

Front Med (Lausanne). 2025 Jun 17;12:1587026. doi: 10.3389/fmed.2025.1587026. eCollection 2025.

Supervised Machine Learning Models for Predicting Sepsis-Associated Liver Injury in Patients With Sepsis: Development and Validation Study Based on a Multicenter Cohort Study.

J Med Internet Res. 2025 May 26;27:e66733. doi: 10.2196/66733.

AI-driven pharmacovigilance: Enhancing adverse drug reaction detection with deep learning and NLP.

MethodsX. 2025 Jun 23;15:103460. doi: 10.1016/j.mex.2025.103460. eCollection 2025 Dec.

FLPneXAINet: Federated deep learning and explainable AI for improved pneumonia prediction utilizing GAN-augmented chest X-ray data.

PLoS One. 2025 Jul 17;20(7):e0324957. doi: 10.1371/journal.pone.0324957. eCollection 2025.

Proposal for Using AI to Assess Clinical Data Integrity and Generate Metadata: Algorithm Development and Validation.

JMIR Med Inform. 2025 Jun 30;13:e60204. doi: 10.2196/60204.
