Comparative Studies on Resampling Techniques in Machine Learning and Deep Learning Models for Drug-Target Interaction Prediction.

Affiliation

School of Computer Sciences, Universiti Sains Malaysia, Pulau Pinang 11800, Malaysia.

Publication information

Molecules. 2023 Feb 9;28(4):1663. doi: 10.3390/molecules28041663.

DOI:10.3390/molecules28041663

PMID:36838652

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9964614/

Abstract

The prediction of drug-target interactions (DTIs) is a vital step in drug discovery, and machine learning and deep learning methods that predict DTIs accurately play a major role in it. However, the datasets used to train these learning algorithms are usually high-dimensional and extremely imbalanced, so they must be resampled accordingly. In this paper, we compare several data resampling techniques for overcoming class imbalance in machine learning methods, and we study how effectively deep learning methods overcome class imbalance in DTI prediction, framed as binary classification on ten cancer-related activity classes from BindingDB. We find that Random Undersampling (RUS) severely degrades model performance in DTI prediction, especially when the dataset is highly imbalanced, which renders RUS unreliable. We also find that SVM-SMOTE can serve as a go-to resampling method when paired with the Random Forest and Gaussian Naïve Bayes classifiers: high F1 scores are recorded for all severely and moderately imbalanced activity classes. Additionally, the deep learning method Multilayer Perceptron recorded high F1 scores for all activity classes even when no resampling method was applied.
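
As a rough illustration of two ideas in the abstract, the sketch below uses only the Python standard library to show what random undersampling does to a highly imbalanced dataset and why F1 is more informative than accuracy in that setting. It is not the authors' code: the toy dataset (5 actives among 1000 compounds), the function names `random_undersample` and `f1_score`, and the seed are all assumptions made for this example.

```python
import random
from collections import Counter


def random_undersample(features, labels, seed=0):
    """Randomly drop samples until every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    # Every class is cut down to the size of the minority class.
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))
    kept.sort()
    return [features[i] for i in kept], [labels[i] for i in kept]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 5 actives (label 1) among 1000 compounds.
labels = [1] * 5 + [0] * 995
features = [[float(i)] for i in range(len(labels))]

# Undersampling balances the classes but keeps only a sliver of the data.
X_rus, y_rus = random_undersample(features, labels)
class_counts = Counter(y_rus)
fraction_kept = len(y_rus) / len(labels)

# A trivial model that predicts "inactive" for everything looks accurate,
# yet its F1 score for the active class is zero.
all_inactive = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, all_inactive)) / len(labels)
f1_trivial = f1_score(labels, all_inactive)

print(class_counts, fraction_kept, accuracy, f1_trivial)
```

On this toy data, undersampling leaves 10 of the 1000 samples, which gives one intuition for why discarding majority-class examples can hurt a model when imbalance is severe. The study itself reaches its conclusion about RUS empirically on the BindingDB activity classes.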

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e21/9964614/e9728a3c00a7/molecules-28-01663-g001.jpg
