Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets.

Affiliations

Science Data Software, LLC, 14914 Bradwill Court, Rockville, Maryland 20850, United States.

Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.

Publication information

Mol Pharm. 2017 Dec 4;14(12):4462-4475. doi: 10.1021/acs.molpharmaceut.7b00578. Epub 2017 Nov 13.

DOI:10.1021/acs.molpharmaceut.7b00578

PMID:29096442

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5741413/

Abstract

Machine learning methods have been applied to many data sets in pharmaceutical research for several decades. The relative ease and availability of fingerprint-type molecular descriptors paired with Bayesian methods resulted in the widespread use of this approach for a diverse array of end points relevant to drug discovery. Deep learning is the latest machine learning algorithm attracting attention for many pharmaceutical applications, from docking to virtual screening. Deep learning is based on an artificial neural network with multiple hidden layers and has found considerable traction for many artificial intelligence applications. We have previously suggested the need for a comparison of different machine learning methods with deep learning across an array of varying data sets that is applicable to pharmaceutical research. End points relevant to pharmaceutical research include absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties, as well as activity against pathogens and drug discovery data sets. In this study, we have used data sets for solubility, probe-likeness, hERG, KCNQ1, bubonic plague, Chagas, tuberculosis, and malaria to compare different machine learning methods using FCFP6 fingerprints. These data sets represent whole cell screens, individual proteins, and physicochemical properties, as well as a data set with a complex end point. Our aim was to assess whether deep learning offered any improvement in testing when assessed using an array of metrics including AUC, F1 score, Cohen's kappa, Matthews correlation coefficient, and others. Based on ranked normalized scores for the metrics or data sets, Deep Neural Networks (DNN) ranked higher than SVM, which in turn was ranked higher than all the other machine learning methods. Visualizing these properties for training and test sets using radar-type plots indicates when models are inferior or perhaps overtrained. These results also suggest the need for assessing deep learning further using multiple metrics with much larger scale comparisons, prospective testing, as well as assessment of different fingerprints and DNN architectures beyond those used.
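As an illustration of three of the metrics named in the abstract, the following minimal sketch computes the F1 score, Cohen's kappa, and the Matthews correlation coefficient from the four counts of a binary confusion matrix. It is not the authors' code: the function name `classification_metrics`, its argument names, and the example counts are hypothetical, and AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return F1, Cohen's kappa, and MCC from binary confusion-matrix counts.

    Hypothetical helper for illustration; all four arguments are counts.
    """
    n = tp + fp + tn + fn

    # F1 is the harmonic mean of precision and recall.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = ((p_observed - p_expected) / (1 - p_expected)
             if p_expected != 1 else 0.0)

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"f1": f1, "kappa": kappa, "mcc": mcc}


if __name__ == "__main__":
    # Made-up counts, not results from the paper.
    print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```

Reporting several such metrics side by side is useful because they weight errors differently: F1 ignores true negatives, whereas kappa and MCC use all four cells of the confusion matrix.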

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac73/5741413/ff7aef37d749/nihms927142f1.jpg
