Improved Machine Learning Predictions of EC50s Using Uncertainty Estimation from Dose-Response Data.

Author information

Bellamy Hugo, Dickhaut Joachim, King Ross D

Affiliations

Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB2 1TN, United Kingdom of Great Britain and Northern Ireland.

BASF, Ludwigshafen 67056, Germany.

Publication information

J Chem Inf Model. 2025 Jun 9;65(11):5623-5634. doi: 10.1021/acs.jcim.5c00249. Epub 2025 May 19.

DOI:10.1021/acs.jcim.5c00249

PMID:40384077

Abstract

In early-stage drug design, machine learning models often rely on compressed representations of data, where raw experimental results are distilled into a single metric per molecule through curve fitting. This process discards valuable information about the quality of the curve fit. In this study, we incorporated a fit-quality metric into machine learning models to capture the reliability of metrics for individual molecules. Using 40 data sets from PubChem (public) and BASF (private), we demonstrated that including this quality metric can significantly improve predictive performance without additional experiments. Four methods were tested: random forests with parametric bootstrap, weighted random forests, variable output smearing random forests, and weighted support vector regression. When using fit-quality metrics, at least one of these methods led to a statistically significant improvement on 31 of the 40 data sets. In the best case, these methods led to a 22% reduction in the root-mean-squared error of the models. Overall, our results demonstrate that by adapting data processing to account for curve fit quality, we can improve predictive performance across a range of different data sets.
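
As an illustration of how curve-fit uncertainty might be carried into model training, the following minimal Python sketch assumes each molecule has a fitted log10(EC50) and a standard error from the curve fit. Using the standard error as the fit-quality metric, and inverse-variance weighting, are assumptions made for illustration; the abstract does not specify either. The function names and data values are hypothetical.

```python
import random
import statistics


def inverse_variance_weights(std_errors, floor=1e-6):
    """Weight each molecule by 1 / SE^2, rescaled so the mean weight is 1.

    Assumption: a smaller standard error means a more reliable label.
    The floor guards against division by zero for near-perfect fits.
    """
    raw = [1.0 / max(se, floor) ** 2 for se in std_errors]
    mean_raw = statistics.fmean(raw)
    return [w / mean_raw for w in raw]


def parametric_bootstrap_labels(log_ec50s, std_errors, n_replicates, seed=0):
    """Draw n_replicates label vectors, one Gaussian draw per molecule.

    Each molecule's label is sampled from Normal(fitted value, SE), so
    poorly fitted molecules vary more between replicates.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(mu, se) for mu, se in zip(log_ec50s, std_errors)]
        for _ in range(n_replicates)
    ]


# Hypothetical fitted log10(EC50) values and curve-fit standard errors.
log_ec50s = [-6.2, -5.1, -7.4, -4.8]
std_errors = [0.05, 0.40, 0.10, 0.80]

weights = inverse_variance_weights(std_errors)
replicates = parametric_bootstrap_labels(log_ec50s, std_errors, n_replicates=200)
```

In a setup like this, each tree of a random forest could be trained on one resampled label vector, while the weights could be passed to any learner that accepts per-sample weights (for example, the `sample_weight` argument of `fit` in scikit-learn's `RandomForestRegressor` and `SVR`). Whether this matches the authors' implementation is not stated in the abstract.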
