Improved Machine Learning Predictions of EC50s Using Uncertainty Estimation from Dose-Response Data.

Authors

Bellamy Hugo, Dickhaut Joachim, King Ross D

Affiliations

Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB2 1TN, United Kingdom of Great Britain and Northern Ireland.

BASF, Ludwigshafen 67056, Germany.

Publication information

J Chem Inf Model. 2025 Jun 9;65(11):5623-5634. doi: 10.1021/acs.jcim.5c00249. Epub 2025 May 19.

Abstract

In early-stage drug design, machine learning models often rely on compressed representations of data, where raw experimental results are distilled into a single metric per molecule through curve fitting. This process discards valuable information about the quality of the curve fit. In this study, we incorporated a fit-quality metric into machine learning models to capture the reliability of metrics for individual molecules. Using 40 data sets from PubChem (public) and BASF (private), we demonstrated that including this quality metric can significantly improve predictive performance without additional experiments. Four methods were tested: random forests with parametric bootstrap, weighted random forests, variable output smearing random forests, and weighted support vector regression. When using fit-quality metrics, at least one of these methods led to a statistically significant improvement on 31 of the 40 data sets. In the best case, these methods led to a 22% reduction in the root-mean-squared error of the models. Overall, our results demonstrate that by adapting data processing to account for curve fit quality, we can improve predictive performance across a range of different data sets.
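
As a rough illustration of how a per-molecule fit-quality value might feed into the weighted and parametric-bootstrap approaches named in the abstract, the sketch below uses only the Python standard library. It rests on assumptions not stated in the abstract: that fit quality is available as a standard error on each fitted pEC50, that weights are taken as inverse variances, and that bootstrap targets are drawn from a normal distribution. The function names `inverse_variance_weights` and `parametric_bootstrap_targets` are hypothetical and do not come from the paper.

```python
import random
from statistics import fmean


def inverse_variance_weights(std_errors, floor=1e-6):
    """Turn per-molecule standard errors into sample weights.

    Assumption: weight = 1 / se^2, rescaled so the mean weight is 1.
    `floor` guards against division by zero for se == 0.
    """
    raw = [1.0 / max(se, floor) ** 2 for se in std_errors]
    mean_raw = fmean(raw)
    return [w / mean_raw for w in raw]


def parametric_bootstrap_targets(values, std_errors, n_replicates=100, seed=0):
    """Draw resampled target vectors for a parametric bootstrap.

    Assumption: each fitted value is treated as the mean of a normal
    distribution whose standard deviation is its fit standard error.
    Returns a list of `n_replicates` target lists, one per replicate.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(v, se) for v, se in zip(values, std_errors)]
        for _ in range(n_replicates)
    ]


if __name__ == "__main__":
    pec50 = [6.2, 7.1, 5.4]       # illustrative fitted values, not real data
    fit_se = [0.10, 0.45, 0.20]   # illustrative standard errors, not real data
    print(inverse_variance_weights(fit_se))
    print(parametric_bootstrap_targets(pec50, fit_se, n_replicates=2))
```

Under these assumptions, the weights could be passed as `sample_weight` to the `fit` method of scikit-learn's `RandomForestRegressor` or `SVR`, and each bootstrap replicate could serve as the training target for one member of an ensemble. Whether the authors used these exact formulations is not stated in the abstract.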
