Novartis Biomedical Research, Novartis Campus, 4002 Basel, Switzerland.
J Chem Inf Model. 2024 Nov 25;64(22):8379-8386. doi: 10.1021/acs.jcim.4c01578. Epub 2024 Nov 14.
Machine learning (ML) models have become key in decision-making for many disciplines, including drug discovery and medicinal chemistry. ML models are generally evaluated prior to their usage in high-stakes decisions, such as compound synthesis or experimental testing. However, no ML model is robust or predictive in all real-world scenarios. Therefore, uncertainty quantification (UQ) in ML predictions has gained importance in recent years. Many investigations have focused on developing methodologies that provide accurate uncertainty estimates for ML-based predictions. Unfortunately, no UQ strategy consistently provides robust estimates of a model's applicability to new samples. Depending on the dataset, prediction task, and algorithm, accurate uncertainty estimates may be infeasible to obtain. Moreover, the optimal UQ metric also varies across applications, and previous investigations have shown a lack of consistency across benchmarks. Herein, the UNIQUE (UNcertaInty QUantification bEnchmarking) framework is introduced to facilitate the comparison of UQ strategies in ML-based predictions. This Python library unifies the benchmarking of multiple UQ metrics, including the calculation of nonstandard UQ metrics (combining information from the dataset and model), and provides a comprehensive evaluation. In this framework, UQ metrics are evaluated for different application scenarios, e.g., eliminating the predictions with the lowest confidence or obtaining a reliable uncertainty estimate for an acquisition function. Taken together, this library will help to standardize UQ investigations and evaluate new methodologies.
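One evaluation scenario the abstract mentions, eliminating the predictions with the lowest confidence, can be illustrated with a minimal, self-contained sketch. The code below is not the UNIQUE API; it uses synthetic data and an ensemble standard deviation as a stand-in UQ estimate to show how an error-retention evaluation works: rank predictions by confidence, discard the least-confident fraction, and check that the error on the retained subset drops.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data with heteroscedastic noise: samples with larger |x|
# are noisier, so a useful UQ estimate should flag them as less reliable.
n = 500
x = rng.uniform(-3.0, 3.0, n)
noise_scale = 0.05 + 0.3 * np.abs(x) / 3.0
y_obs = np.sin(x) + rng.normal(0.0, noise_scale)

# Stand-in "ensemble": member predictions perturbed more strongly in the
# noisy region, mimicking model disagreement that tracks true difficulty.
n_members = 10
preds = np.stack(
    [np.sin(x) + rng.normal(0.0, noise_scale) for _ in range(n_members)]
)
y_pred = preds.mean(axis=0)        # ensemble mean prediction
uncertainty = preds.std(axis=0)    # ensemble spread as the UQ estimate

def mae_at_retention(y_obs, y_pred, uncertainty, keep_frac):
    """Mean absolute error on the keep_frac most-confident predictions."""
    order = np.argsort(uncertainty)          # most confident first
    kept = order[: int(len(y_obs) * keep_frac)]
    return float(np.abs(y_obs[kept] - y_pred[kept]).mean())

mae_full = mae_at_retention(y_obs, y_pred, uncertainty, 1.0)
mae_half = mae_at_retention(y_obs, y_pred, uncertainty, 0.5)
```

If the uncertainty estimate is informative, `mae_half` is lower than `mae_full`; sweeping `keep_frac` from 1.0 toward 0.0 traces the full error-retention curve, one of the metric families such a benchmark can compare.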