Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden.
Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, SE-75124, Uppsala, Sweden; Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-16407, Kista, Sweden; MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden.
J Pharm Sci. 2021 Jan;110(1):42-49. doi: 10.1016/j.xphs.2020.09.055. Epub 2020 Oct 17.
One of the challenges with predictive modeling is how to quantify the reliability of the models' predictions on new objects. In this work we give an introduction to conformal prediction, a framework that sits on top of traditional machine learning algorithms and outputs valid confidence estimates for predictions from QSAR models, in the form of prediction intervals that are specific to each predicted object. For regression, a prediction interval consists of an upper and a lower bound. For classification, a prediction interval is a set that contains none, one, or several of the potential classes. The size of the prediction interval depends on a user-specified confidence/significance level and on the nonconformity of the predicted object, i.e., its strangeness as measured by a nonconformity function. Conformal prediction provides a rigorous, mathematically proven framework for in silico modeling, with guarantees on error rates and a consistent handling of the models' applicability domain that is intrinsically linked to the underlying machine learning model. Apart from introducing the concepts and types of conformal prediction, we also provide an example application that models ABC transporters with conformal prediction, as well as a discussion of general implications for drug discovery.
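To make the two kinds of output concrete (an interval with lower and upper bounds for regression, and a set of zero, one, or several classes for classification), here is a minimal sketch of inductive (split) conformal prediction, one common variant of the framework. It is not the authors' implementation. The nonconformity functions are assumptions chosen for simplicity: the absolute residual |y - ŷ| for regression, and 1 minus the model's predicted probability for a label in classification. The function names, the labels "inhibitor"/"non-inhibitor", and all numbers are hypothetical. The calibration set is assumed to be disjoint from the data used to train the underlying model. Under exchangeability, the long-run error rate is then at most the chosen significance level.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def icp_regression_interval(
    cal_true: Sequence[float],
    cal_pred: Sequence[float],
    test_pred: float,
    significance: float,
) -> Tuple[float, float]:
    """Inductive conformal prediction interval for one test object (sketch).

    Assumed nonconformity: absolute residual |y - y_hat| on a calibration set
    that was not used to train the underlying model.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Rank of the calibration score that sets the interval half-width.
    k = math.ceil((n + 1) * (1 - significance))
    if k > n:
        # Too few calibration objects for this significance level:
        # the only valid interval is unbounded.
        return (-math.inf, math.inf)
    half_width = scores[k - 1]
    return (test_pred - half_width, test_pred + half_width)


def icp_classification_set(
    cal_scores: Sequence[float],
    test_probs: Dict[str, float],
    significance: float,
) -> Set[str]:
    """Inductive conformal prediction set for one test object (sketch).

    cal_scores: nonconformity of each calibration object w.r.t. its true label,
        assumed here to be 1 - (model probability of the true label).
    test_probs: model probability of each candidate label for the test object.
    """
    n = len(cal_scores)
    prediction_set: Set[str] = set()
    for label, prob in test_probs.items():
        # Nonconformity of the test object if it had this label.
        alpha = 1.0 - prob
        # p-value: share of objects (calibration + the test object itself)
        # that are at least as strange as the test object.
        p_value = (sum(1 for s in cal_scores if s >= alpha) + 1) / (n + 1)
        # Keep every label that is not rejected at the significance level.
        if p_value > significance:
            prediction_set.add(label)
    return prediction_set


if __name__ == "__main__":
    # Hypothetical calibration data, for illustration only.
    cal_true = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    cal_pred = [11, 18, 33, 40, 54, 55, 72, 86, 97]
    print(icp_regression_interval(cal_true, cal_pred, 25.0, 0.5))
    print(icp_regression_interval(cal_true, cal_pred, 25.0, 0.05))

    cal_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.05]
    probs = {"inhibitor": 0.85, "non-inhibitor": 0.15}
    for eps in (0.1, 0.3, 0.85):
        print(eps, sorted(icp_classification_set(cal_scores, probs, eps)))
```

Lowering the significance level (raising the confidence) widens the regression interval or adds labels to the classification set. With only nine calibration objects, a 95% interval is already unbounded. In the classification demo the same object yields both labels, a single label, or an empty set as the significance level increases, which mirrors the "none, one, or several classes" behaviour described above.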