ICES, Toronto, Canada.
Institute of Health Management, Policy and Evaluation, University of Toronto, Toronto, Canada.
Stat Med. 2019 Sep 20;38(21):4051-4065. doi: 10.1002/sim.8281. Epub 2019 Jul 3.
Assessing the calibration of methods for estimating the probability of a binary outcome is an important part of validating risk-prediction algorithms. Calibration commonly refers to the agreement between predicted and observed probabilities of the outcome. Graphical methods are an attractive approach to assessing calibration: observed and predicted probabilities are compared using loess-based smoothing functions. We describe the Integrated Calibration Index (ICI), which is motivated by Harrell's Emax index, the maximum absolute difference between a smooth calibration curve and the diagonal line of perfect calibration. The ICI can be interpreted as a weighted average of the absolute difference between observed and predicted probabilities, in which observations are weighted by the empirical density function of the predicted probabilities. As such, the ICI is a measure of calibration that explicitly incorporates the distribution of predicted probabilities. We also discuss two related measures of calibration, E50 and E90, which represent the median and 90th percentile of the absolute difference between observed and predicted probabilities. We illustrate the utility of the ICI, E50, and E90 by using them to compare the calibration of logistic regression with that of random forests and boosted regression trees for predicting mortality in patients hospitalized with a heart attack. These numeric metrics permitted greater differentiation in calibration than was possible by visual inspection of graphical calibration curves.
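The metrics described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `loess` and `calibration_metrics`, the span of 0.75, and the simulated data are all assumptions made here, and the smoother is a simplified local linear fit with tricube weights standing in for a full loess routine. The observed probability for each subject is taken as the smoothed outcome at that subject's predicted probability, and averaging the absolute differences over the sample is what weights them by the empirical density of the predictions.

```python
import random
import statistics


def loess(x, y, span=0.75):
    """Simplified loess stand-in: local linear fit with tricube weights.

    Returns the smoothed value of y at every x[i]. The span (fraction of
    points used in each local fit) is an assumption, not taken from the paper.
    """
    n = len(x)
    k = max(2, int(round(span * n)))  # neighbours used in each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th neighbour
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def calibration_metrics(pred, outcome, span=0.75):
    """ICI, E50, E90 and the maximum absolute difference (Emax)."""
    smoothed = loess(pred, outcome, span)  # smoothed observed probabilities
    abs_diff = sorted(abs(s - p) for s, p in zip(smoothed, pred))
    return {
        "ICI": sum(abs_diff) / len(abs_diff),             # mean absolute difference
        "E50": statistics.median(abs_diff),               # median
        "E90": statistics.quantiles(abs_diff, n=10)[-1],  # 90th percentile
        "Emax": abs_diff[-1],                             # maximum
    }


# Simulated example (hypothetical data): outcomes are drawn from the true
# probabilities, so predicting true_p is well calibrated by construction,
# while predicting true_p ** 2 systematically underestimates risk.
random.seed(0)
true_p = [random.uniform(0.05, 0.95) for _ in range(300)]
y = [1 if random.random() < p else 0 for p in true_p]

well = calibration_metrics(true_p, y)
under = calibration_metrics([p ** 2 for p in true_p], y)
print("well calibrated: ", {k: round(v, 3) for k, v in well.items()})
print("underpredicting: ", {k: round(v, 3) for k, v in under.items()})
```

Because all four quantities summarize the same vector of absolute differences, they can be read together: the ICI gives the average miscalibration, E50 and E90 describe its distribution across subjects, and the maximum highlights the single worst region of the calibration curve, which may lie where few predictions fall.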