Austin Peter C, Putter Hein, Giardiello Daniele, van Klaveren David
ICES, G106, 2075 Bayview Avenue, Toronto, Ontario, M4N 3M5, Canada.
Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada.
Diagn Progn Res. 2022 Jan 17;6(1):2. doi: 10.1186/s41512-021-00114-6.
Assessing calibration (the agreement between estimated risks and observed proportions) is an important component of deriving and validating clinical prediction models. Methods for assessing the calibration of prognostic models for use with competing risk data have received little attention.
We propose a method for graphically assessing the calibration of competing risk regression models. The proposed method can be used to assess the calibration of any model that estimates incidence in the presence of competing risks (e.g., a Fine-Gray subdistribution hazard model, a combination of cause-specific hazard functions, or a random survival forest). The method uses a Fine-Gray subdistribution hazard model to regress the cumulative incidence function of the cause-specific outcome of interest on the risk predicted by the model whose calibration is being assessed. We also modify three numerical calibration metrics, the integrated calibration index (ICI), E50, and E90, for use with competing risk data. We conducted a series of Monte Carlo simulations to evaluate the performance of these calibration measures when the underlying model was correctly specified, when it was mis-specified, and when the incidence of the cause-specific outcome differed between the derivation and validation samples. We illustrated the usefulness of the calibration curves and the numerical calibration metrics by comparing the calibration of a Fine-Gray subdistribution hazard regression model with that of random survival forests for predicting cardiovascular mortality in patients hospitalized with heart failure.
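The abstract does not include code; as a minimal illustrative sketch (not the authors' implementation), the three numerical metrics reduce to summaries of the absolute difference between each subject's predicted risk and the "observed" risk estimated by the flexible calibration model at the time point of interest. The function name and inputs below are hypothetical, and obtaining the observed risks would first require fitting the Fine-Gray calibration regression described above, which is not shown here.

```python
import numpy as np

def calibration_metrics(predicted_risk, observed_risk):
    """Numerical calibration metrics for competing risk predictions.

    predicted_risk : predicted cumulative incidences at a fixed time point,
        from the model being validated.
    observed_risk  : "observed" cumulative incidences for the same subjects,
        obtained from a flexible calibration model (in the paper, a Fine-Gray
        regression of the CIF on the predicted risks).
    """
    abs_diff = np.abs(np.asarray(predicted_risk) - np.asarray(observed_risk))
    return {
        "ICI": abs_diff.mean(),              # mean absolute calibration error
        "E50": np.percentile(abs_diff, 50),  # median absolute calibration error
        "E90": np.percentile(abs_diff, 90),  # 90th percentile of the error
    }

# Example with made-up values
print(calibration_metrics([0.10, 0.25, 0.40], [0.12, 0.22, 0.47]))
```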
The simulations indicated that the method for constructing graphical calibration curves and the associated calibration metrics performed as desired. We also demonstrated that the numerical calibration metrics can be used as optimization criteria when tuning machine learning methods for competing risk outcomes.
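The abstract does not show how the metrics serve as tuning criteria; the purely synthetic sketch below (all data and candidate node sizes are invented, and no competing risk model is actually fit) only illustrates the selection logic: compute the ICI for each candidate tuning value on validation data and keep the value that minimizes it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for predictions produced by models fit under different tuning
# values; in practice these would come from, e.g., random survival forests
# with different minimum node sizes. Everything here is synthetic.
observed = rng.uniform(0.05, 0.60, size=500)  # smoothed "observed" risks
candidate_predictions = {
    5:  np.clip(observed + rng.normal(0, 0.08, 500), 0, 1),
    20: np.clip(observed + rng.normal(0, 0.04, 500), 0, 1),
    50: np.clip(observed + rng.normal(0, 0.12, 500), 0, 1),
}

# Select the tuning value whose validation-sample predictions minimize the ICI.
ici = {k: np.mean(np.abs(p - observed)) for k, p in candidate_predictions.items()}
best = min(ici, key=ici.get)
print(ici, "-> selected node size:", best)
```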
The calibration curves and numerical calibration metrics permit a comprehensive comparison of the calibration of different competing risk models.