Schnellinger Erin M, Yang Wei, Kimmel Stephen E
Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, USA.
Diagn Progn Res. 2021 Dec 6;5(1):20. doi: 10.1186/s41512-021-00110-w.
Prediction models inform many medical decisions, but their performance often deteriorates over time. Several discrete-time update strategies have been proposed in the literature, including model recalibration and revision. However, these strategies have not been compared in the dynamic updating setting.
We used post-lung transplant survival data from 2010 to 2015 to compare the Brier Score (BS), discrimination, and calibration of the following update strategies: (1) never update, (2) update using the closed testing procedure proposed in the literature, (3) always recalibrate the intercept, (4) always recalibrate the intercept and slope, and (5) always refit/revise the model. For each strategy, we explored update intervals of 1, 2, 4, and 8 quarters. We also examined how the performance of the update strategies changed as the amount of old data included in the update (i.e., the sliding window length) increased.
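To make strategies (3) and (4) concrete, the following is a minimal sketch of logistic recalibration and the Brier Score on simulated data. It is not the authors' code: the function names, the Newton-Raphson fitting routine, the simulated miscalibration (true intercept -1, true slope 0.7), and the sample size are all assumptions chosen for illustration, and the sketch treats the outcome as binary rather than handling censored survival times.

```python
import numpy as np

def brier_score(y, p):
    # Mean squared difference between predicted risk and observed 0/1 outcome.
    return float(np.mean((np.asarray(p, float) - np.asarray(y, float)) ** 2))

def logit(p):
    # Clip to avoid infinite values at predicted risks of exactly 0 or 1.
    p = np.clip(np.asarray(p, float), 1e-12, 1 - 1e-12)
    return np.log(p / (1 - p))

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def recalibrate(y, p, slope=True, n_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(p) by Newton-Raphson.

    slope=False keeps b fixed at 1 (intercept-only recalibration);
    slope=True estimates both the intercept a and the slope b.
    """
    y = np.asarray(y, float)
    lp = logit(p)                      # linear predictor of the existing model
    a, b = 0.0, 1.0                    # start from "no change"
    for _ in range(n_iter):
        mu = expit(a + b * lp)
        w = mu * (1 - mu)
        if slope:
            X = np.column_stack([np.ones_like(lp), lp])
            grad = X.T @ (y - mu)
            hess = X.T @ (X * w[:, None])
            step = np.linalg.solve(hess, grad)
            a, b = a + step[0], b + step[1]
        else:
            step = np.sum(y - mu) / np.sum(w)
            a = a + step
        if np.max(np.abs(step)) < tol:
            break
    return float(a), float(b)

# Simulated (hypothetical) data: the old model's risks are too high and too extreme.
rng = np.random.default_rng(0)
lp_old = rng.normal(0.0, 1.5, size=5000)
p_old = expit(lp_old)
y = rng.binomial(1, expit(-1.0 + 0.7 * lp_old))

a_int, _ = recalibrate(y, p_old, slope=False)        # strategy (3)
a_both, b_both = recalibrate(y, p_old, slope=True)   # strategy (4)

bs_old = brier_score(y, p_old)                                  # never update
bs_int = brier_score(y, expit(a_int + lp_old))                  # intercept only
bs_both = brier_score(y, expit(a_both + b_both * lp_old))       # intercept + slope

print(f"BS never update:      {bs_old:.4f}")
print(f"BS intercept only:    {bs_int:.4f}")
print(f"BS intercept + slope: {bs_both:.4f}")
```

In a dynamic updating setting, a routine like `recalibrate` would be rerun at each update interval on the data inside the current sliding window, and the resulting `a` and `b` applied to predictions until the next update. How the window and interval are operationalized in the study is not specified in this abstract, so that step is left out of the sketch.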
All methods of updating the model led to meaningful improvement in BS relative to never updating. More frequent updating yielded better BS, discrimination, and calibration, regardless of update strategy. Recalibration strategies led to more consistent improvements and less variability over time compared to the other updating strategies. Using longer sliding windows did not substantially impact the recalibration strategies, but did improve the discrimination and calibration of the closed testing procedure and model revision strategies.
Model updating leads to improved BS, with more frequent updating performing better than less frequent updating. Model recalibration strategies appeared to be the least sensitive to the update interval and sliding window length.