Davis Sharon E, Lasko Thomas A, Chen Guanhua, Siew Edward D, Matheny Michael E
Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.
Department of Biostatistics, Vanderbilt University School of Medicine.
J Am Med Inform Assoc. 2017 Nov 1;24(6):1052-1061. doi: 10.1093/jamia/ocx030.
Predictive analytics create opportunities to incorporate personalized risk estimates into clinical decision support. Models must be well calibrated to support decision-making, yet calibration deteriorates over time. This study explored the influence of modeling methods on performance drift and connected observed drift with data shifts in the patient population.
Using 2003 admissions to Department of Veterans Affairs hospitals nationwide, we developed 7 parallel models for hospital-acquired acute kidney injury using common regression and machine learning methods, validating each over 9 subsequent years.
Discrimination was maintained for all models. Calibration declined as all models increasingly overpredicted risk. However, the random forest and neural network models maintained calibration across ranges of probability that captured more admissions than did those of the regression models. The magnitude of overprediction increased over time for the regression models while remaining stable and small for the machine learning models. Changes in the rate of acute kidney injury were strongly linked to increasing overprediction, while changes in predictor-outcome associations corresponded with diverging patterns of calibration drift across methods.
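The link between a changing event rate and overprediction can be illustrated with a small simulation. The sketch below is not the study's analysis: it assumes a single synthetic risk score, a logistic model whose intercept is frozen at its development-year value, and hypothetical intercept shifts standing in for a declining outcome rate. Calibration is summarized with the observed-to-expected (O:E) ratio, where values below 1 indicate overprediction.

```python
import math
import random


def observed_to_expected(y_true, y_prob):
    """Observed events divided by the sum of predicted probabilities.

    A ratio below 1 means the model predicts more events than occur
    (overprediction); a ratio above 1 means underprediction.
    """
    expected = sum(y_prob)
    if expected == 0:
        raise ValueError("sum of predicted probabilities is zero")
    return sum(y_true) / expected


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_year(n, true_intercept, rng):
    """Simulate n admissions with one standard-normal risk score each.

    The true outcome risk follows a logistic model with the given intercept.
    """
    scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outcomes = [1 if rng.random() < sigmoid(true_intercept + s) else 0
                for s in scores]
    return scores, outcomes


rng = random.Random(0)
FROZEN_INTERCEPT = -2.0  # hypothetical value learned in the development year

# Hypothetical true intercepts: the outcome becomes rarer in later years.
ratios = {}
for year_offset, true_intercept in enumerate([-2.0, -2.3, -2.6]):
    scores, outcomes = simulate_year(20000, true_intercept, rng)
    predictions = [sigmoid(FROZEN_INTERCEPT + s) for s in scores]
    ratios[year_offset] = observed_to_expected(outcomes, predictions)

for year_offset, ratio in ratios.items():
    print(f"year +{year_offset}: O:E = {ratio:.2f}")
```

In this toy setting the score still ranks patients in the same order every year, so discrimination is essentially unchanged, yet the O:E ratio falls further below 1 as the simulated event rate declines.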
Efficient and effective updating protocols will be essential for maintaining accuracy of, user confidence in, and safety of personalized risk predictions to support decision-making. Model updating protocols should be tailored to account for variations in calibration drift across methods and respond to periods of rapid performance drift rather than be limited to regularly scheduled annual or biannual intervals.
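One way to respond to periods of rapid drift, rather than waiting for a scheduled refit, is to monitor calibration on rolling windows of recent admissions and flag the model when it moves outside a tolerance band. The following is a minimal sketch under stated assumptions: the function name `needs_update`, the O:E-based criterion, and the `tolerance` and `min_events` defaults are all hypothetical choices for illustration, not a protocol proposed by the study.

```python
def needs_update(y_true, y_prob, tolerance=0.2, min_events=50):
    """Flag a model for updating when recent calibration drifts too far.

    y_true: 0/1 outcomes for admissions in the monitoring window.
    y_prob: the model's predicted probabilities for the same admissions.
    tolerance: hypothetical allowed deviation of the O:E ratio from 1.
    min_events: hypothetical minimum number of observed events required
        before the ratio is considered stable enough to act on.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    observed = sum(y_true)
    expected = sum(y_prob)
    if observed < min_events or expected == 0:
        return False  # too little information in this window to decide
    return abs(observed / expected - 1.0) > tolerance


# Hypothetical window: 400 admissions, 60 events, mean predicted risk 0.30.
window_outcomes = [1] * 60 + [0] * 340
window_predictions = [0.30] * 400
print(needs_update(window_outcomes, window_predictions))
```

A check like this could run on each new window of data, with the tolerance tuned per modeling method, since the abstract reports that calibration drift differs between regression and machine learning models.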