Departments of Biomedical Informatics (S.E.D., C.D., D.W., M.E.M.), Vanderbilt University Medical Center, Nashville, TN.
Departments of Epidemiology (J.R.B.), Dartmouth Geisel School of Medicine, Hanover, NH.
Circ Cardiovasc Qual Outcomes. 2022 Aug;15(8):e008635. doi: 10.1161/CIRCOUTCOMES.121.008635. Epub 2022 Aug 12.
The utility of quality dashboards to inform decision-making and improve clinical outcomes is tightly linked to the accuracy of the information they provide and, in turn, to the accuracy of the underlying prediction models. Despite recognition of the need to update prediction models to maintain accuracy over time, there is limited guidance on updating strategies. We compare predefined and surveillance-based updating strategies applied to a model supporting quality evaluations among US veterans.
We evaluated the performance of a US Department of Veterans Affairs-specific model for post-cardiac catheterization acute kidney injury using routinely collected observational data over the 6 years following model development (n=90 295 procedures in 2013-2019). Predicted probabilities were generated from the original model, an annually retrained model, and a surveillance-based approach that monitored performance to inform the timing and method of updates. We evaluated how updating the national model impacted regional quality profiles. We compared observed-to-expected outcome ratios, where values above and below 1 indicated more and fewer adverse outcomes than expected, respectively.
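The observed-to-expected (O:E) ratio described above can be sketched in a few lines. This is an illustrative example, not the authors' code: the function names, the bootstrap confidence interval, and the toy data are all hypothetical, chosen only to show how overprediction drives the ratio below 1.

```python
# Illustrative sketch (not from the paper): an observed-to-expected (O:E)
# outcome ratio with a percentile-bootstrap confidence interval.
# All names and data below are hypothetical.
import random

def oe_ratio(observed, expected_probs):
    """O:E = (# observed events) / (sum of predicted event probabilities)."""
    return sum(observed) / sum(expected_probs)

def bootstrap_ci(observed, expected_probs, n_boot=1000, seed=0):
    """95% percentile bootstrap CI for the O:E ratio."""
    rng = random.Random(seed)
    n = len(observed)
    ratios = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ratios.append(oe_ratio([observed[i] for i in idx],
                               [expected_probs[i] for i in idx]))
    ratios.sort()
    return ratios[int(0.025 * n_boot)], ratios[int(0.975 * n_boot)]

# Toy cohort: 30 events observed in 300 procedures, but the model
# predicts a 13% risk per procedure (~39 expected events), so it
# overpredicts and the O:E ratio falls below 1.
observed = [1] * 30 + [0] * 270
expected = [0.13] * 300
ratio = oe_ratio(observed, expected)  # 30 / 39, i.e. about 0.77
lo, hi = bootstrap_ci(observed, expected)
```

An O:E ratio below 1, as in this toy cohort, is the signature of overprediction reported for the original national model.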
The original model overpredicted risk at the national level (observed-to-expected outcome ratio, 0.75 [0.74-0.77]). Annual retraining updated the model 5 times; surveillance-based updating retrained once and recalibrated twice. While both strategies improved performance, the surveillance-based approach provided superior calibration (observed-to-expected outcome ratio, 1.01 [0.99-1.03] versus 0.94 [0.92-0.96]). Overprediction by the original model led to optimistic quality assessments, incorrectly indicating that most of the US Department of Veterans Affairs' 18 regions observed fewer acute kidney injury events than predicted. Both updating strategies revealed that 16 regions performed as expected and 2 regions increasingly underperformed, with more acute kidney injury events than predicted.
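The surveillance-based strategy, which chose between leaving the model alone, recalibrating, and fully retraining, can be caricatured by a simple threshold rule on the O:E ratio. This is a hedged sketch, not the authors' algorithm: the thresholds, function names, and the intercept-shift recalibration below are assumptions for illustration only.

```python
# Illustrative sketch (not the authors' method): a surveillance rule that
# inspects calibration each period and picks the lightest adequate update.
# Thresholds are hypothetical.
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def choose_update(oe, mild=(0.9, 1.1), severe=(0.7, 1.3)):
    """Map an O:E ratio to an update action: none, recalibrate, or retrain."""
    if severe[0] <= oe <= severe[1]:
        if mild[0] <= oe <= mild[1]:
            return "none"
        return "recalibrate"  # mild drift: adjust the intercept only
    return "retrain"          # severe drift: refit the whole model

def recalibrate(probs, observed):
    """Shift the logit intercept so the mean predicted risk moves toward the
    observed event rate (exact here because all toy probabilities are equal)."""
    target = sum(observed) / len(observed)
    current = sum(probs) / len(probs)
    shift = logit(target) - logit(current)
    return [inv_logit(logit(p) + shift) for p in probs]
```

Under this rule, the national O:E ratio of 0.75 reported for the original model would trigger a recalibration rather than a full retrain, consistent with the lighter-touch updates the surveillance-based strategy actually made.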
Miscalibrated clinical prediction models provide an inaccurate picture of performance across clinical units, and calibration that degrades over time further clouds our understanding of quality. Updating strategies tailored to health system needs and capacity should be incorporated into model implementation plans to promote the utility and longevity of quality reporting tools.