Division of Pediatric Hospital Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States.
Information Services Department, Lucile Packard Children's Hospital, Stanford, Palo Alto, California, United States.
Appl Clin Inform. 2022 Mar;13(2):431-438. doi: 10.1055/s-0042-1746168. Epub 2022 May 4.
The purpose of this study is to evaluate the ability of three metrics to monitor for a reduction in performance of a chronic kidney disease (CKD) model deployed at a pediatric hospital.
The CKD risk model estimates a patient's risk of developing CKD 3 to 12 months following an inpatient admission. The model was developed on a retrospective dataset of 4,879 admissions from 2014 to 2018, then run silently on 1,270 admissions from April to October 2019. Three metrics were used to monitor its performance during the silent phase: (1) standardized mean differences (SMDs); (2) performance of a "membership model"; and (3) response distribution analysis. Observed patient outcomes for the 1,270 admissions were used to calculate prospective model performance and to assess the ability of the three metrics to detect performance changes.
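As an illustration of the first metric, the following is a minimal sketch of a standardized mean difference for a single continuous input variable. It assumes the common definition (difference in means divided by the square root of the averaged variances); the abstract does not state which SMD variant the study used. The function name and the sample values are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(reference, deployment):
    """SMD for one continuous variable.

    Assumed definition: (mean_deployment - mean_reference) divided by
    the square root of the average of the two sample variances.
    """
    pooled_sd = sqrt((variance(reference) + variance(deployment)) / 2)
    difference = mean(deployment) - mean(reference)
    if pooled_sd == 0:
        # Both samples are constant: no shift if the means match,
        # otherwise the standardized shift is unbounded.
        return 0.0 if difference == 0 else float("inf")
    return difference / pooled_sd


# Hypothetical values for one input variable (not study data).
retrospective = [1.0, 2.0, 3.0, 4.0, 5.0]
silent_phase = [2.0, 3.0, 4.0, 5.0, 6.0]
smd = standardized_mean_difference(retrospective, silent_phase)
```

In a monitoring setting, a value like this would be computed per input variable, and variables with large absolute SMDs flagged for review.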
The deployed model had an area under the receiver operating characteristic curve (AUROC) of 0.63 in the prospective evaluation, a significant decrease from an AUROC of 0.76 on retrospective data (p = 0.033). Among the three metrics, SMDs were significantly different for 66/75 (88%) of the model's input variables (p < 0.05) between retrospective and deployment data. The membership model was able to discriminate between the two settings (AUROC = 0.71, p < 0.0001), and the response distributions were significantly different (p < 0.0001) for the two settings.
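The other two metrics can be sketched in the same spirit. The block below assumes that a membership model is a classifier trained to tell retrospective admissions from deployment admissions, so that its AUROC measures how separable the two settings are. It also assumes a two-sample Kolmogorov-Smirnov statistic as one possible way to compare response distributions; the abstract does not name the test the study used. All function names and scores are hypothetical.

```python
from bisect import bisect_right


def auroc(scores_positive, scores_negative):
    """AUROC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical membership-model scores (probability of "deployment").
deployment_scores = [0.6, 0.7, 0.8]
retrospective_scores = [0.3, 0.5, 0.7]
membership_auroc = auroc(deployment_scores, retrospective_scores)

# Hypothetical risk-model outputs in each setting.
retrospective_risk = [0.10, 0.20, 0.30]
deployment_risk = [0.25, 0.35, 0.45]
response_shift = ks_statistic(retrospective_risk, deployment_risk)
```

A membership AUROC near 0.5 would suggest the two settings look alike to the classifier, while values well above 0.5, such as the 0.71 reported here, indicate that the input data have shifted.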
This study suggests that the three metrics examined could provide an early indication of performance deterioration in deployed models.