Department of Anesthesiology, Erasmus University Medical Centre, Rotterdam, The Netherlands; Department of Anesthesiology, Weill Cornell Medicine, New York, NY, USA.
Department of Anesthesiology, Erasmus University Medical Centre, Rotterdam, The Netherlands.
Br J Anaesth. 2024 Dec;133(6):1222-1233. doi: 10.1016/j.bja.2024.09.003. Epub 2024 Oct 29.
Risk prediction scores are used to guide clinical decision-making. Our primary objective was to externally validate two patient-specific risk scores for 30-day in-hospital mortality using the Multicenter Perioperative Outcomes Group (MPOG) registry: the Pediatric Risk Assessment (PRAm) score and the intrinsic surgical risk score. The secondary objective was to recalibrate these scores.
Data from 56 US and Dutch hospitals with paediatric caseloads were included. The primary outcome was 30-day mortality. To assess model discrimination, the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUC-PR) were calculated. Model calibration was assessed by plotting observed against predicted probabilities. Decision curves were fitted.
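As an illustration of two of the metrics named above, the following minimal sketch computes AUROC using the rank-based (Mann-Whitney) formulation and the net benefit that a decision curve plots at a single threshold probability. It is not the study's analysis code: the function names `auroc` and `net_benefit` and the four-case toy data are assumptions made for illustration only.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half). Rank-based Mann-Whitney formulation."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    n_pos = sum(y_true)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability:
    TP/N - FP/N * threshold / (1 - threshold).
    A decision curve plots this value across a range of thresholds."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = died within 30 days, 0 = survived).
toy_outcomes = [0, 0, 1, 1]
toy_predictions = [0.1, 0.4, 0.35, 0.8]

toy_auroc = auroc(toy_outcomes, toy_predictions)
toy_net_benefit = net_benefit(toy_outcomes, toy_predictions, threshold=0.3)
```

The net benefit formula weights each false positive by the odds of the threshold probability, which is why a score that produces many false positives can show little benefit on a decision curve even when its AUROC is high.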
The 30-day mortality was 0.14% (822/606 488). The AUROC for the PRAm upon external validation was 0.856 (95% confidence interval 0.844-0.869), and the AUC-PR was 0.008. Upon recalibration, the AUROC was 0.873 (0.861-0.886), and the AUC-PR was 0.031. The AUROC for the external validation of the intrinsic surgical risk score was 0.925 (0.914-0.936) and AUC-PR was 0.085. Upon recalibration, the AUROC was 0.925 (0.915-0.936), and the AUC-PR was 0.094. Calibration metrics for both scores were favourable because of the large cluster of cases with low probabilities of mortality. Decision curve analyses showed limited benefit to using either score.
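The AUC-PR values above are easier to read against the outcome prevalence, because a classifier with no discriminative ability has an expected AUC-PR approximately equal to the proportion of positive cases. The sketch below, which uses only the figures reported above, computes that baseline and the ratio of each reported AUC-PR to it. The variable names and dictionary labels are illustrative assumptions, and the ratio is a reading aid rather than a metric reported by the study.

```python
# Figures taken from the reported results.
deaths = 822
cases = 606_488

# Outcome prevalence, which approximates the no-skill AUC-PR baseline.
prevalence = deaths / cases

reported_auc_pr = {
    "PRAm, external validation": 0.008,
    "PRAm, recalibrated": 0.031,
    "Intrinsic surgical risk score, external validation": 0.085,
    "Intrinsic surgical risk score, recalibrated": 0.094,
}

# Ratio of each reported AUC-PR to the prevalence baseline.
ratio_to_baseline = {
    name: value / prevalence for name, value in reported_auc_pr.items()
}

print(f"Prevalence: {prevalence:.4%}")
for name, ratio in ratio_to_baseline.items():
    print(f"{name}: {ratio:.1f} times the prevalence baseline")
```

Even where the ratio is large, the absolute AUC-PR values remain below 0.1, which is consistent with the large numbers of false positives described for both scores.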
The intrinsic surgical risk score performed better than the PRAm, but both resulted in large numbers of false positives. Both scores exhibited decreased performance compared with the original studies. ASA physical status scores in sicker patients drove the superior performance of the intrinsic surgical risk score, suggesting the use of a risk score does not improve prediction.