Department of Critical Care, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.
Department of Cardiology, Erasmus University Medical Center, Rotterdam, The Netherlands.
Crit Care Med. 2023 Jan 1;51(1):80-90. doi: 10.1097/CCM.0000000000005712. Epub 2022 Nov 15.
In a recent scoping review, we identified 43 mortality prediction models for critically ill patients. We aimed to assess the performance of these models through external validation.
Multicenter study.
External validation of models was performed in the Simple Intensive Care Studies-I (SICS-I) and the Finnish Acute Kidney Injury (FINNAKI) study.
The SICS-I study consisted of 1,075 patients, and the FINNAKI study consisted of 2,901 critically ill patients.
For each model, we assessed 1) the original publications for the data needed for model reconstruction, 2) the availability of the variables, 3) model performance in two independent cohorts, and 4) the effect of recalibration on model performance. The models were recalibrated using SICS-I data and subsequently validated using FINNAKI data. We evaluated overall model performance using several indexes, including the (scaled) Brier score, discrimination (area under the receiver operating characteristic curve [AUC]), calibration (intercepts and slopes), and decision curves.
Eleven models (26%) could be externally validated. The Acute Physiology And Chronic Health Evaluation (APACHE) II, APACHE IV, Simplified Acute Physiology Score (SAPS)-Reduced (SAPS-R), and Simplified Mortality Score for the ICU models showed the best scaled Brier scores (0.11, 0.10, 0.10, and 0.06, respectively). SAPS II, APACHE II, and APACHE IV discriminated best; overall, model AUCs ranged from 0.63 (0.61-0.66) to 0.83 (0.81-0.85). Calibration was poor in most models but improved to at least moderate after recalibration of intercepts and slopes. Decision curves showed a positive net benefit across the 0-60% threshold probability range for APACHE IV and SAPS-R.
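The performance indexes listed above have standard textbook definitions, and a small sketch can make them concrete. The plain-Python example below is illustrative only and is not the authors' analysis code; all function names (`brier_score`, `scaled_brier`, `auc`, `net_benefit`, `recalibrate`, `apply_recalibration`) are hypothetical. It assumes that the scaled Brier score is 1 - Brier/Brier_max, with Brier_max = prevalence × (1 - prevalence); that the AUC is the concordance probability, with ties counted as one half; and that net benefit at threshold pt is TP/n - (FP/n) × pt/(1 - pt). Recalibration is sketched as a logistic regression of the outcome on the model's logit-transformed risk, with intercept and slope fitted jointly. The abstract does not state whether the intercept was instead estimated with the slope fixed at 1 (calibration-in-the-large), so treat this as one plausible variant.

```python
import math


def brier_score(y, p):
    """Mean squared difference between predicted risk p and observed outcome y (0/1)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def scaled_brier(y, p):
    """1 - Brier / Brier_max, where Brier_max is the Brier score of predicting the
    observed prevalence for everyone (equal to prevalence * (1 - prevalence))."""
    prev = sum(y) / len(y)
    return 1 - brier_score(y, p) / (prev * (1 - prev))


def auc(y, p):
    """Area under the ROC curve as the probability that a random death receives a
    higher predicted risk than a random survivor (ties count as 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))


def net_benefit(y, p, pt):
    """Net benefit at threshold probability pt; patients with p >= pt are flagged."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


def logit(q):
    return math.log(q / (1 - q))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)


def recalibrate(y, p, iters=25):
    """Fit intercept a and slope b of logit(P(y = 1)) = a + b * logit(p) by
    Newton-Raphson maximum likelihood (a = 0, b = 1 means no recalibration).
    Hypothetical helper; assumes the data are not perfectly separable."""
    lp = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for yi, xi in zip(y, lp):
            mu = sigmoid(a + b * xi)
            w = mu * (1 - mu)
            g_a += yi - mu  # score (gradient) for the intercept
            g_b += (yi - mu) * xi  # score (gradient) for the slope
            h_aa += w  # Fisher information entries
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: (a, b) += inverse(information) @ score
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def apply_recalibration(p, a, b):
    """Map original predicted risks to recalibrated risks."""
    return [sigmoid(a + b * logit(pi)) for pi in p]


if __name__ == "__main__":
    # Toy data for illustration only (not from SICS-I or FINNAKI).
    y = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
    p = [0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.1, 0.8, 0.5, 0.35]
    print("Brier:", brier_score(y, p), "scaled:", scaled_brier(y, p))
    print("AUC:", auc(y, p))
    print("Net benefit at 30%:", net_benefit(y, p, 0.30))
    a, b = recalibrate(y, p)
    print("intercept, slope:", a, b)
    print("recalibrated risks:", apply_recalibration(p, a, b))
```

In the two-cohort design described above, `recalibrate` would be fitted on the recalibration cohort (SICS-I), and the resulting intercept and slope would then be passed to `apply_recalibration` for predictions in the validation cohort (FINNAKI). A recalibration with a positive slope is monotone, so it changes calibration and the Brier score but leaves the AUC unchanged.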
The performance of only 11 of the 43 available mortality prediction models could be studied using two cohorts of critically ill patients. External validation showed that the discriminative ability of APACHE II, APACHE IV, and SAPS II was acceptable to excellent, whereas calibration was poor.