Vassar M J, Lewis F R, Chambers J A, Mullins R J, O'Brien P E, Weigelt J A, Hoang M T, Holcroft J W
San Francisco Injury Center, University of California, 94110, USA.
J Trauma. 1999 Aug;47(2):324-9. doi: 10.1097/00005373-199908000-00017.
To conduct a multicenter study to validate the accuracy of the Acute Physiology and Chronic Health Evaluation (APACHE) II system, APACHE III system, Trauma and Injury Severity Score (TRISS) methodology, and a 24-hour intensive care unit (ICU) point system for prediction of mortality in ICU trauma patient admissions.
The study population consisted of retrospectively identified, consecutive ICU trauma admissions (n = 2,414) from six Level I trauma centers. Probabilities of death were calculated using logistic regression analysis. The predictive power of each system was evaluated using decision matrix analysis, comparing observed with predicted outcomes at a decision criterion of 0.50 for the risk of hospital death. The Youden Index (YI) was used to compare the proportion of patients correctly classified by each system. Model calibration was assessed by goodness-of-fit testing (acceptable when the Hosmer-Lemeshow [H-L] statistic was less than 15.5), and model discrimination was assessed by the area under the receiver operating characteristic curve (AUC).
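The methods name several standard metrics but give no formulas. The Python sketch below shows one common way to compute them from per-patient predicted probabilities. It is not the authors' code, and it rests on several assumptions. A patient is predicted to die when the estimated risk is at least 0.50; the abstract does not say how a risk of exactly 0.50 was handled. H-L groups are formed by sorting on predicted risk and splitting into roughly equal groups. AUC is computed as the probability that a randomly chosen death received a higher predicted risk than a randomly chosen survivor. The function names and the toy data are hypothetical. With ten groups, the H-L statistic is conventionally referred to a chi-square distribution with 8 degrees of freedom. That distribution's 95th percentile is about 15.51, which is presumably where the 15.5 cutoff comes from.

```python
# Minimal sketch (not the authors' code) of the evaluation metrics in the methods.
# Inputs: per-patient predicted probability of hospital death and observed
# outcome (1 = died, 0 = survived). Function names and data are hypothetical.
from typing import List, Tuple


def decision_matrix(probs: List[float], died: List[int],
                    cutoff: float = 0.50) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN); death is predicted when risk >= cutoff (assumed tie rule)."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, died):
        predicted_death = p >= cutoff
        if predicted_death:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def youden_index(tp: int, fp: int, fn: int, tn: int) -> float:
    """YI = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def hosmer_lemeshow(probs: List[float], died: List[int], groups: int = 10) -> float:
    """H-L statistic over groups of roughly equal size, ordered by predicted risk."""
    pairs = sorted(zip(probs, died))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # happens when there are fewer patients than groups
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected deaths in the group
        observed = sum(y for _, y in chunk)  # observed deaths in the group
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat


def auc(probs: List[float], died: List[int]) -> float:
    """Probability that a death was assigned higher risk than a survivor (ties count 0.5)."""
    deaths = [p for p, y in zip(probs, died) if y == 1]
    survivors = [p for p, y in zip(probs, died) if y == 0]
    wins = 0.0
    for a in deaths:
        for b in survivors:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


if __name__ == "__main__":
    # Synthetic toy data, not from the study.
    probs = [0.05, 0.10, 0.20, 0.35, 0.55, 0.60, 0.80, 0.90]
    died = [0, 0, 1, 0, 1, 0, 1, 1]
    cells = decision_matrix(probs, died)
    print("TP, FP, FN, TN:", cells)                # (3, 1, 1, 3)
    print("Youden Index:", youden_index(*cells))   # 0.5
    print("H-L (2 groups):", round(hosmer_lemeshow(probs, died, groups=2), 3))
    print("AUC:", auc(probs, died))                # 0.8125
```

On this toy data set, the decision matrix is (3, 1, 1, 3). That gives a sensitivity and specificity of 0.75 each, so YI = 0.5, and AUC = 13/16 = 0.8125. The demo uses two H-L groups only because the toy set has eight patients. A real analysis would use the default of ten groups.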
Overall, APACHE II (sensitivity, 38%; specificity, 99%; YI, 37%; H-L statistic, 92.6; AUC, 0.87) and TRISS (sensitivity, 52%; specificity, 94%; YI, 46%; H-L statistic, 228.1; AUC, 0.82) were poor predictors of aggregate mortality because they did not meet the acceptable thresholds for both calibration and discrimination; in particular, both H-L statistics far exceeded 15.5. APACHE III (sensitivity, 60%; specificity, 98%; YI, 58%; H-L statistic, 7.0; AUC, 0.89) was comparable to the 24-hour ICU point system (sensitivity, 51%; specificity, 98%; YI, 50%; H-L statistic, 14.7; AUC, 0.89). Both systems met the acceptable thresholds for calibration and discrimination, indicating strong agreement between observed and predicted outcomes. APACHE III was significantly better than APACHE II at estimating the risk of death in ICU trauma patients (p < 0.001). Compared with overall performance, in the subset of patients with nonoperative head trauma the percentage correctly classified increased to 46% for APACHE II, 71% for APACHE III (p < 0.001 vs. APACHE II), 59% for TRISS, and 62% for 24-hour ICU points. In operative head trauma, the percentage correctly classified increased to 60% for APACHE II, 61% for APACHE III, and 54% for 24-hour ICU points, but decreased to 43% for TRISS (p < 0.004 vs. APACHE III). For patients without head injuries, all of the systems were unreliable and considerably underestimated the risk of death. Among nonoperative and operative patients without head trauma, the percentage correctly classified decreased to 26% and 30%, respectively, for APACHE II; 33% and 29% for APACHE III; 33% and 19% for TRISS; and 20% and 23% for 24-hour ICU points.
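As an arithmetic check on the overall figures above, the sketch below recomputes YI from the reported sensitivity and specificity. It uses the standard definition YI = sensitivity + specificity - 1, expressed in percentage points, and applies the H-L < 15.5 calibration criterion from the methods. It does not check discrimination, because the abstract does not state an AUC threshold. The dictionary and function names are illustrative only.

```python
# Arithmetic check on the overall results reported in the abstract.
# Values are copied from the text; names are illustrative.
REPORTED = {
    "APACHE II": {"sens": 38, "spec": 99, "yi": 37, "hl": 92.6},
    "TRISS": {"sens": 52, "spec": 94, "yi": 46, "hl": 228.1},
    "APACHE III": {"sens": 60, "spec": 98, "yi": 58, "hl": 7.0},
    "24-hour ICU points": {"sens": 51, "spec": 98, "yi": 50, "hl": 14.7},
}

HL_CUTOFF = 15.5  # calibration is acceptable when the H-L statistic is below this


def check(row: dict) -> tuple:
    """Return (recomputed YI in percentage points, whether calibration is acceptable)."""
    recomputed_yi = row["sens"] + row["spec"] - 100  # YI = sens + spec - 1, scaled to %
    return recomputed_yi, row["hl"] < HL_CUTOFF


for name, row in REPORTED.items():
    yi, calibrated = check(row)
    print(f"{name}: YI {yi}% (reported {row['yi']}%), calibration acceptable: {calibrated}")
```

The recomputed YI matches the reported value for APACHE II, TRISS, and APACHE III. For the 24-hour ICU point system, it gives 49% against the reported 50%, a one-point gap consistent with rounding of the published sensitivity and specificity. Only APACHE III and the 24-hour ICU point system pass the calibration criterion, which agrees with the results.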
For the overall estimation of aggregate ICU mortality, APACHE III was the most reliable system, although its accuracy was greatest in the subsets of patients with head trauma. The 24-hour ICU point system also demonstrated acceptable overall performance, which improved further in patients with head trauma. Overall, APACHE II and TRISS did not meet the acceptable performance thresholds. None of the systems performed acceptably when estimating ICU mortality in subsets of patients without head trauma. Further multicenter studies are warranted to develop better outcome prediction models for patients without head injuries. Such models would allow trauma care providers to set uniform standards for judging institutional performance.