Unit of General and Digestive Surgery, Hôpital Louis Mourier, Assistance Publique Hôpitaux de Paris, Colombes, France.
World J Surg. 2012 Oct;36(10):2320-7. doi: 10.1007/s00268-012-1683-0.
The P-POSSUM score, the best known of the predictive scores for postoperative mortality, requires validation for each population and setting in which it is used.
Validation methods included discrimination (C-index statistic), the observed:expected (O:E) ratio, calibration with the Hosmer-Lemeshow test, and subgroup analysis (emergency surgery, cancer, age, organs). The study included 3,881 patients from multiple centers undergoing major digestive surgery in France.
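As an illustration of the first two validation measures, the following minimal sketch computes a C-index and an overall O:E ratio from per-patient predicted mortality and observed outcome. The function names and the toy data are hypothetical and are not taken from the study; the C-index is computed here by direct pairwise comparison, which is equivalent to the area under the receiver operating characteristic curve for a binary outcome.

```python
from itertools import product


def c_index(predicted, died):
    """Share of (death, survivor) pairs in which the death had the higher
    predicted risk; ties count as half a concordant pair."""
    deaths = [p for p, d in zip(predicted, died) if d]
    survivors = [p for p, d in zip(predicted, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for p_death, p_survivor in product(deaths, survivors):
        if p_death > p_survivor:
            score += 1.0
        elif p_death == p_survivor:
            score += 0.5
    return score / (len(deaths) * len(survivors))


def oe_ratio(predicted, died):
    """Observed deaths divided by expected deaths (sum of predicted risks)."""
    return sum(died) / sum(predicted)


# Hypothetical toy data: predicted mortality and outcome (1 = died).
predicted = [0.02, 0.05, 0.10, 0.40, 0.70, 0.90]
died = [0, 0, 0, 1, 0, 1]

print(c_index(predicted, died))   # 0.875
print(oe_ratio(predicted, died))  # 2 observed deaths / 2.17 expected
```

A C-index of 0.5 corresponds to chance and 1.0 to perfect ranking; an O:E ratio below 1 means fewer deaths were observed than the score predicted.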
Discrimination, assessed with the receiver operating characteristic curve, was good (C-index = 0.87). The overall O:E ratio was 1 (95 % confidence interval [95 % CI]: 0.88-1.13), and therefore the quality of the surgical performance was within normal ranges. The O:E ratio, calculated by risk ranges, showed overestimation in the low-risk range, especially in the 3 % to 6 % and 6 % to 10 % ranges. Calibration was poor (p < 0.001). The model deviated from the normal pattern of calibration, with mortality lower than expected in the high-risk range. Subgroup analysis found reasonable to good discrimination of populations (C-index ranging from 0.78 to 0.93, except for liver surgery [0.67]), while calibration of individuals remained poor (p < 0.001 to 0.02).
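The calibration analysis by risk range can be sketched in the same spirit. The example below groups patients into risk bands, tallies observed and expected deaths per band, and computes a Hosmer-Lemeshow-type chi-square statistic. The band cutoffs, function names, and data are hypothetical; the study's own grouping is not specified beyond the ranges quoted above, and the p-value (which requires a chi-square distribution function) is left out.

```python
from bisect import bisect_right


def calibration_by_band(predicted, died, cutoffs):
    """Tally patients, observed deaths and expected deaths per risk band.
    `cutoffs` are ascending band boundaries (an assumption of this sketch)."""
    bands = [{"n": 0, "observed": 0, "expected": 0.0}
             for _ in range(len(cutoffs) + 1)]
    for p, d in zip(predicted, died):
        band = bands[bisect_right(cutoffs, p)]
        band["n"] += 1
        band["observed"] += d
        band["expected"] += p
    return bands


def hosmer_lemeshow_statistic(bands):
    """Sum over bands of (O - E)^2 / (E * (1 - E/n)); empty bands are skipped."""
    stat = 0.0
    for b in bands:
        if b["n"] == 0:
            continue
        mean_risk = b["expected"] / b["n"]
        variance = b["expected"] * (1.0 - mean_risk)
        if variance > 0:
            stat += (b["observed"] - b["expected"]) ** 2 / variance
    return stat


# Hypothetical toy data and band boundaries.
predicted = [0.02, 0.04, 0.08, 0.08, 0.50, 0.50]
died = [0, 0, 0, 1, 1, 0]
bands = calibration_by_band(predicted, died, cutoffs=[0.06, 0.20])

for b in bands:
    print(b["n"], b["observed"], round(b["expected"], 2))
print(hosmer_lemeshow_statistic(bands))
```

A large statistic relative to the chi-square reference distribution indicates poor agreement between predicted and observed deaths within bands, which is the sense in which calibration is reported as poor here even though the overall O:E ratio is close to 1.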
Good discrimination, together with an overall O:E ratio not significantly different from 1, makes P-POSSUM a valuable tool for surgical audit when it is used to compare mortality between populations undergoing major digestive surgery. Conversely, poor calibration (goodness-of-fit), especially in subgroup analysis, and underestimation or overestimation of O:E ratios considerably limit the value of P-POSSUM for predicting mortality in individuals. Therefore, P-POSSUM should not be used to predict the outcome for an individual patient.