External validation of prognostic models for critically ill patients required substantial sample sizes.

Authors

Peek N, Arts D G T, Bosman R J, van der Voort P H J, de Keizer N F

Affiliations

Department of Medical Informatics, Academic Medical Center--Universiteit van Amsterdam, Amsterdam, the Netherlands.

Publication

J Clin Epidemiol. 2007 May;60(5):491-501. doi: 10.1016/j.jclinepi.2006.08.011. Epub 2007 Feb 5.

Abstract

OBJECTIVE

To investigate the behavior of predictive performance measures that are commonly used in external validation of prognostic models for outcome at intensive care units (ICUs).

STUDY DESIGN AND SETTING

Four prognostic models (Simplified Acute Physiology Score II, the Acute Physiology and Chronic Health Evaluation II, and the Mortality Probability Models II) were evaluated in the Dutch National Intensive Care Evaluation registry database. For each model, discrimination (AUC), accuracy (Brier score), and two calibration measures were assessed on data from 41,239 ICU admissions. This validation procedure was repeated with smaller subsamples randomly drawn from the database, and the results were compared with those obtained on the entire data set.
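
A minimal sketch of this subsampled validation in Python, assuming hypothetical arrays of observed hospital mortality (y_true) and model-predicted death probabilities (y_prob); the subsample sizes and repetition count are illustrative, not the protocol used in the study.

```python
# Minimal sketch of the subsampled validation procedure (assumptions:
# y_true = observed outcomes, y_prob = predicted death probabilities).
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

def assess(y_true, y_prob):
    """Discrimination (AUC) and accuracy (Brier score) for one model."""
    return roc_auc_score(y_true, y_prob), brier_score_loss(y_true, y_prob)

def subsampled_assessment(y_true, y_prob, sizes=(500, 2000, 10000),
                          n_rep=100, seed=0):
    """Repeat the assessment on random subsamples of several sizes, so the
    spread of each measure can be compared with the full-data result."""
    rng = np.random.default_rng(seed)
    results = {}
    for size in sizes:
        aucs, briers = [], []
        for _ in range(n_rep):
            idx = rng.choice(len(y_true), size=size, replace=False)
            auc, brier = assess(y_true[idx], y_prob[idx])
            aucs.append(auc)
            briers.append(brier)
        results[size] = {"AUC": (np.mean(aucs), np.std(aucs)),
                         "Brier": (np.mean(briers), np.std(briers))}
    return results
```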

RESULTS

Differences in performance between the models were small. The AUC and Brier score showed large variation with small samples. Standard errors of AUC values were accurate but the power to detect differences in performance was low. Calibration tests were extremely sensitive to sample size. Direct comparison of performance, without statistical analysis, was unreliable with either measure.
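
The sample-size sensitivity of calibration tests can be seen in a small, self-contained simulation (illustrative data only, not from the registry): with a fixed, modest overestimation of risk, a Hosmer-Lemeshow-style chi-square test rarely rejects the model in small samples but almost always does in large ones.

```python
# Illustrative simulation only: a fixed, modest miscalibration is "accepted"
# in small samples and rejected in large ones by a chi-square calibration test.
import numpy as np
from scipy.stats import chi2

def hl_pvalue(y, p, n_groups=10):
    """Hosmer-Lemeshow-style C statistic over risk deciles, with its
    chi-square p-value (df = n_groups - 2)."""
    order = np.argsort(p)
    stat = 0.0
    for g in np.array_split(order, n_groups):
        obs, exp, n = y[g].sum(), p[g].sum(), len(g)
        p_bar = exp / n
        stat += (obs - exp) ** 2 / (n * p_bar * (1 - p_bar))
    return chi2.sf(stat, df=n_groups - 2)

rng = np.random.default_rng(42)
for n in (500, 5000, 50000):
    true_p = rng.uniform(0.05, 0.6, size=n)   # simulated true death risks
    pred_p = 1.1 * true_p                     # model overestimates risk by 10%
    y = rng.binomial(1, true_p)               # simulated observed outcomes
    print(n, round(hl_pvalue(y, pred_p), 4))
```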

CONCLUSION

Substantial sample sizes are required for performance assessment and model comparison in external validation. Calibration statistics and significance tests should not be used in these settings. Instead, a simple customization method to repair lack-of-fit problems is recommended.
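
The recommended repair is sketched below as first-level logistic recalibration, i.e. refitting intercept and slope on the logit of the original model's predictions using outcomes from the local population; this is a common customization technique and an assumption here, not necessarily the exact method proposed in the paper.

```python
# Sketch of first-level customization by logistic recalibration (assumption:
# this stands in for the paper's "simple customization method").
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_customization(p_orig, y_local):
    """Refit intercept and slope on the logit of the original model's
    predicted probabilities, using outcomes from the local population."""
    p = np.clip(p_orig, 1e-6, 1 - 1e-6)        # guard against logit of 0 or 1
    logit = np.log(p / (1 - p)).reshape(-1, 1)
    recal = LogisticRegression(C=1e6)          # large C: negligible penalty
    recal.fit(logit, y_local)
    return recal

def customized_probabilities(recal, p_orig):
    """Apply the fitted recalibration to new original-model predictions."""
    p = np.clip(p_orig, 1e-6, 1 - 1e-6)
    logit = np.log(p / (1 - p)).reshape(-1, 1)
    return recal.predict_proba(logit)[:, 1]
```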
