Understanding increments in model performance metrics.

Authors

Pencina Michael J, D'Agostino Ralph B, Massaro Joseph M

Affiliation

Department of Biostatistics, Harvard Clinical Research Institute, Boston University, CrossTown, 801 Massachusetts Ave., Boston, MA 02118, USA.

Publication information

Lifetime Data Anal. 2013 Apr;19(2):202-18. doi: 10.1007/s10985-012-9238-0. Epub 2012 Dec 16.

Abstract

The area under the receiver operating characteristic curve (AUC) is the most commonly reported measure of discrimination for prediction models with binary outcomes. However, recently it has been criticized for its inability to increase when important risk factors are added to a baseline model with good discrimination. This has led to the claim that the reliance on the AUC as a measure of discrimination may miss important improvements in clinical performance of risk prediction rules derived from a baseline model. In this paper we investigate this claim by relating the AUC to measures of clinical performance based on sensitivity and specificity under the assumption of multivariate normality. The behavior of the AUC is contrasted with that of discrimination slope. We show that unless rules with very good specificity are desired, the change in the AUC does an adequate job as a predictor of the change in measures of clinical performance. However, stronger or more numerous predictors are needed to achieve the same increment in the AUC for baseline models with good versus poor discrimination. When excellent specificity is desired, our results suggest that the discrimination slope might be a better measure of model improvement than AUC. The theoretical results are illustrated using a Framingham Heart Study example of a model for predicting the 10-year incidence of atrial fibrillation.
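
The claim that the same added predictor yields a smaller AUC increment when the baseline model already discriminates well can be illustrated with a short numerical sketch. It is not the authors' derivation. It relies on a standard result for the linear discriminant score under multivariate normality with a common covariance matrix in events and non-events, where AUC = Φ(Δ/√2) and Δ is the Mahalanobis distance between the two group means. It further assumes that the new predictor is uncorrelated with the baseline predictors within each outcome group, so that its standardized mean difference adds in squares to Δ. The function names and the effect size of 0.5 are hypothetical choices for illustration only.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution, used for Phi and its inverse.
_STD_NORMAL = NormalDist()


def auc_from_mahalanobis(delta: float) -> float:
    """AUC of the linear discriminant score, assuming multivariate normal
    predictors with equal covariance in both outcome groups."""
    return _STD_NORMAL.cdf(delta / sqrt(2.0))


def mahalanobis_from_auc(auc: float) -> float:
    """Inverse of auc_from_mahalanobis; requires 0 < auc < 1."""
    return sqrt(2.0) * _STD_NORMAL.inv_cdf(auc)


def auc_after_adding(baseline_auc: float, effect_size: float) -> float:
    """AUC after adding one predictor with the given standardized mean
    difference, ASSUMING it is uncorrelated with the baseline predictors
    within each outcome group (squared distances then add)."""
    delta0 = mahalanobis_from_auc(baseline_auc)
    delta1 = sqrt(delta0 ** 2 + effect_size ** 2)
    return auc_from_mahalanobis(delta1)


if __name__ == "__main__":
    # Same hypothetical predictor (effect size 0.5) added to baselines
    # of increasing discrimination.
    for base in (0.60, 0.70, 0.80, 0.90):
        new = auc_after_adding(base, 0.5)
        print(f"baseline AUC {base:.2f} -> {new:.3f} "
              f"(increment {new - base:.3f})")
```

Under these assumptions the printed increments shrink as the baseline AUC rises, which is the pattern the abstract describes: a stronger predictor, or more predictors, would be needed to obtain the same increment from a well-discriminating baseline.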

Similar articles

1
Understanding increments in model performance metrics.
Lifetime Data Anal. 2013 Apr;19(2):202-18. doi: 10.1007/s10985-012-9238-0. Epub 2012 Dec 16.
10
Alternative performance measures for prediction models.
PLoS One. 2014 Mar 7;9(3):e91249. doi: 10.1371/journal.pone.0091249. eCollection 2014.
