Understanding increments in model performance metrics.

Authors

Pencina Michael J, D'Agostino Ralph B, Massaro Joseph M

Affiliation

Department of Biostatistics, Harvard Clinical Research Institute, Boston University, CrossTown, 801 Massachusetts Ave., Boston, MA 02118, USA.

Publication information

Lifetime Data Anal. 2013 Apr;19(2):202-18. doi: 10.1007/s10985-012-9238-0. Epub 2012 Dec 16.

DOI:10.1007/s10985-012-9238-0

PMID:23242535

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3656609/

Abstract

The area under the receiver operating characteristic curve (AUC) is the most commonly reported measure of discrimination for prediction models with binary outcomes. However, recently it has been criticized for its inability to increase when important risk factors are added to a baseline model with good discrimination. This has led to the claim that the reliance on the AUC as a measure of discrimination may miss important improvements in clinical performance of risk prediction rules derived from a baseline model. In this paper we investigate this claim by relating the AUC to measures of clinical performance based on sensitivity and specificity under the assumption of multivariate normality. The behavior of the AUC is contrasted with that of discrimination slope. We show that unless rules with very good specificity are desired, the change in the AUC does an adequate job as a predictor of the change in measures of clinical performance. However, stronger or more numerous predictors are needed to achieve the same increment in the AUC for baseline models with good versus poor discrimination. When excellent specificity is desired, our results suggest that the discrimination slope might be a better measure of model improvement than AUC. The theoretical results are illustrated using a Framingham Heart Study example of a model for predicting the 10-year incidence of atrial fibrillation.
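The abstract's point that the same added predictor buys less AUC on top of a strong baseline can be illustrated numerically. The sketch below is not taken from the paper; it relies on a standard result for the equal-covariance multivariate normal setting, where the AUC of the optimal linear score equals Φ(√(Δ²/2)) and Δ² is the squared Mahalanobis distance between the event and non-event means. It further assumes that the added predictor is uncorrelated with the baseline predictors within each outcome group, so its squared standardized effect size simply adds to Δ². All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution, used for Phi and its inverse.
_STD_NORMAL = NormalDist()


def auc_from_distance(delta_sq):
    """AUC of the optimal linear score under equal-covariance normality.

    delta_sq is the squared Mahalanobis distance between the mean
    predictor vectors of events and non-events (an assumed input).
    """
    return _STD_NORMAL.cdf(sqrt(delta_sq / 2.0))


def sensitivity_at_specificity(delta_sq, specificity):
    """Sensitivity of the same score at the cutoff that yields `specificity`.

    On the standardized score scale, non-events are N(0, 1) and events
    are N(sqrt(delta_sq), 1).
    """
    cutoff = _STD_NORMAL.inv_cdf(specificity)
    return _STD_NORMAL.cdf(sqrt(delta_sq) - cutoff)


def auc_gain(baseline_delta_sq, effect_size):
    """AUC increase from adding one predictor with the given effect size.

    Assumption: the new predictor is uncorrelated with the baseline
    predictors within each outcome group, so its squared standardized
    mean difference adds directly to the squared Mahalanobis distance.
    """
    new_delta_sq = baseline_delta_sq + effect_size ** 2
    return auc_from_distance(new_delta_sq) - auc_from_distance(baseline_delta_sq)


if __name__ == "__main__":
    effect_size = 0.5  # illustrative value, not from the paper
    for baseline in (0.25, 1.0, 2.25):
        new = baseline + effect_size ** 2
        print(
            f"baseline AUC={auc_from_distance(baseline):.3f}  "
            f"AUC gain={auc_gain(baseline, effect_size):.3f}  "
            f"sens@90% spec: {sensitivity_at_specificity(baseline, 0.9):.3f}"
            f" -> {sensitivity_at_specificity(new, 0.9):.3f}"
        )
```

Running the script prints, for three illustrative baseline strengths, the baseline AUC, the AUC gain from the same added predictor, and the change in sensitivity at 90% specificity, which allows the two kinds of increment to be compared side by side.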
