Testing for improvement in prediction model performance.

Affiliation

Biostatistics and Biomathematics, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.

Publication

Stat Med. 2013 Apr 30;32(9):1467-82. doi: 10.1002/sim.5727. Epub 2013 Jan 7.

DOI: 10.1002/sim.5727

PMID: 23296397

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3625503/

Abstract

Authors have proposed new methodology in recent years for evaluating the improvement in prediction performance gained by adding a new predictor, Y, to a risk model containing a set of baseline predictors, X, for a binary outcome D. We prove theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null hypothesis that Y is not a risk factor when controlling for X, H0: P(D = 1 | X, Y) = P(D = 1 | X). Therefore, testing for improvement in prediction performance is redundant if Y has already been shown to be a risk factor. We also investigate properties of tests through simulation studies, focusing on the change in the area under the ROC curve (AUC). An unexpected finding is that standard testing procedures that do not adjust for variability in estimated regression coefficients are extremely conservative. This may explain why the AUC is widely considered insensitive to improvements in prediction performance and suggests that the problem of insensitivity has to do with use of invalid procedures for inference rather than with the measure itself. To avoid redundant testing and use of potentially problematic methods for inference, we recommend that hypothesis testing for no improvement be limited to evaluation of Y as a risk factor, for which methods are well developed and widely available. Analyses of measures of prediction performance should focus on estimation rather than on testing for no improvement in performance.
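The abstract recommends testing Y as a risk factor given X, and reporting an estimate of the change in AUC rather than a test of no improvement. A minimal sketch of that workflow follows, assuming a logistic model for P(D = 1 | X, Y) with linear terms in X and Y, so that H0 becomes "the coefficient of Y is 0" and can be checked with a likelihood ratio test. The function names (`fit_logistic`, `empirical_auc`, `lr_test_y_given_x`, `delta_auc`, `bootstrap_delta_auc`) and the simulated data are hypothetical illustrations, not the authors' code. The bootstrap refits both models in every resample, so the interval carries the variability in estimated regression coefficients that, according to the abstract, standard procedures fail to adjust for.

```python
import math

import numpy as np


def fit_logistic(design, d, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficient vector and the maximized log-likelihood.
    Assumes no perfect separation (no safeguards are included).
    """
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))
        grad = design.T @ (d - p)  # score vector
        info = design.T @ (design * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = design @ beta
    loglik = float(np.sum(d * eta - np.logaddexp(0.0, eta)))
    return beta, loglik


def empirical_auc(score, d):
    """Empirical AUC: P(case score > control score), counting ties as 1/2."""
    diff = score[d == 1][:, None] - score[d == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def lr_test_y_given_x(x, y, d):
    """Likelihood ratio test of H0: coefficient of Y is 0 in logit P(D=1|X,Y)."""
    n = len(d)
    base = np.column_stack([np.ones(n), x])
    full = np.column_stack([base, y])
    _, ll0 = fit_logistic(base, d)
    _, ll1 = fit_logistic(full, d)
    stat = max(2.0 * (ll1 - ll0), 0.0)
    # Chi-square with 1 df: P(chi2_1 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def delta_auc(x, y, d):
    """Change in AUC from refitting the model with Y added to X."""
    n = len(d)
    base = np.column_stack([np.ones(n), x])
    full = np.column_stack([base, y])
    b0, _ = fit_logistic(base, d)
    b1, _ = fit_logistic(full, d)
    # The linear predictor ranks subjects the same way as the fitted risk.
    return empirical_auc(full @ b1, d) - empirical_auc(base @ b0, d)


def bootstrap_delta_auc(x, y, d, n_boot=200, seed=0):
    """Point estimate and 95% percentile interval for the change in AUC.

    Both models are refit in every resample, so the interval reflects
    variability in the estimated regression coefficients.
    """
    rng = np.random.default_rng(seed)
    n = len(d)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if d[idx].min() == d[idx].max():  # skip resamples with a single class
            continue
        reps.append(delta_auc(x[idx], y[idx], d[idx]))
    lo, hi = np.percentile(reps, [2.5, 97.5])
    return delta_auc(x, y, d), (float(lo), float(hi))


if __name__ == "__main__":
    # Simulated data (hypothetical): Y is a risk factor given X.
    rng = np.random.default_rng(1)
    n = 500
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    risk = 1.0 / (1.0 + np.exp(-(-0.5 + x + 0.8 * y)))
    d = (rng.uniform(size=n) < risk).astype(float)
    print("LR test (stat, p):", lr_test_y_given_x(x, y, d))
    print("Delta AUC, 95% CI:", bootstrap_delta_auc(x, y, d))
```

The percentile bootstrap is only one convenient choice here. Intervals for the change in AUC may behave poorly when the true improvement is close to zero, so the interval is best read as a rough summary of the size of the improvement, not as a replacement for the risk-factor test.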




