One statistical test is sufficient for assessing new predictive markers.

Affiliation

Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, Box 44, New York, NY 10065 USA.

Publication information

BMC Med Res Methodol. 2011 Jan 28;11:13. doi: 10.1186/1471-2288-11-13.

DOI:10.1186/1471-2288-11-13

PMID:21276237

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3042425/

Abstract

BACKGROUND

We have observed that the area under the receiver operating characteristic curve (AUC) is increasingly used to evaluate whether a novel predictor should be incorporated into a multivariable model that predicts risk of disease. Investigators frequently approach the question in two distinct stages: first, by testing whether the new predictor is statistically significant in a multivariable regression model; second, by testing the difference between the AUCs of the models with and without the predictor, using the same data from which those models were derived. These two steps often lead to discordant conclusions.
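
The second stage compares the AUCs of two nested models fitted to the same patients. The abstract does not say which area test was used; as an assumption, the sketch below uses the DeLong nonparametric comparison of correlated AUCs, one common choice for paired ROC curves. The function names (`placement_values`, `delong_test`) are illustrative only, and the scores passed in would typically be each model's linear predictor or predicted probability.

```python
import math


def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def placement_values(cases, controls):
    """Structural components of the empirical (Mann-Whitney) AUC.

    For each case: the fraction of controls it outranks (ties count 1/2).
    For each control: the fraction of cases that outrank it.
    """
    def psi(a, b):
        return 1.0 if a > b else 0.5 if a == b else 0.0

    v10 = [sum(psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(psi(x, y) for x in cases) / len(cases) for y in controls]
    return v10, v01


def covariance(a, b):
    """Sample covariance (n - 1 denominator); needs at least two values."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def delong_test(score1, score2, y):
    """Hypothetical helper: DeLong test for two AUCs measured on the same subjects.

    score1, score2: per-patient scores from the two models.
    y: binary outcome (1 = case, 0 = control); needs >= 2 cases and >= 2 controls.
    Returns (auc1, auc2, z, two-sided p-value).
    """
    cases = [i for i, yi in enumerate(y) if yi == 1]
    controls = [i for i, yi in enumerate(y) if yi == 0]
    m, n = len(cases), len(controls)

    comps = []
    for s in (score1, score2):
        v10, v01 = placement_values([s[i] for i in cases], [s[i] for i in controls])
        comps.append((sum(v10) / m, v10, v01))
    (auc1, a10, a01), (auc2, b10, b01) = comps

    # Var(AUC1 - AUC2) = [1, -1] S10 [1, -1]' / m + [1, -1] S01 [1, -1]' / n
    var = (covariance(a10, a10) + covariance(b10, b10) - 2 * covariance(a10, b10)) / m \
        + (covariance(a01, a01) + covariance(b01, b01) - 2 * covariance(a01, b01)) / n
    if var <= 0:
        return auc1, auc2, 0.0, 1.0
    z = (auc1 - auc2) / math.sqrt(var)
    return auc1, auc2, z, 2 * normal_sf(abs(z))


y = [0, 0, 1, 1]
print(delong_test([0.1, 0.2, 0.8, 0.9], [0.9, 0.2, 0.8, 0.1], y))
```

Because both AUCs come from the same subjects, the variance of their difference subtracts the covariance between the two sets of placement values; when the two score vectors are identical that variance is zero and the sketch returns a p-value of 1.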

DISCUSSION

We conducted a simulation study in which two predictors, X and X*, were generated as normal variables with unit variance whose means differed according to the binary outcome Y; the size of that difference set each predictor's predictive strength. The data sets were analyzed using logistic regression, and likelihood ratio and Wald tests for the incremental contribution of X* were performed. The patient-specific predicted values from each of the two models were then used as data for a test comparing the two AUCs. Under the null hypothesis, the sizes of the likelihood ratio and Wald tests were close to nominal, but the area test was extremely conservative, with test sizes below 0.006 for all configurations studied. Where X* was associated with the outcome, the area test had much lower power than the likelihood ratio and Wald tests.
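
The simulation design can be sketched in plain Python. This is a hypothetical sketch, not the authors' code, and several details are assumptions not stated in the abstract: X and X* are drawn independently given Y with unit variance, controls have mean 0 while cases have means `delta_x` and `delta_star`, and the outcome prevalence defaults to 0.5. The nested logistic models are fitted by Newton-Raphson, and the incremental contribution of X* is tested with a likelihood ratio test (twice the difference in maximized log-likelihoods) and a Wald test (squared coefficient divided by its estimated variance), both referred to a chi-square distribution with 1 degree of freedom.

```python
import math
import random


def invert(a):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def score_info_loglik(beta, X, y):
    """Gradient, Fisher information and log-likelihood of a logistic model."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, row))
        # Numerically stable log(1 + e^eta) and logistic mean e^eta / (1 + e^eta).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        mu = math.exp(eta - log1pexp)
        loglik += yi * eta - log1pexp
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * row[j]
            for k in range(p):
                info[j][k] += w * row[j] * row[k]
    return grad, info, loglik


def fit_logistic(X, y, iters=30, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Each row of X must already contain a leading 1.0 for the intercept.
    Returns (beta, covariance of beta, maximized log-likelihood).
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad, info, _ = score_info_loglik(beta, X, y)
        inv = invert(info)
        step = [sum(inv[j][k] * grad[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info, loglik = score_info_loglik(beta, X, y)
    return beta, invert(info), loglik


def simulate(n, delta_x, delta_star, prevalence=0.5, seed=None):
    """Assumed generator: given Y, X ~ N(delta_x * Y, 1) and X* ~ N(delta_star * Y, 1)."""
    rng = random.Random(seed)
    y = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    x = [rng.gauss(delta_x * yi, 1.0) for yi in y]
    x_star = [rng.gauss(delta_star * yi, 1.0) for yi in y]
    return y, x, x_star


def chi2_1df_sf(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def incremental_tests(y, x, x_star):
    """Likelihood ratio and Wald tests for adding X* to a model that contains X."""
    _, _, ll_reduced = fit_logistic([[1.0, a] for a in x], y)
    beta, cov, ll_full = fit_logistic([[1.0, a, b] for a, b in zip(x, x_star)], y)
    lr = 2.0 * (ll_full - ll_reduced)
    wald = beta[2] ** 2 / cov[2][2]
    return {"lr": lr, "p_lr": chi2_1df_sf(lr), "wald": wald, "p_wald": chi2_1df_sf(wald)}


y, x, x_star = simulate(500, delta_x=1.0, delta_star=0.0, seed=1)
print(incremental_tests(y, x, x_star))
```

Setting `delta_star=0.0` generates data under the null, so repeating the call over many seeds and counting p-values below 0.05 estimates each test's size; the linear predictors from the reduced and full fits would be the natural inputs to an AUC comparison on the same data.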

SUMMARY

Evaluating the statistical significance of a new predictor when clinical predictors already exist is most appropriately done in the context of a regression model. Although comparing AUCs is conceptually equivalent to the likelihood ratio and Wald tests, it has vastly inferior statistical properties. Using both approaches will frequently lead to inconsistent conclusions. Nonetheless, comparison of receiver operating characteristic curves remains a useful descriptive tool for an initial evaluation of whether a new predictor might be of clinical relevance.
