On the assessment of the added value of new predictive biomarkers.

Affiliation

Division of Imaging and Applied Mathematics, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA.

Publication information

BMC Med Res Methodol. 2013 Jul 29;13:98. doi: 10.1186/1471-2288-13-98.

Abstract

BACKGROUND

The surge in biomarker development calls for research on statistical evaluation methodology to rigorously assess emerging biomarkers and classification models. Recently, several authors reported the puzzling observation that, in assessing the added value of new biomarkers to existing ones in a logistic regression model, statistical significance of new predictor variables does not necessarily translate into a statistically significant increase in the area under the ROC curve (AUC). Vickers et al. concluded that this inconsistency is because AUC "has vastly inferior statistical properties," i.e., it is extremely conservative. This statement is based on simulations that misuse the DeLong et al. method. Our purpose is to provide a fair comparison of the likelihood ratio (LR) test and the Wald test versus diagnostic accuracy (AUC) tests.

DISCUSSION

We present a test to compare ideal AUCs of nested linear discriminant functions via an F test. We compare it with the LR test and the Wald test for the logistic regression model. The null hypotheses of these three tests are equivalent; however, the F test is an exact test whereas the LR test and the Wald test are asymptotic tests. Our simulation shows that the F test has the nominal type I error even with a small sample size. Our results also indicate that the LR test and the Wald test have inflated type I errors when the sample size is small, while the type I error converges to the nominal value asymptotically with increasing sample size as expected. We further show that the DeLong et al. method tests a different hypothesis and has the nominal type I error when it is used within its designed scope. Finally, we summarize the pros and cons of all four methods we consider in this paper.
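As a rough illustration of how an exact F test on nested linear discriminant functions can be set up, the sketch below implements Rao's classical test for additional information. The abstract does not give the test statistic, so treating this as the paper's test is an assumption. It also assumes two classes with multivariate normal features and a common covariance matrix, under which the ideal AUC of a linear discriminant equals Φ(Δ/√2), where Δ is the Mahalanobis distance between the class means. Equal ideal AUCs for the nested models are then equivalent to equal Δ. All function names are hypothetical, and the plug-in AUC is a simple, biased point estimate.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(x0, x1):
    """Pooled within-class covariance with N - 2 degrees of freedom."""
    m0, m1 = mean_vector(x0), mean_vector(x1)
    p = len(m0)
    s = [[0.0] * p for _ in range(p)]
    for rows, m in ((x0, m0), (x1, m1)):
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
    nu = len(x0) + len(x1) - 2
    return [[v / nu for v in row] for row in s]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_sq(x0, x1, cols):
    """Sample squared Mahalanobis distance between classes using only `cols`."""
    a0 = [[r[j] for j in cols] for r in x0]
    a1 = [[r[j] for j in cols] for r in x1]
    diff = [u - v for u, v in zip(mean_vector(a1), mean_vector(a0))]
    w = solve(pooled_covariance(a0, a1), diff)
    return sum(d * wi for d, wi in zip(diff, w))


def added_value_f(x0, x1, q):
    """Rao's F statistic: do columns q..p-1 add to columns 0..q-1?"""
    n0, n1, p = len(x0), len(x1), len(x0[0])
    nu = n0 + n1 - 2
    c = n0 * n1 / (n0 + n1)
    t2_full = c * mahalanobis_sq(x0, x1, range(p))  # Hotelling T^2, all markers
    t2_red = c * mahalanobis_sq(x0, x1, range(q))   # Hotelling T^2, old markers
    df1, df2 = p - q, nu - p + 1
    f_stat = (df2 / df1) * (t2_full - t2_red) / (nu + t2_red)
    return f_stat, df1, df2


def binormal_auc(d_sq):
    """Plug-in AUC Phi(D / sqrt(2)) for an equal-covariance binormal pair."""
    return 0.5 * (1.0 + math.erf(math.sqrt(d_sq) / 2.0))


# Simulated data (hypothetical): column 0 is informative, column 1 is pure noise.
random.seed(0)
x0 = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(30)]
x1 = [[random.gauss(1, 1), random.gauss(0, 1)] for _ in range(30)]
f_stat, df1, df2 = added_value_f(x0, x1, q=1)
auc_old = binormal_auc(mahalanobis_sq(x0, x1, range(1)))
auc_new = binormal_auc(mahalanobis_sq(x0, x1, range(2)))
print(f"F = {f_stat:.3f} on ({df1}, {df2}) df; AUC {auc_old:.3f} -> {auc_new:.3f}")
```

Under the null hypothesis and the stated normality assumptions, the statistic follows an F distribution with (p - q, N - p - 1) degrees of freedom exactly, which is why no large-sample approximation is needed. The p-value is the upper tail of that distribution, which is left to a statistics library here to keep the sketch dependency-free.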

SUMMARY

We show that there is nothing inherently less powerful or disagreeable about ROC analysis for showing the usefulness of new biomarkers or characterizing the performance of classification models. Each statistical method for assessing biomarkers and classification models has its own strengths and weaknesses. Investigators need to choose methods based on the assessment purpose, the biomarker development phase at which the assessment is being performed, the available patient data, and the validity of assumptions behind the methodologies.
