Corrected ROC analysis for misclassified binary outcomes.

Authors

Zawistowski Matthew, Sussman Jeremy B, Hofer Timothy P, Bentley Douglas, Hayward Rodney A, Wiitala Wyndy L

Affiliations

Veterans Affairs Center for Clinical Management Research, Ann Arbor, 48105, MI, U.S.A.

Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, U.S.A.

Publication information

Stat Med. 2017 Jun 15;36(13):2148-2160. doi: 10.1002/sim.7260. Epub 2017 Feb 28.

Abstract

Creating accurate risk prediction models from Big Data resources such as Electronic Health Records (EHRs) is a critical step toward achieving precision medicine. A major challenge in developing these tools is accounting for imperfect aspects of EHR data, particularly the potential for misclassified outcomes. Misclassification, the swapping of case and control outcome labels, is well known to bias effect size estimates for regression prediction models. In this paper, we study the effect of misclassification on accuracy assessment for risk prediction models and find that it leads to bias in the area under the curve (AUC) metric from standard ROC analysis. The extent of the bias is determined by the false positive and false negative misclassification rates as well as disease prevalence. Notably, we show that simply correcting for misclassification while building the prediction model is not sufficient to remove the bias in AUC. We therefore introduce an intuitive misclassification-adjusted ROC procedure that accounts for uncertainty in observed outcomes and produces bias-corrected estimates of the true AUC. The method requires that misclassification rates are either known or can be estimated, quantities typically required for the modeling step. The computational simplicity of our method is a key advantage, making it ideal for efficiently comparing multiple prediction models on very large datasets. Finally, we apply the correction method to a hospitalization prediction model from a cohort of over 1 million patients from the Veterans Health Administration's EHR. Implementations of the ROC correction are provided for Stata and R. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
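
The bias described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' Stata or R implementation. It assumes nondifferential misclassification (label flips depend only on true status, not on the risk score), known misclassification rates and true prevalence, and hypothetical normally distributed risk scores. Under those assumptions the observed AUC shrinks toward 0.5 by the factor PPV + NPV - 1, and the hypothetical helper `corrected_auc` simply inverts that shrinkage. All function names and parameter values are illustrative.

```python
import random


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; assumes continuous scores without ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sum of the 1-based ranks held by the positive (case) observations.
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n=50000, prevalence=0.2, fp_rate=0.05, fn_rate=0.20, seed=1):
    """Draw true outcomes, hypothetical risk scores, and misclassified labels."""
    rng = random.Random(seed)
    scores, truth, observed = [], [], []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        # Assumed score model: cases are shifted up by one standard deviation.
        s = rng.gauss(1.0 if y else 0.0, 1.0)
        # Flip the label with the false negative rate for true cases and
        # the false positive rate for true controls, independent of the score.
        flip = rng.random() < (fn_rate if y else fp_rate)
        scores.append(s)
        truth.append(y)
        observed.append(1 - y if flip else y)
    return scores, truth, observed


def corrected_auc(auc_obs, prevalence, fp_rate, fn_rate):
    """Undo the shrinkage toward 0.5 under nondifferential misclassification."""
    p_obs = prevalence * (1 - fn_rate) + (1 - prevalence) * fp_rate
    ppv = prevalence * (1 - fn_rate) / p_obs              # P(true case | observed case)
    npv = (1 - prevalence) * (1 - fp_rate) / (1 - p_obs)  # P(true control | observed control)
    return 0.5 + (auc_obs - 0.5) / (ppv + npv - 1)


if __name__ == "__main__":
    scores, truth, observed = simulate()
    auc_true = auc(scores, truth)
    auc_obs = auc(scores, observed)
    auc_adj = corrected_auc(auc_obs, prevalence=0.2, fp_rate=0.05, fn_rate=0.20)
    print(f"AUC vs true labels:      {auc_true:.3f}")
    print(f"AUC vs observed labels:  {auc_obs:.3f}")
    print(f"Adjusted AUC (sketch):   {auc_adj:.3f}")
```

In this sketch the AUC computed against the misclassified labels is attenuated relative to the AUC against the true labels, and the size of the attenuation depends on the two misclassification rates and the prevalence, consistent with the abstract's description. The closed-form adjustment holds only under the stated assumptions; the paper's own procedure should be consulted for real analyses.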

