Pencina Michael J, D'Agostino Ralph B, Massaro Joseph M
Department of Biostatistics, Harvard Clinical Research Institute, Boston University, CrossTown, 801 Massachusetts Ave., Boston, MA 02118, USA.
Lifetime Data Anal. 2013 Apr;19(2):202-18. doi: 10.1007/s10985-012-9238-0. Epub 2012 Dec 16.
The area under the receiver operating characteristic curve (AUC) is the most commonly reported measure of discrimination for prediction models with binary outcomes. Recently, however, it has been criticized for its inability to increase when important risk factors are added to a baseline model with good discrimination. This has led to the claim that reliance on the AUC as a measure of discrimination may miss important improvements in the clinical performance of risk prediction rules derived from a baseline model. In this paper we investigate this claim by relating the AUC to measures of clinical performance based on sensitivity and specificity under the assumption of multivariate normality. The behavior of the AUC is contrasted with that of the discrimination slope. We show that unless rules with very good specificity are desired, the change in the AUC does an adequate job as a predictor of the change in measures of clinical performance. However, stronger or more numerous predictors are needed to achieve the same increment in the AUC for baseline models with good discrimination than for those with poor discrimination. When excellent specificity is desired, our results suggest that the discrimination slope might be a better measure of model improvement than the AUC. The theoretical results are illustrated using a Framingham Heart Study example of a model for predicting the 10-year incidence of atrial fibrillation.
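The diminishing AUC increment for strong baseline models can be illustrated with a minimal sketch. It is not the authors' derivation: it assumes a single risk score that is normal with equal variance in cases and non-cases, separated by `delta` standard deviations, and it assumes that a new predictor is independent of the baseline score within each group so that squared separations add. The function names and the values of `delta` are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def auc_from_distance(delta):
    """AUC when case and non-case scores are N(delta, 1) and N(0, 1)."""
    return STD_NORMAL.cdf(delta / sqrt(2))


def sensitivity_at_specificity(delta, specificity):
    """Sensitivity of the threshold rule that attains the given specificity."""
    threshold = STD_NORMAL.inv_cdf(specificity)  # cut-off on the non-case scale
    return STD_NORMAL.cdf(delta - threshold)


def add_independent_predictor(delta_base, delta_new):
    """Combined separation, assuming squared separations add (independence)."""
    return sqrt(delta_base ** 2 + delta_new ** 2)


# Add the same hypothetical predictor (delta_new = 0.5) to a weak and a
# strong baseline, then compare the gains in AUC and in sensitivity at
# 90% specificity.
for delta_base in (0.5, 1.5):
    delta_full = add_independent_predictor(delta_base, 0.5)
    gain_auc = auc_from_distance(delta_full) - auc_from_distance(delta_base)
    gain_sens = (sensitivity_at_specificity(delta_full, 0.90)
                 - sensitivity_at_specificity(delta_base, 0.90))
    print(f"baseline delta={delta_base}: AUC gain={gain_auc:.4f}, "
          f"sensitivity gain at 90% specificity={gain_sens:.4f}")
```

Under these assumptions the same added predictor produces a smaller AUC gain for the baseline that already discriminates well, which is the pattern the abstract describes.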