Assel Melissa, Sjoberg Daniel D, Vickers Andrew J
Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA.
Diagn Progn Res. 2017 Dec 2;1:19. doi: 10.1186/s41512-017-0020-3. eCollection 2017.
A variety of statistics have been proposed as tools to help investigators assess the value of diagnostic tests or prediction models. The Brier score has been recommended on the grounds that it is a proper scoring rule that is affected by both discrimination and calibration. However, the Brier score is prevalence dependent in such a way that the rank ordering of tests or models may inappropriately vary by prevalence.
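The prevalence dependence described above can be illustrated with a minimal sketch. The two tests and their sensitivity and specificity values below are hypothetical and are not taken from the study. For a binary test that issues hard 0/1 predictions, every misclassified patient contributes a squared error of 1, so the expected Brier score equals the misclassification rate.

```python
def expected_brier_binary(prevalence, sensitivity, specificity):
    """Expected Brier score for a test that outputs hard 0/1 predictions.

    With 0/1 predictions the squared error is 1 for each misclassified
    patient and 0 otherwise, so the score is the misclassification rate.
    Lower is better.
    """
    false_negative_rate = prevalence * (1 - sensitivity)
    false_positive_rate = (1 - prevalence) * (1 - specificity)
    return false_negative_rate + false_positive_rate


# Hypothetical tests (illustrative values only, not from the paper):
# test A favors sensitivity, test B favors specificity.
tests = {"A": (0.95, 0.80), "B": (0.80, 0.95)}


def best_by_brier(prevalence):
    """Name of the test with the lowest expected Brier score."""
    return min(tests, key=lambda name: expected_brier_binary(prevalence, *tests[name]))


for prevalence in (0.10, 0.90):
    scores = {
        name: round(expected_brier_binary(prevalence, se, sp), 3)
        for name, (se, sp) in tests.items()
    }
    print(prevalence, scores, "preferred by Brier:", best_by_brier(prevalence))
```

In this sketch the ranking flips from test B at 10% prevalence to test A at 90% prevalence. Whether such a flip is clinically appropriate depends on the relative harms of false negatives and false positives, which the Brier score does not take as an input.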
We explored four common clinical scenarios: comparison of a highly accurate binary test with a continuous prediction model of moderate predictiveness; comparison of two binary tests where the importance of sensitivity versus specificity is inversely associated with prevalence; comparison of models and tests to default strategies of assuming that all or no patients are positive; and comparison of two models with miscalibration in opposite directions.
In each case, we found that the Brier score gave an inappropriate rank ordering of the tests and models. Conversely, net benefit, a decision-analytic measure, gave results that always favored the preferable test or model.
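The contrast between the two measures can be sketched for the comparison against default strategies. The prevalence, test characteristics, and threshold probability below are assumptions chosen for illustration, not values reported in the study. Net benefit is computed here with the standard formula: true positives per patient minus false positives per patient weighted by the odds of the threshold probability.

```python
def expected_brier(prevalence, sensitivity, specificity):
    """Expected Brier score for hard 0/1 predictions (misclassification rate)."""
    return prevalence * (1 - sensitivity) + (1 - prevalence) * (1 - specificity)


def net_benefit(prevalence, sensitivity, specificity, threshold):
    """Net benefit per patient at a given threshold probability.

    true positive rate - false positive rate * threshold / (1 - threshold)
    Higher is better; treating no one has a net benefit of 0 by definition.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives - false_positives * threshold / (1 - threshold)


# Hypothetical setting: 20% prevalence and a 5% threshold probability,
# i.e. a missed case is considered far worse than an unnecessary workup.
prevalence, threshold = 0.20, 0.05

# Each strategy expressed as (sensitivity, specificity).
strategies = {
    "test": (0.95, 0.50),       # hypothetical sensitive but unspecific test
    "treat all": (1.00, 0.00),  # assume every patient is positive
    "treat none": (0.00, 1.00), # assume every patient is negative
}

results = {
    name: (
        expected_brier(prevalence, se, sp),
        net_benefit(prevalence, se, sp, threshold),
    )
    for name, (se, sp) in strategies.items()
}

for name, (brier, nb) in results.items():
    print(f"{name:10s}  Brier={brier:.3f}  net benefit={nb:.3f}")

best_by_brier = min(results, key=lambda name: results[name][0])
best_by_net_benefit = max(results, key=lambda name: results[name][1])
print("preferred by Brier:", best_by_brier)
print("preferred by net benefit:", best_by_net_benefit)
```

Under these assumptions the Brier score prefers treating no one, while net benefit prefers the test, because only net benefit incorporates the low threshold probability that encodes the cost of a missed case.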
The Brier score does not evaluate the clinical value of diagnostic tests or prediction models. As an alternative, we advocate the use of decision-analytic measures such as net benefit.
Trial registration: not applicable.