Pencina Michael J, D'Agostino Ralph B Sr, D'Agostino Ralph B Jr, Vasan Ramachandran S
Department of Mathematics and Statistics, Framingham Heart Study, Boston University, Boston, MA 02215, USA.
Stat Med. 2008 Jan 30;27(2):157-72; discussion 207-12. doi: 10.1002/sim.2929.
Identification of key factors associated with the risk of developing cardiovascular disease and quantification of this risk using multivariable prediction algorithms are among the major advances made in preventive cardiology and cardiovascular epidemiology in the 20th century. The ongoing discovery of new risk markers presents opportunities and challenges for statisticians and clinicians to evaluate these biomarkers and to develop new risk formulations that incorporate them. One of the key questions is how best to assess and quantify the improvement in risk prediction offered by these new models. Demonstrating a statistically significant association between a new biomarker and cardiovascular risk is not enough. Some researchers have argued that the improvement in the area under the receiver-operating-characteristic curve (AUC) should be the main criterion, whereas others argue that better measures of the performance of prediction models are needed. In this paper, we address this question by introducing two new measures, one based on integrated sensitivity and specificity and the other on reclassification tables. These new measures offer incremental information over the AUC. We discuss the properties of these new measures and contrast them with the AUC. We also develop simple asymptotic tests of significance. We illustrate the use of these measures with an example from the Framingham Heart Study. We propose that scientists consider these types of measures in addition to the AUC when assessing the performance of newer biomarkers.
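The abstract does not spell out the two measures, so the following is only a minimal sketch of how they are commonly computed, not the authors' own implementation. It assumes the reclassification-based measure is the net reclassification improvement (NRI) and the measure based on integrated sensitivity and specificity is the integrated discrimination improvement (IDI), in their usual textbook forms. The function names, the toy data, and the risk cutoffs of 6% and 20% are illustrative assumptions.

```python
from bisect import bisect_right


def risk_category(p, cutoffs):
    """Index of the risk category that probability p falls into.

    cutoffs must be sorted ascending; a value equal to a cutoff is
    placed in the higher category.
    """
    return bisect_right(cutoffs, p)


def net_reclassification_improvement(y, p_old, p_new, cutoffs):
    """Category-based NRI (assumed form).

    NRI = [P(up | event) - P(down | event)]
        + [P(down | nonevent) - P(up | nonevent)]
    where "up"/"down" mean the new model moves a subject to a
    higher/lower risk category than the old model.
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_events = sum(y)
    n_nonevents = len(y) - n_events
    for outcome, old, new in zip(y, p_old, p_new):
        c_old = risk_category(old, cutoffs)
        c_new = risk_category(new, cutoffs)
        if c_new > c_old:
            if outcome:
                up_event += 1
            else:
                up_nonevent += 1
        elif c_new < c_old:
            if outcome:
                down_event += 1
            else:
                down_nonevent += 1
    return ((up_event - down_event) / n_events
            + (down_nonevent - up_nonevent) / n_nonevents)


def integrated_discrimination_improvement(y, p_old, p_new):
    """IDI (assumed form): change in mean predicted risk among events
    minus the change in mean predicted risk among nonevents."""
    def mean(values):
        return sum(values) / len(values)

    new_events = [pn for yi, pn in zip(y, p_new) if yi]
    old_events = [po for yi, po in zip(y, p_old) if yi]
    new_nonevents = [pn for yi, pn in zip(y, p_new) if not yi]
    old_nonevents = [po for yi, po in zip(y, p_old) if not yi]
    return ((mean(new_events) - mean(old_events))
            - (mean(new_nonevents) - mean(old_nonevents)))


# Toy data (made up for illustration): 1 = event, 0 = nonevent.
y = [1, 1, 1, 1, 0, 0, 0, 0]
p_old = [0.10, 0.25, 0.05, 0.15, 0.10, 0.25, 0.05, 0.15]
p_new = [0.25, 0.30, 0.05, 0.10, 0.05, 0.15, 0.05, 0.25]
cutoffs = [0.06, 0.20]  # hypothetical low / intermediate / high risk bands

nri = net_reclassification_improvement(y, p_old, p_new, cutoffs)
idi = integrated_discrimination_improvement(y, p_old, p_new)
print(f"NRI = {nri:.3f}, IDI = {idi:.3f}")
```

In this sketch a positive NRI means the new model tends to move events into higher risk categories and nonevents into lower ones, while a positive IDI means the gap in mean predicted risk between events and nonevents widens. The asymptotic significance tests mentioned in the abstract are not reproduced here.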