Department of Biostatistics, Boston University, Boston, MA, USA.
Stat Med. 2012 Jul 10;31(15):1543-53. doi: 10.1002/sim.4508. Epub 2012 Feb 17.
Cardiovascular risk prediction functions offer an important diagnostic tool for clinicians and patients themselves. They are usually constructed with the use of parametric or semi-parametric survival regression models. It is essential to be able to evaluate the performance of these models, preferably with summaries that offer natural and intuitive interpretations. The concept of discrimination, popular in the logistic regression context, has been extended to survival analysis. However, the extension is not unique. In this paper, we define discrimination in survival analysis as the model's ability to separate those with longer event-free survival from those with shorter event-free survival within some time horizon of interest. This definition remains consistent with that used in logistic regression, in the sense that it assesses how well the model-based predictions match the observed data. Practical and conceptual examples and numerical simulations are employed to examine four C statistics proposed in the literature to evaluate the performance of survival models. We observe that they differ in the numerical values and aspects of discrimination that they capture. We conclude that the index proposed by Harrell is the most appropriate to capture discrimination described by the above definition. We suggest researchers report which C statistic they are using, provide a rationale for their selection, and be aware that comparing different indices across studies may not be meaningful.
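As an illustration of the definition above, the following is a minimal sketch of a Harrell-type C statistic restricted to a time horizon. It is not the authors' code. The function name `harrell_c`, its parameters, and the handling of ties are assumptions made for this example: follow-up beyond the horizon is treated as censored at the horizon, pairs with tied times are skipped, and tied predicted risks count as half-concordant.

```python
from itertools import combinations


def harrell_c(times, events, risks, horizon=None):
    """Harrell-type concordance index (illustrative sketch, names are hypothetical).

    times   : observed follow-up times
    events  : 1/True if the event was observed, 0/False if censored
    risks   : model-predicted risk scores (higher = expected earlier event)
    horizon : optional time horizon; follow-up past it is censored at the horizon
    """
    if horizon is not None:
        # Events occurring after the horizon are treated as censored at the horizon.
        events = [bool(e) and t <= horizon for t, e in zip(times, events)]
        times = [min(t, horizon) for t in times]

    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the ordering is unknown
        usable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher predicted risk had the earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half

    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data, invented purely for illustration.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(harrell_c(times, events, [0.9, 0.7, 0.5, 0.1]))             # perfectly ordered risks
print(harrell_c(times, events, [0.9, 0.1, 0.5, 0.7], horizon=5))  # partly misordered risks
```

A value of 1 means every usable pair is ordered correctly by the predicted risks, and 0.5 corresponds to ordering no better than chance. Other C statistics treat censoring and ties differently, so their values on the same data need not agree with this one.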