Division of Biostatistics, German Cancer Research Centre, Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany.
Stat Med. 2010 Mar 30;29(7-8):818-29. doi: 10.1002/sim.3768.
As part of the validation of any statistical model, it is good statistical practice to quantify the prediction accuracy and the amount of prognostic information represented by the model; this includes gene expression signatures derived from high-dimensional microarray data. Several approaches exist for right-censored survival data that measure the gain in prognostic information compared with established clinical parameters or biomarkers in terms of explained variation or explained randomness. They are either model-based or use estimates of prediction accuracy. As these measures differ in their underlying mechanisms, they vary in their interpretation, assumptions and properties, in particular in how they deal with the presence of censoring. It remains unclear under what conditions and to what extent they are comparable. We present a comparison of several common measures and illustrate their behaviour in high-dimensional situations in simulation examples as well as in applications to real gene expression microarray data sets. An overview of available software implementations in R is given.
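To make the idea of a prediction-accuracy-based measure of explained variation concrete, the following is a minimal sketch in Python. It is not one of the implementations compared in the paper, and it is not taken from any of the R packages the paper surveys. It assumes one common construction: a Brier score at a fixed time point, weighted by the inverse probability of censoring, compared between a model and a covariate-free Kaplan-Meier reference. All function names (`kaplan_meier`, `brier_score`, `explained_variation`) and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns a function S(t) (or S(t-) if left=True)."""
    data = sorted(zip(times, events))
    steps = []            # (time, survival estimate just after that time)
    surv = 1.0
    at_risk = len(data)
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= leaving

    def surv_at(t, left=False):
        s = 1.0
        for u, v in steps:
            if u < t or (u == t and not left):
                s = v
            else:
                break
        return s

    return surv_at


def brier_score(times, events, preds, t_star):
    """Censoring-weighted Brier score at t_star.

    preds[i] is the predicted probability that subject i survives beyond t_star.
    Assumption: censoring is independent of the covariates, so a single
    Kaplan-Meier estimate of the censoring distribution supplies the weights.
    """
    censor = kaplan_meier(times, [1 - e for e in events])
    total = 0.0
    for t, e, p in zip(times, events, preds):
        if t <= t_star and e == 1:
            # Event observed by t_star: true status is 0.
            total += (0.0 - p) ** 2 / censor(t, left=True)
        elif t > t_star:
            # Still under observation at t_star: true status is 1.
            total += (1.0 - p) ** 2 / censor(t_star)
        # Subjects censored before t_star get weight zero.
    return total / len(times)


def explained_variation(times, events, preds, t_star):
    """1 - BS(model) / BS(null), with the null model ignoring all covariates."""
    null_surv = kaplan_meier(times, events)(t_star)
    bs_null = brier_score(times, events, [null_surv] * len(times), t_star)
    bs_model = brier_score(times, events, preds, t_star)
    return 1.0 - bs_model / bs_null


# Made-up toy data: follow-up times, event indicators (1 = event, 0 = censored),
# and hypothetical model predictions of survival beyond t_star = 2.5.
toy_times = [1, 2, 3, 4]
toy_events = [1, 0, 1, 1]
toy_preds = [0.2, 0.5, 0.8, 0.9]
print(explained_variation(toy_times, toy_events, toy_preds, 2.5))
```

A value near 1 indicates that the predictions remove most of the prediction error of the covariate-free reference, a value near 0 indicates no gain, and negative values are possible when the model predicts worse than the reference. The sketch evaluates a single time point and breaks down if the reference score is zero; the measures discussed in the paper may integrate over time or rest on different assumptions about censoring.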