Gerds Thomas A, van de Wiel Mark A
Biom J. 2011 Mar;53(2):259-74. doi: 10.1002/bimj.201000157. Epub 2011 Feb 17.
In medical statistics, many alternative strategies are available for building a prediction model from training data. Prediction models are routinely compared by their prediction performance in independent validation data. If only one data set is available for both training and validation, rival strategies can still be compared using repeated bootstrap samples of the same data. Often, however, the overall performance of rival strategies is similar, which makes it difficult to choose one model. Here, we investigate the variability of the prediction models that results when the same modelling strategy is applied to different training sets. For each modelling strategy, we estimate a confidence score based on the same repeated bootstrap samples. We obtain a new decomposition of the expected Brier score, together with estimates of population-average confidence scores. The latter can be used to distinguish rival prediction models with similar prediction performance. Furthermore, at the subject level, a confidence score may provide useful supplementary information for new patients who want to base a medical decision on their predicted risk. The ideas are illustrated and discussed using data from cancer studies, including settings with a high-dimensional predictor space.
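The bootstrap comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the data are simulated, the two modelling strategies (`fit_stump`, `fit_null`) and the helper names are hypothetical, the Brier score is estimated with an out-of-bag (leave-one-out bootstrap) variant, and the per-subject spread of predictions across bootstrap models is only a stand-in for the variability the abstract refers to. It is not the confidence score defined in the paper, whose formula the abstract does not give.

```python
import math
import random
from statistics import mean, median, pstdev


def simulate(n=100, seed=0):
    """Simulated data (assumption): one predictor x, binary outcome y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        risk = 1.0 / (1.0 + math.exp(-2.0 * x))  # true risk, logistic in x
        data.append((x, 1 if rng.random() < risk else 0))
    return data


def fit_stump(train):
    """Hypothetical strategy: split x at the training median and
    predict the observed event frequency on each side."""
    cut = median([x for x, _ in train])
    overall = mean([y for _, y in train])
    low = [y for x, y in train if x <= cut]
    high = [y for x, y in train if x > cut]
    p_low = mean(low) if low else overall
    p_high = mean(high) if high else overall
    return lambda x: p_low if x <= cut else p_high


def fit_null(train):
    """Hypothetical benchmark strategy: predict the training prevalence."""
    p = mean([y for _, y in train])
    return lambda x: p


def bootstrap_evaluate(data, fit, n_boot=200, seed=1):
    """Apply one strategy to repeated bootstrap training sets and collect,
    for each subject, the predictions of models trained without it."""
    rng = random.Random(seed)
    n = len(data)
    oob_preds = [[] for _ in range(n)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(idx)
        model = fit([data[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # subject i is validation data for this model
                oob_preds[i].append(model(data[i][0]))

    brier_terms, spread = [], []
    for (x, y), preds in zip(data, oob_preds):
        if not preds:  # subject was never out of bag; skip it
            continue
        brier_terms.append(mean([(y - p) ** 2 for p in preds]))
        spread.append(pstdev(preds))  # variability across training sets
    return {
        "brier": mean(brier_terms),
        "spread": spread,
        "mean_spread": mean(spread),
    }


if __name__ == "__main__":
    data = simulate()
    for name, fit in [("stump", fit_stump), ("null", fit_null)]:
        res = bootstrap_evaluate(data, fit)
        print(name, round(res["brier"], 3), round(res["mean_spread"], 3))
```

Both summaries come from the same bootstrap samples: the Brier score summarises overall performance, while the spread describes how much a subject's predicted risk changes when the training set changes. When two strategies have similar Brier scores, the second quantity is the kind of supplementary information the abstract suggests comparing.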