1 Research Institute for Primary Care and Health Sciences, Keele University, Staffordshire, UK.
2 Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.
Stat Methods Med Res. 2018 Nov;27(11):3505-3522. doi: 10.1177/0962280217705678. Epub 2017 May 8.
If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend that these scales be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
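The recommended scales can be illustrated with a minimal Python sketch. It transforms study-level C-statistics to the logit scale and E/O ratios to the log scale, with standard errors approximated by the delta method, and then pools them with a random-effects model. Several things here are assumptions rather than details from the abstract: the DerSimonian-Laird estimator is used only because it is simple to write out (the abstract does not say which estimator the authors used), all function names are hypothetical, and the input numbers are made up for illustration.

```python
import math


def logit(p):
    """Logit transform, used here for the C-statistic."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Back-transform from the logit scale to the original scale."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_c_with_se(c, se_c):
    """Delta-method approximation: SE(logit(c)) ~= SE(c) / (c * (1 - c))."""
    return logit(c), se_c / (c * (1.0 - c))


def log_eo_with_se(eo, se_eo):
    """Delta-method approximation: SE(log(E/O)) ~= SE(E/O) / (E/O)."""
    return math.log(eo), se_eo / eo


def dersimonian_laird(estimates, std_errors):
    """Random-effects pooling with the DerSimonian-Laird estimator.

    Assumption: this estimator is chosen for brevity; it is not
    necessarily the one used in the paper. Requires at least 2 studies.
    Returns (pooled estimate, its standard error, tau^2).
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]           # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                  # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, se_pooled, tau2


if __name__ == "__main__":
    # Hypothetical study-level C-statistics and standard errors (not from the paper).
    c_stats = [0.72, 0.78, 0.81, 0.69]
    c_ses = [0.020, 0.030, 0.025, 0.020]

    transformed = [logit_c_with_se(c, se) for c, se in zip(c_stats, c_ses)]
    y = [t[0] for t in transformed]
    s = [t[1] for t in transformed]

    pooled, se_pooled, tau2 = dersimonian_laird(y, s)
    lower = inv_logit(pooled - 1.96 * se_pooled)
    upper = inv_logit(pooled + 1.96 * se_pooled)
    print("Pooled C-statistic:", round(inv_logit(pooled), 3))
    print("Approximate 95% CI:", round(lower, 3), "to", round(upper, 3))
    print("tau^2 on the logit scale:", round(tau2, 4))
```

Pooling on the transformed scale and back-transforming at the end keeps the summary C-statistic and its interval inside (0, 1). The same `dersimonian_laird` function can be applied to the output of `log_eo_with_se`, with `math.exp` as the back-transform.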