Department of Development and Regeneration, KU Leuven, Leuven, Belgium.
Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands.
Stat Methods Med Res. 2020 Nov;29(11):3166-3178. doi: 10.1177/0962280220921415. Epub 2020 May 13.
When developing risk prediction models on datasets with limited sample size, shrinkage methods are recommended. Earlier studies showed that shrinkage improves predictive performance on average. This simulation study investigated the variability of the effect of regression shrinkage on predictive performance for a binary outcome. We compared standard maximum likelihood estimation with the following shrinkage methods: uniform shrinkage (likelihood-based and bootstrap-based), ridge penalized maximum likelihood, LASSO logistic regression, adaptive LASSO, and Firth's correction. In the simulation study, we varied the number of predictors and their strength, the correlation between predictors, the event rate of the outcome, and the number of events per variable. We focused on the calibration slope as the performance measure: a slope below 1 indicates that risk predictions are too extreme (overfitting), and a slope above 1 that they are not extreme enough. The results can be summarized in three main findings. First, shrinkage improved calibration slopes on average. Second, the between-sample variability of calibration slopes often increased relative to maximum likelihood. Unlike the other shrinkage approaches, Firth's correction had a small shrinkage effect but low variability. Third, the correlation between the estimated shrinkage and the optimal shrinkage needed to remove overfitting was typically negative, with Firth's correction as the exception. We conclude that, despite improved performance on average, shrinkage often worked poorly in individual datasets, in particular when it was most needed. These results imply that shrinkage methods do not solve the problems associated with small sample sizes or a low number of events per variable.
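As a concrete illustration of the calibration slope (not the authors' simulation code), the sketch below simulates one small development dataset, fits maximum-likelihood and ridge-penalized logistic regression, and estimates each model's calibration slope on a large validation sample. The slope is obtained by regressing the outcome on the model's estimated linear predictor with logistic regression. The true coefficients, predictor correlation, penalty strength (C=0.1), and sample sizes are illustrative assumptions, not values from the paper.

```python
# Minimal sketch, assuming a simplified version of the study setup.
# Requires numpy, statsmodels, and scikit-learn (>= 1.2 for penalty=None).
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2020)

def simulate(n, beta, rho=0.3):
    """Correlated standard-normal predictors and a binary outcome."""
    p = len(beta)
    cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta))))
    return X, y

def calibration_slope(model, X_val, y_val):
    """Slope from regressing the outcome on the model's linear predictor:
    < 1 means predictions too extreme (overfitting), > 1 not extreme enough."""
    lp = model.decision_function(X_val)  # estimated linear predictor
    fit = sm.GLM(y_val, sm.add_constant(lp),
                 family=sm.families.Binomial()).fit()
    return fit.params[1]

beta = np.array([0.8, 0.8, 0.5, 0.5, 0.2, 0.2])  # assumed true coefficients
X_dev, y_dev = simulate(100, beta)        # small development dataset
X_val, y_val = simulate(100_000, beta)    # large validation dataset

ml = LogisticRegression(penalty=None, max_iter=1000).fit(X_dev, y_dev)
ridge = LogisticRegression(penalty="l2", C=0.1, max_iter=1000).fit(X_dev, y_dev)

print(f"maximum likelihood slope: {calibration_slope(ml, X_val, y_val):.2f}")
print(f"ridge slope:              {calibration_slope(ridge, X_val, y_val):.2f}")
```

Repeating this over many simulated development sets (varying the seed) would display the between-sample variability of the calibration slopes that the study quantifies, rather than only the average effect of shrinkage.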