Musoro Jammbe Z, Zwinderman Aeilko H, Puhan Milo A, ter Riet Gerben, Geskus Ronald B
Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 Amsterdam, the Netherlands.
BMC Med Res Methodol. 2014 Oct 16;14:116. doi: 10.1186/1471-2288-14-116.
In prognostic studies, the lasso technique is attractive since it improves the quality of predictions by shrinking regression coefficients, compared to predictions based on a model fitted via unpenalized maximum likelihood. Since some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some optimism. Bootstrap methods have been advocated to quantify optimism and generalize model performance to new subjects. It is unclear how resampling should be performed in the presence of multiply imputed data.
The data were based on a cohort of Chronic Obstructive Pulmonary Disease patients. We constructed models to predict Chronic Respiratory Questionnaire dyspnea 6 months ahead. Optimism of the lasso model was investigated by comparing 4 approaches of handling multiply imputed data in the bootstrap procedure, using the study data and simulated data sets. In the first 3 approaches, data sets that had been completed via multiple imputation (MI) were resampled, while the fourth approach resampled the incomplete data set and then performed MI.
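The fourth approach (resample the incomplete data, then impute within each bootstrap sample) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the helper names (`lasso_cd`, `impute_once`, `optimism_resample_then_impute`, `simulate`), the use of R² as the performance measure, a fixed penalty `lam` instead of one tuned in each bootstrap sample, and a crude random draw from observed values standing in for a real multiple imputation model.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent on standardized predictors.

    Minimizes (1/2n) * ||y - b0 - Xb||^2 + lam * ||b||_1 (penalty on the
    standardized scale) and returns (intercept, coefficients) on the raw scale.
    """
    n, p = X.shape
    mu_x, sd_x = X.mean(axis=0), X.std(axis=0)
    sd_x[sd_x == 0] = 1.0
    Xs = (X - mu_x) / sd_x
    mu_y = y.mean()
    resid = y - mu_y                      # residual for beta = 0
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            resid += Xs[:, j] * beta[j]   # remove predictor j from the fit
            rho = Xs[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft threshold
            resid -= Xs[:, j] * beta[j]
    coef = beta / sd_x
    return mu_y - mu_x @ coef, coef


def r2(y, pred):
    """Proportion of explained variance (assumed performance measure)."""
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def impute_once(X, rng):
    """Crude stand-in for one imputation: fill each missing value with a
    random draw from the observed values of the same column."""
    Xc = X.copy()
    for j in range(Xc.shape[1]):
        miss = np.isnan(Xc[:, j])
        if miss.any():
            obs = Xc[~miss, j]
            Xc[miss, j] = rng.choice(obs, size=miss.sum()) if obs.size else 0.0
    return Xc


def optimism_resample_then_impute(X, y, lam=0.05, n_boot=50, n_imp=3, seed=0):
    """Bootstrap optimism: resample the incomplete rows first, then impute
    inside every bootstrap sample before fitting the lasso."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Imputed copies of the original data, used only to test each model.
    originals = [impute_once(X, rng) for _ in range(n_imp)]
    diffs = []
    for _b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # rows drawn with replacement
        Xb, yb = X[idx], y[idx]
        for _m in range(n_imp):
            Xbi = impute_once(Xb, rng)            # imputation inside the bootstrap
            b0, b = lasso_cd(Xbi, yb, lam)
            apparent = r2(yb, b0 + Xbi @ b)       # performance on the bootstrap sample
            test = np.mean([r2(y, b0 + Xo @ b) for Xo in originals])
            diffs.append(apparent - test)
    return float(np.mean(diffs))


def simulate(n=80, p=10, miss_rate=0.15, seed=1):
    """Toy data: two informative predictors, values missing completely at random."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)
    X[rng.random((n, p)) < miss_rate] = np.nan
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    print("estimated optimism in R^2:", optimism_resample_then_impute(X, y))
```

The first three approaches would instead impute the full data set once and then call the resampling step on the completed copies. The sketch keeps imputation inside the loop so that the uncertainty from imputing is repeated in every bootstrap sample.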
The discriminative model performance of the lasso was optimistic. There was suboptimal calibration due to over-shrinkage. The estimate of optimism was sensitive to the choice of handling imputed data in the bootstrap resampling procedure. Resampling the completed data sets underestimates optimism, especially if, within a bootstrap step, selected individuals differ over the imputed data sets. Incorporating the MI procedure in the validation yields estimates of optimism that are closer to the true value, albeit slightly too large.
Performance of prognostic models constructed using the lasso technique can still be optimistic. Results of the internal validation are sensitive to how bootstrap resampling is performed.