Using the bootstrap to improve estimation and confidence intervals for regression coefficients selected using backwards variable elimination.

Author information

Austin Peter C

Affiliation

Institute for Clinical Evaluative Sciences, Toronto, Ont., Canada.

Publication information

Stat Med. 2008 Jul 30;27(17):3286-300. doi: 10.1002/sim.3104.

Abstract

Applied researchers frequently use automated model selection methods, such as backwards variable elimination, to develop parsimonious regression models. Statisticians have criticized these methods for several reasons, among them that the estimated regression coefficients are biased and that the derived confidence intervals do not have the advertised coverage rates. We developed a method to improve the estimation of regression coefficients and confidence intervals that applies backwards variable elimination in multiple bootstrap samples. In a given bootstrap sample, predictor variables that are not selected for inclusion in the final regression model have their regression coefficients set to zero. Regression coefficients are averaged across the bootstrap samples, and non-parametric percentile bootstrap confidence intervals are then constructed for each regression coefficient. We conducted a series of Monte Carlo simulations to examine the performance of this method for estimating regression coefficients and constructing confidence intervals for variables selected using backwards variable elimination. We demonstrated that this method results in confidence intervals with superior coverage compared with those developed from conventional backwards variable elimination. We illustrate the utility of our method by applying it to a large sample of subjects hospitalized with a heart attack.
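The procedure described in the abstract can be sketched in a few dozen lines. The following is a minimal illustration under stated assumptions, not the author's implementation: it assumes ordinary least squares with an intercept, a p-value removal threshold of 0.05, normal-approximation p-values in place of t-based ones, and a simple order-statistic percentile rule. All function names (`invert`, `ols`, `backward_eliminate`, `bootstrap_backward`, `simulate`) and the simulated data are hypothetical and do not come from the paper.

```python
import math
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def ols(X, y, cols):
    """Fit OLS with an intercept on the chosen columns; return slopes and two-sided p-values."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k, n = len(cols) + 1, len(y)
    xtx = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, z))) ** 2 for z, yi in zip(Z, y))
    sigma2 = rss / (n - k)
    # Assumption: normal approximation to the t distribution for the p-values.
    pvals = [math.erfc(abs(beta[a]) / math.sqrt(sigma2 * inv[a][a]) / math.sqrt(2))
             for a in range(1, k)]
    return beta[1:], pvals

def backward_eliminate(X, y, alpha=0.05):
    """Drop the least significant predictor until all remaining p-values are <= alpha."""
    cols = list(range(len(X[0])))
    coefs = []
    while cols:
        coefs, pvals = ols(X, y, cols)
        worst = max(range(len(cols)), key=lambda i: pvals[i])
        if pvals[worst] <= alpha:
            break
        cols.pop(worst)
    full = [0.0] * len(X[0])  # eliminated predictors keep a coefficient of zero
    if cols:
        for j, b in zip(cols, coefs):
            full[j] = b
    return full

def bootstrap_backward(X, y, n_boot=200, alpha=0.05, level=0.95, seed=0):
    """Return (mean coefficient, lower, upper) per predictor across bootstrap samples."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        draws.append(backward_eliminate([X[i] for i in idx], [y[i] for i in idx], alpha))
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    result = []
    for j in range(p):
        vals = sorted(d[j] for d in draws)
        mean = sum(vals) / n_boot
        lo = vals[int(math.floor(lo_q * (n_boot - 1)))]
        hi = vals[int(math.ceil(hi_q * (n_boot - 1)))]
        result.append((mean, lo, hi))
    return result

def simulate(n=150, seed=1):
    """Hypothetical data: only the first of three predictors affects the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [1.0 * row[0] + rng.gauss(0, 1) for row in X]
    return X, y

if __name__ == "__main__":
    X, y = simulate()
    for j, (mean, lo, hi) in enumerate(bootstrap_backward(X, y)):
        print(f"x{j}: mean={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Because an eliminated predictor contributes a zero in that bootstrap sample, the averaged coefficient of a weak predictor is shrunk toward zero, and its percentile interval will typically include zero whenever the predictor is dropped in a sizeable share of the samples. A pure-Python linear solver is used here only to keep the sketch dependency-free; in practice a statistics library would be the natural choice.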
