On the impact of model selection on predictor identification and parameter inference.

Author information

Pfeiffer Ruth M, Redd Andrew, Carroll Raymond J

Affiliations

Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Room 7E142, Bethesda, MD 20892 USA.

Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT 84132 USA.

Publication information

Comput Stat. 2017;32(2):667-690. doi: 10.1007/s00180-016-0690-2. Epub 2016 Oct 22.

Abstract

We assessed the ability of several penalized regression methods for linear and logistic models to identify outcome-associated predictors, and the impact of predictor selection on parameter inference, for practical sample sizes. We studied effect estimates obtained directly from penalized methods (Algorithm 1) or by refitting the selected predictors with standard regression (Algorithm 2). For linear models, penalized linear regression, elastic net, smoothly clipped absolute deviation (SCAD), least angle regression and LASSO had low false negative (FN) predictor selection rates but false positive (FP) rates above 20 % for all sample and effect sizes. Partial least squares regression had few FPs but many FNs. Only relaxo had low FP and FN rates. For logistic models, LASSO and penalized logistic regression had many FPs and few FNs for all sample and effect sizes. SCAD and adaptive logistic regression had low or moderate FP rates but many FNs. Coverage of 95 % confidence intervals for predictors with null effects was approximately 100 % for Algorithm 1 for all methods, and 95 % for Algorithm 2 for large sample and effect sizes. Coverage was low only for penalized partial least squares (linear regression). For outcome-associated predictors, coverage was close to 95 % for Algorithm 2 for large sample and effect sizes for all methods except penalized partial least squares and penalized logistic regression. Coverage was sub-nominal for Algorithm 1. In conclusion, many methods performed comparably, and while Algorithm 2 is preferred to Algorithm 1 for estimation, it yields valid inference only for large effect and sample sizes.
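The distinction between Algorithm 1 and Algorithm 2 can be illustrated with a minimal sketch for the linear LASSO case. This is not the authors' code: the coordinate-descent solver, the function names (`lasso_cd`, `refit_ols`, `selection_rates`), the simulated data, and the penalty value `lam=0.2` are all assumptions chosen for illustration, and only NumPy is used.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO minimizing (1/2n)||y - Xb||^2 + lam*||b||_1.

    Illustrative solver (no intercept; data are assumed centered).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # remove predictor j's contribution
            rho = X[:, j] @ resid / n
            # Soft-thresholding update for coordinate j
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta

def refit_ols(X, y, selected):
    """Algorithm 2 (sketch): ordinary least squares on the selected predictors only."""
    beta = np.zeros(X.shape[1])
    if selected.any():
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        beta[selected] = coef
    return beta

def selection_rates(selected, truth):
    """FP rate: share of null predictors selected. FN rate: share of true predictors missed."""
    fp = float(np.mean(selected[~truth]))
    fn = float(np.mean(~selected[truth]))
    return fp, fn

# Hypothetical simulation: 3 outcome-associated predictors out of 10.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + rng.standard_normal(n)
X = X - X.mean(axis=0)
y = y - y.mean()

beta_alg1 = lasso_cd(X, y, lam=0.2)           # Algorithm 1: penalized estimates used directly
selected = beta_alg1 != 0
beta_alg2 = refit_ols(X, y, selected)         # Algorithm 2: refit selected predictors
fp_rate, fn_rate = selection_rates(selected, true_beta != 0)
print("selected:", np.flatnonzero(selected), "FP rate:", fp_rate, "FN rate:", fn_rate)
```

In this sketch the Algorithm 1 estimates are shrunk toward zero by the penalty, while the Algorithm 2 refit removes that shrinkage for the selected predictors. The refit treats the selected set as fixed, which is consistent with the abstract's caution that inference after selection is valid only for large effect and sample sizes.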

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a88/5480098/41b04ef01e7d/180_2016_690_Fig1_HTML.jpg
