Comparison of subset selection methods in linear regression in the context of health-related quality of life and substance abuse in Russia.

Author information

Morozova Olga, Levina Olga, Uusküla Anneli, Heimer Robert

Affiliations

Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA.

Department of Public Health, University of Tartu, Tartu, Estonia.

Publication information

BMC Med Res Methodol. 2015 Aug 30;15:71. doi: 10.1186/s12874-015-0066-2.

Abstract

BACKGROUND

Automatic stepwise subset selection methods in linear regression often perform poorly, both in variable selection and in the estimation of coefficients and standard errors, especially when the number of independent variables is large and multicollinearity is present. Yet stepwise algorithms remain the dominant method in medical and epidemiological research.

METHODS

The performance of stepwise methods (backward elimination and forward selection algorithms using AIC, BIC, and the likelihood ratio test (LRT) at p = 0.05) and of alternative subset selection methods in linear regression, including Bayesian model averaging (BMA) and penalized regression (lasso, adaptive lasso, and adaptive elastic net), was investigated in a dataset from a cross-sectional study of drug users in St. Petersburg, Russia, in 2012-2013. The dependent variable measured health-related quality of life, and the independent correlates included 44 variables measuring demographic, behavioral, and structural factors.
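
The stepwise arm of this comparison can be illustrated with a minimal sketch of forward selection for a Gaussian linear model scored by AIC or BIC. This is not the authors' code: the data are simulated, and `gaussian_ic` and `forward_select` are hypothetical helper names introduced only for illustration. The criterion is computed up to an additive constant as n·log(RSS/n) plus a penalty of 2 (AIC) or log(n) (BIC) per fitted coefficient.

```python
import numpy as np

def gaussian_ic(y, X, penalty):
    """AIC/BIC-style score (up to an additive constant) for an OLS fit.

    penalty = 2 gives AIC; penalty = log(n) gives BIC.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1]  # number of fitted coefficients, intercept included
    return n * np.log(rss / n) + penalty * k

def forward_select(y, X, criterion="aic"):
    """Greedy forward selection: add the variable that lowers the
    criterion most; stop when no addition lowers it."""
    n, p = X.shape
    penalty = 2.0 if criterion == "aic" else np.log(n)
    intercept = np.ones((n, 1))
    selected = []
    best = gaussian_ic(y, intercept, penalty)  # intercept-only model
    while len(selected) < p:
        scores = {}
        for j in range(p):
            if j in selected:
                continue
            design = np.hstack([intercept, X[:, selected + [j]]])
            scores[j] = gaussian_ic(y, design, penalty)
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no candidate improves the criterion
        selected.append(j_best)
        best = scores[j_best]
    return selected

# Simulated data (an assumption for illustration): 10 candidate
# variables, of which only columns 0 and 3 affect the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=300)

aic_vars = forward_select(y, X, "aic")
bic_vars = forward_select(y, X, "bic")
print("AIC:", aic_vars)
print("BIC:", bic_vars)
```

Both criteria rank candidates of equal size by residual sum of squares, so in this sketch the BIC path is a prefix of the AIC path and stops no later whenever log(n) > 2. This is one reason the two criteria can return models of different size.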

RESULTS

In our case study, all methods returned models of different size and composition, ranging from 41 to 11 variables. The percentage of significant variables among those selected in the final model varied from 100 % to 27 %. Model selection with stepwise methods was highly unstable, with most of the selected variables being significant (the 95 % confidence interval for the coefficient did not include zero), and all of them in the case of backward elimination with BIC, forward selection with BIC, and backward elimination with LRT. Adaptive elastic net demonstrated improved stability and more conservative estimates of coefficients and standard errors compared to stepwise methods. By incorporating model uncertainty into subset selection and into the estimation of coefficients and their standard deviations, BMA returned a parsimonious model with the most conservative results in terms of covariate significance.

CONCLUSIONS

BMA and adaptive elastic net performed best in our analysis. Based on our results and previous theoretical studies, stepwise methods in medical and epidemiological research may be outperformed by alternative methods in cases such as ours. In situations of high uncertainty, it is beneficial to apply several methodologically sound subset selection methods and to explore where their outputs do and do not agree. We recommend that researchers, at a minimum, explore model uncertainty and stability as part of their analyses and report these details in epidemiological papers.
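
One common way to explore selection stability is to repeat the selection on bootstrap resamples and record how often each variable is retained. The sketch below is an illustration under stated assumptions, not the procedure used in the study: it uses a plain lasso (rather than adaptive elastic net or BMA) fitted by coordinate descent, a fixed penalty `lam` chosen arbitrarily, and simulated data. The function names `lasso_cd` and `inclusion_frequency` are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Lasso by cyclic coordinate descent.

    Minimizes (1/2n)*||y - X b||^2 + lam*||b||_1, assuming the columns
    of X are standardized and y is centered (no intercept is fitted).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X @ beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            # covariance of column j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # soft-thresholding update for coordinate j
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def inclusion_frequency(X, y, lam, n_boot=100, seed=0):
    """Share of bootstrap resamples in which each variable is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        Xb = (Xb - Xb.mean(axis=0)) / Xb.std(axis=0)
        yb = yb - yb.mean()
        counts += lasso_cd(Xb, yb, lam) != 0
    return counts / n_boot

# Simulated data (an assumption for illustration): 8 candidate
# variables, of which only columns 0 and 3 affect the outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)

freq = inclusion_frequency(X, y, lam=0.3)
print(np.round(freq, 2))
```

Variables with inclusion frequencies near 1 are selected consistently across resamples, while intermediate frequencies flag variables whose selection depends on the particular sample. Frequencies of this kind are one form in which stability could be reported alongside the final model.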
