Koslovsky M D, Swartz M D, Leon-Novelo L, Chan W, Wilkinson A V
Department of Biostatistics, UTHealth, Houston, TX, USA.
Department of Epidemiology, UTHealth, Austin, TX, USA.
J Stat Comput Simul. 2018;88(3):575-596. doi: 10.1080/00949655.2017.1398255. Epub 2017 Nov 8.
We develop a Bayesian variable selection method for logistic regression models that can simultaneously accommodate qualitative covariates and interaction terms under various heredity constraints. We use expectation-maximization variable selection (EMVS) with a deterministic annealing variant as the platform for our method, due to its proven flexibility and efficiency. We propose a variance adjustment of the priors for the coefficients of qualitative covariates, which controls false-positive rates, and a flexible parameterization for interaction terms, which accommodates user-specified heredity constraints. This method can handle all pairwise interaction terms as well as a subset of specific interactions. Using simulation, we show that this method selects associated covariates better than the grouped LASSO and the LASSO with heredity constraints in various exploratory research scenarios encountered in epidemiological studies. We apply our method to identify genetic and non-genetic risk factors associated with smoking experimentation in a cohort of Mexican-heritage adolescents.
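As an illustration of the heredity constraints mentioned above, the following minimal sketch uses the standard definitions: under strong heredity an interaction is eligible only if both of its main effects are included, and under weak heredity it is eligible if at least one is. It is not the authors' EMVS parameterization, which places priors on interaction terms rather than applying a hard filter. The covariate names and the functions `interaction_allowed` and `candidate_interactions` are hypothetical.

```python
from itertools import combinations


def interaction_allowed(main_j: bool, main_k: bool, heredity: str) -> bool:
    """Return True if the interaction of two covariates is eligible.

    main_j, main_k: whether each main effect is currently included.
    heredity: "strong" (both parents), "weak" (at least one), or "none".
    """
    if heredity == "strong":
        return main_j and main_k
    if heredity == "weak":
        return main_j or main_k
    if heredity == "none":
        return True
    raise ValueError(f"unknown heredity constraint: {heredity!r}")


def candidate_interactions(selected: dict, heredity: str) -> list:
    """List the pairwise interactions permitted under the given constraint.

    selected maps a (hypothetical) covariate name to its inclusion indicator.
    """
    return [
        (a, b)
        for a, b in combinations(selected, 2)
        if interaction_allowed(selected[a], selected[b], heredity)
    ]


# Hypothetical inclusion indicators for four main effects.
selected = {"age": True, "sex": False, "snp1": True, "bmi": False}

strong = candidate_interactions(selected, "strong")
weak = candidate_interactions(selected, "weak")
unconstrained = candidate_interactions(selected, "none")
```

With these indicators, strong heredity leaves only the `("age", "snp1")` interaction, weak heredity excludes only `("sex", "bmi")`, and the unconstrained setting keeps all six pairs.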