Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, Indiana, USA.
Department of Statistics, Kansas State University, Manhattan, Kansas, USA.
Biometrics. 2023 Jun;79(2):684-694. doi: 10.1111/biom.13670. Epub 2022 Apr 16.
Gene-environment (G×E) interactions are important for elucidating the etiology of complex diseases beyond the main genetic and environmental effects. Outliers and data contamination are commonly encountered in the disease phenotypes of G×E studies, which has led to the development of a broad spectrum of robust regularization methods. Nevertheless, the issue has not been addressed within the Bayesian framework in existing studies. We develop a fully Bayesian robust variable selection method for G×E interaction studies. The proposed method can effectively accommodate heavy-tailed errors and outliers in the response variable while conducting variable selection that accounts for structural sparsity. In particular, for robust sparse group selection, spike-and-slab priors are imposed at both the individual and group levels to robustly identify important main and interaction effects. An efficient Gibbs sampler has been developed to facilitate fast computation. Extensive simulation studies, an analysis of diabetes data with single-nucleotide polymorphism measurements from the Nurses' Health Study, and an analysis of The Cancer Genome Atlas melanoma data with gene expression measurements demonstrate the superior performance of the proposed method over multiple competing alternatives.
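To make the "structural sparsity" in the abstract concrete, the following is a minimal sketch of one way a G×E design matrix with a group structure could be laid out. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_gxe_design`, the per-gene grouping (each gene's main effect followed by its interactions with every environmental factor), and the toy data are all hypothetical and do not come from the paper.

```python
import numpy as np


def build_gxe_design(G, E):
    """Build a grouped G x E design matrix (hypothetical helper).

    G: (n, p) array of genetic factors; E: (n, q) array of environmental factors.
    Assumption: each gene j forms one group holding its main effect and its
    q interactions with the environmental factors.
    Returns (X, groups), where X is (n, p * (1 + q)) and groups[c] is the
    gene index that column c of X belongs to.
    """
    G = np.asarray(G, dtype=float)
    E = np.asarray(E, dtype=float)
    if G.shape[0] != E.shape[0]:
        raise ValueError("G and E must have the same number of rows")
    p, q = G.shape[1], E.shape[1]
    blocks, groups = [], []
    for j in range(p):
        g = G[:, [j]]                          # main effect, shape (n, 1)
        blocks.append(np.hstack([g, g * E]))   # main effect + q interactions
        groups.extend([j] * (1 + q))
    return np.hstack(blocks), np.array(groups)


# Toy data: 4 subjects, 2 genes, 3 environmental factors (made-up values).
G = np.arange(8, dtype=float).reshape(4, 2)
E = np.arange(12, dtype=float).reshape(4, 3)
X, groups = build_gxe_design(G, E)
```

Under this layout, selection at the group level would decide whether a gene contributes at all, while selection at the individual level would decide which of its main and interaction columns are nonzero.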