St-Pierre Julien, Oualkacha Karim, Rai Bhatnagar Sahir
Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada.
Département de Mathématiques, Faculté des Sciences, Université du Québec à Montréal, Montreal, QC, Canada.
Stat Methods Med Res. 2025 Jan;34(1):180-198. doi: 10.1177/09622802241293768. Epub 2024 Dec 10.
Interactions between genes and environmental factors may play a key role in the etiology of many common disorders. Several regularized generalized linear models have been proposed for hierarchical selection of gene-by-environment interaction effects, where a gene-environment interaction effect is selected only if the corresponding genetic main effect is also selected in the model. However, none of these methods allows the inclusion of random effects to account for population structure, subject relatedness and shared environmental exposure. In this article, we develop a unified approach based on regularized penalized quasi-likelihood estimation to perform hierarchical selection of gene-environment interaction effects in sparse regularized mixed models. We compare the selection and prediction accuracy of our proposed model with existing methods through simulations in the presence of population structure and shared environmental exposure. We show that, in all simulation scenarios, including an additional random effect to account for the shared environmental exposure reduces the false positive rate and false discovery rate of our proposed method for selection of both gene-environment interaction and main effects. Using the F1 score as a balanced measure of the false discovery rate and true positive rate, we further show that in the hierarchical simulation scenarios, our method outperforms other methods for retrieving important gene-environment interaction effects. Finally, we apply our method to real data from the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study, and find that it retrieves previously reported significant loci.