Moss Lilit C, Gauderman William J, Lewinger Juan Pablo, Conti David V
Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, California.
Genet Epidemiol. 2019 Mar;43(2):150-165. doi: 10.1002/gepi.22171. Epub 2018 Nov 19.
Genome-wide association studies typically search for marginal associations between a single-nucleotide polymorphism (SNP) and a disease trait while gene-environment (G × E) interactions remain generally unexplored. More powerful methods beyond the simple case-control (CC) approach leverage either marginal effects or CC ascertainment to increase power. However, these potential gains depend on assumptions whose aptness is often unclear a priori. Here, we review G × E methods and use simulations to highlight performance as a function of main and interaction effects and the association of the two factors in the source population. Substantial variation in performance between methods leads to uncertainty as to which approach is most appropriate for any given analysis. We present a framework that (a) balances the robustness of a CC approach with the power of the case-only (CO) approach; (b) incorporates main SNP effects; (c) allows for incorporation of prior information; and (d) allows the data to determine the most appropriate model. Our framework is based on Bayes model averaging, which provides a principled statistical method for incorporating model uncertainty. We average over inclusion of parameters corresponding to the main and G × E interaction effects and the G-E association in controls. The resulting method exploits the joint evidence for main and interaction effects while gaining power from a CO equivalent analysis. Through simulations, we demonstrate that our approach detects SNPs within a wide range of scenarios with increased power over current methods. We illustrate the approach on a gene-environment scan in the USC Children's Health Study.
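To make the CC versus CO trade-off concrete, here is a minimal sketch for a binary SNP and a binary exposure. It is not the authors' implementation. The function names, the 2 × 2 count layout, the Woolf variance, and the BIC approximation to the Bayes factor are all assumptions chosen for illustration. The sketch weights the CO and CC estimates of the G × E log odds ratio by an approximate posterior probability that G and E are independent in controls, which is only one of the components the abstract describes averaging over.

```python
import math


def log_or(a, b, c, d):
    """Log odds ratio and Woolf variance for counts (G1E1, G1E0, G0E1, G0E0)."""
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def averaged_interaction(cases, controls, prior_indep=0.5):
    """Hypothetical helper: average CO and CC interaction estimates.

    cases, controls: tuples of counts (G1E1, G1E0, G0E1, G0E0).
    prior_indep: assumed prior probability that G and E are independent
    in controls.
    """
    co, _ = log_or(*cases)            # case-only estimate: G-E log OR in cases
    ge, var_ge = log_or(*controls)    # G-E log OR in controls
    cc = co - ge                      # case-control interaction estimate

    # BIC approximation (an assumption): log Bayes factor for
    # "G-E association in controls" versus "independence".
    n = sum(controls)
    log_bf = 0.5 * (ge ** 2 / var_ge - math.log(n))
    prior_odds_assoc = (1 - prior_indep) / prior_indep
    w_indep = 1 / (1 + prior_odds_assoc * math.exp(log_bf))

    # Under independence the CO estimate is used; otherwise the CC estimate.
    averaged = w_indep * co + (1 - w_indep) * cc
    return {"co": co, "cc": cc, "w_indep": w_indep, "averaged": averaged}


result = averaged_interaction(
    cases=(60, 90, 90, 160),
    controls=(25, 75, 75, 225),
)
```

When the controls show little evidence of G-E association, the weight shifts toward the CO estimate; strong association in controls shifts it back toward the CC estimate, which does not rely on the independence assumption.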