Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA.
Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA.
J Hum Genet. 2017 Sep;62(9):819-829. doi: 10.1038/jhg.2017.43. Epub 2017 Apr 20.
Detecting gene-environment interactions with rare variants is critical for dissecting the etiology of common diseases. Interactions with rare haplotype variants (rHTVs) are of particular interest. At the same time, complex sampling designs, such as stratified random sampling, are becoming increasingly popular in case-control studies, especially for recruiting controls. The US Kidney Cancer Study (KCS) is an example: all available cases were included, while the controls at each site were randomly selected from the population by frequency matching to cases on age, sex and race. There is currently no rHTV association method that can account for such a complex sampling design. To fill this gap, we consider logistic Bayesian LASSO (LBL), an existing rHTV approach for case-control data, and show that its model can easily accommodate the complex sampling design. We study two extensions that include the stratifying variables either as main effects only or together with their interactions with haplotypes. We conduct extensive simulation studies to compare these complex sampling methods with the original LBL methods. We find that, when there is no interaction between haplotypes and stratifying variables, both extensions perform well, whereas the original LBL methods lead to inflated type I error rates. However, when such an interaction exists, the interaction effect must be included in the model to control the type I error rate. Finally, we analyze the KCS data and find a significant interaction between (current) smoking and a specific rHTV in the N-acetyltransferase 2 gene.
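As a schematic only, the two extensions can be written as logistic models for disease status $D$ given diplotype $H$, environmental covariate $E$, and indicators $S_k$ for the stratifying variables. The notation is ours and is not taken from the article, whose exact parameterization may differ. Here $x_j(H)$ is assumed to be the number of copies of non-baseline haplotype $j$ in $H$. In LBL the haplotype-related coefficients are given double-exponential (Laplace) priors, which is the Bayesian LASSO shrinkage.

```latex
% Extension 1: stratifying variables as main effects only
\operatorname{logit} P(D = 1 \mid H, E, S)
  = \alpha + \sum_j \beta_j x_j(H) + \delta E
  + \sum_j \gamma_j \, x_j(H) E + \sum_k \eta_k S_k

% Extension 2: additionally model haplotype-by-stratifying-variable interactions
\operatorname{logit} P(D = 1 \mid H, E, S)
  = \alpha + \sum_j \beta_j x_j(H) + \delta E
  + \sum_j \gamma_j \, x_j(H) E + \sum_k \eta_k S_k
  + \sum_j \sum_k \zeta_{jk} \, x_j(H) S_k

% Double-exponential (Laplace) prior on a haplotype-related coefficient
\pi(\beta_j \mid \lambda) = \frac{\lambda}{2} \exp\!\left(-\lambda \lvert \beta_j \rvert\right)
```

The stratum terms $\eta_k$ can absorb stratum-specific control sampling fractions as shifts in the intercept. This is the usual rationale for adjusting for matching variables in a frequency-matched case-control logistic model. The $\zeta_{jk}$ terms are needed only when haplotype effects genuinely differ across strata.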