Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA.
Biostatistics. 2013 Sep;14(4):667-81. doi: 10.1093/biostatistics/kxt006. Epub 2013 Mar 5.
In this paper, we consider testing for interactions between a genetic marker set and an environmental variable. A common practice in studying gene-environment (GE) interactions is to analyze one single-nucleotide polymorphism (SNP) at a time. It is of significant interest to analyze SNPs in a biologically defined set, such as a gene or pathway, simultaneously. We first show that if the main effects of multiple SNPs in a set are associated with a disease or trait, the classical single SNP-GE interaction analysis can be biased. We derive the asymptotic bias and study the conditions under which the classical single SNP-GE interaction analysis is unbiased. We further show that the simple minimum p-value-based SNP-set GE analysis can be biased and have an inflated Type I error rate. To overcome these difficulties, we propose a computationally efficient and powerful gene-environment set association test (GESAT) in generalized linear models. Our method tests for SNP-set by environment interactions using a variance component test and estimates the main SNP effects under the null hypothesis using ridge regression. We evaluate the performance of GESAT using simulation studies and apply it to data from the Harvard lung cancer genetic study to investigate GE interactions between the SNPs in the 15q24-25.1 region and smoking on lung cancer risk.
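The testing strategy summarized above (ridge regression for the SNP main effects under the null, then a variance component score test for the SNP-set by environment interaction) can be illustrated with a simplified sketch. It is not the published GESAT implementation. The function name `gesat_linear_sketch`, the fixed penalty `lam`, and the Monte Carlo settings are hypothetical. The sketch also covers only a continuous trait under a linear model, not the general GLM case. It fits the null model `y = X a + G b + e` with a ridge penalty on `b` only and forms the score-type statistic `Q = r' S S' r / sigma2`, where `S` holds the SNP-by-environment products and `r` the null residuals. The null distribution of `Q` is then approximated as a weighted sum of chi-square(1) variables, evaluated by Monte Carlo in place of an exact method. The published method may differ in how the penalty is chosen, how SNPs are weighted, and how the p-value is computed.

```python
import numpy as np


def gesat_linear_sketch(y, X, G, E, lam=1.0, n_mc=100_000, seed=0):
    """Hypothetical, simplified score-type variance component test for
    SNP-set x environment interaction with a continuous trait.

    Assumptions (not taken from the paper):
    - null model y = X a + G b + e, ridge penalty `lam` on the SNP main
      effects b only; X (intercept, E, other covariates) is unpenalized;
    - statistic Q = r' S S' r / sigma2 with S = G * E (column-wise);
    - null distribution approximated by sum_k lambda_k * chi2_1, where
      lambda_k are eigenvalues of S' (I - H)^2 S, evaluated by Monte Carlo.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    G = np.asarray(G, dtype=float)
    E = np.asarray(E, dtype=float)
    n = y.shape[0]

    # Design for the null model: unpenalized covariates, then SNP main effects.
    Z = np.column_stack([X, G])
    pen = np.diag(np.r_[np.zeros(X.shape[1]), np.full(G.shape[1], lam)])

    # Ridge hat matrix H = Z (Z'Z + pen)^{-1} Z' and null residuals r = (I - H) y.
    H = Z @ np.linalg.solve(Z.T @ Z + pen, Z.T)
    r = y - H @ y
    sigma2 = (r @ r) / (n - np.trace(H))

    # SNP-by-environment interaction columns and the score-type statistic.
    S = G * E[:, None]
    Q = float(np.sum((S.T @ r) ** 2) / sigma2)

    # Mixture weights: eigenvalues of S' (I - H)(I - H) S (ignores ridge bias).
    M = np.eye(n) - H
    weights = np.clip(np.linalg.eigvalsh(S.T @ M @ M @ S), 0.0, None)

    # Monte Carlo approximation of P(sum_k w_k * z_k^2 >= Q).
    rng = np.random.default_rng(seed)
    null_draws = (rng.standard_normal((n_mc, weights.size)) ** 2) @ weights
    p_value = (1 + np.sum(null_draws >= Q)) / (n_mc + 1)
    return Q, p_value
```

As a usage example, one might simulate genotypes `G`, a binary exposure `E`, and covariates `X = [1, E]`, then compare the p-value for an outcome with SNP main effects only against one that adds an interaction term such as `0.8 * G[:, 1] * E`. The ridge penalty keeps the null fit stable when SNPs in a set are in strong linkage disequilibrium. The plug-in Monte Carlo null in this sketch ignores the shrinkage bias that the penalty introduces.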