Li Yun, O'Connor George T, Dupuis Josée, Kolaczyk Eric
Stat Appl Genet Mol Biol. 2015 Jun;14(3):265-77. doi: 10.1515/sagmb-2014-0073.
In genome-wide association studies (GWAS), the goal is to identify genetic variants associated with phenotypes. For a given phenotype, the associated variants are usually a sparse subset of all possible variants, so traditional Lasso-type estimation methods can be used to detect important genes. However, the relationship between the genotype at a variant and a phenotype may be influenced by other variables, such as sex and lifestyle. It is therefore important to be able to incorporate gene-covariate interactions into the sparse regression model. In addition, because biological knowledge exists about how genes work together in structured groups, it is desirable to incorporate this information as well. In this paper, we present a novel sparse regression methodology for gene-covariate models in association studies that not only allows such interactions but also accounts for biological group structure. Simulation results show that our method substantially outperforms a comparison method that considers interactions but ignores group structure. Application to data on total plasma immunoglobulin E (IgE) concentrations in the Framingham Heart Study (FHS), using sex and smoking status as covariates, yields several potentially interesting gene-covariate interactions.
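To make the idea of a sparse regression with gene-covariate interactions and group structure more concrete, the following is a minimal sketch in Python. It is not the authors' method: it swaps in a plain group lasso fitted by proximal gradient descent as a generic stand-in, and the function names (`interaction_design`, `group_soft_threshold`, `fit_group_lasso`), the column layout of the design matrix, and the way groups are passed in are all assumptions made for illustration only.

```python
import numpy as np

def interaction_design(G, Z):
    """Stack genotype columns, covariate columns, and all gene-by-covariate products.

    G: (n, p) genotype matrix; Z: (n, q) covariate matrix (e.g. sex, smoking).
    Assumed layout: [G | Z | G_j * Z_k], with product column index j * q + k.
    """
    n, p = G.shape
    q = Z.shape[1]
    inter = (G[:, :, None] * Z[:, None, :]).reshape(n, p * q)
    return np.hstack([G, Z, inter])

def group_soft_threshold(beta, groups, t):
    """Proximal operator of t * sum_g ||beta_g||_2.

    Coefficients not listed in any group are left unpenalized.
    """
    out = beta.copy()
    for idx in groups:
        norm = np.linalg.norm(beta[idx])
        # Shrink the whole group toward zero, or zero it out entirely.
        out[idx] = (1.0 - t / norm) * beta[idx] if norm > t else 0.0
    return out

def fit_group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize (1 / 2n) * ||y - X b||^2 + lam * sum_g ||b_g||_2 by proximal gradient."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = group_soft_threshold(beta - step * grad, groups, step * lam)
    return beta
```

In this sketch each entry of `groups` is a list of column indices that are selected or dropped together, for example the main-effect and interaction columns of all variants in one hypothetical gene set. Leaving the covariate main effects out of every group keeps them unpenalized, which is one possible design choice rather than something stated in the abstract.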