Kooperberg Charles, Leblanc Michael, Dai James Y, Rajapakse Indika
Fred Hutchinson Cancer Research Center Division of Public Health Sciences.
Stat Sci. 2009;24(4):472-488. doi: 10.1214/09-sts287.
Genome-wide association studies, in which as many as a million single nucleotide polymorphisms (SNPs) are measured on several thousand samples, are quickly becoming a common type of study for identifying genetic factors associated with many phenotypes. It is widely assumed that interactions between SNPs or genes, and interactions between genes and environmental factors, contribute substantially to the genetic risk of a disease. Identifying such interactions could lead to a better understanding of disease mechanisms; drug × gene interactions could have profound applications for personalized medicine; and strong interaction effects could benefit risk prediction models. In this paper we provide an overview of different approaches to modeling interactions, emphasizing approaches that make specific use of the structure of genetic data and those that make specific modeling assumptions, which may or may not be reasonable. We conclude that to identify interactions it is often necessary to select a subset of SNPs, for example based on prior hypotheses or marginal significance, but that to identify SNPs that are marginally associated with a disease it may also be useful to consider larger numbers of interactions.
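The selection strategy mentioned in the conclusion (screen SNPs on marginal significance, then test interactions only among the survivors) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the paper: it assumes a quantitative phenotype, additive 0/1/2 genotype coding, polymorphic SNPs, and ordinary least squares; the function names (`marginal_screen`, `interaction_t`, `two_stage_scan`) and the simulated data are hypothetical.

```python
import itertools
import numpy as np

def marginal_screen(genotypes, phenotype, keep):
    """Stage 1: rank SNPs by n * r^2 (a trend-type score) and keep the top `keep`.
    Assumes every SNP column is polymorphic (non-zero variance)."""
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    r = (g * y[:, None]).sum(axis=0) / np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    stat = len(y) * r ** 2
    return np.argsort(stat)[::-1][:keep]

def interaction_t(g1, g2, y):
    """Stage 2: t-statistic of the product term in y ~ 1 + g1 + g2 + g1*g2 (OLS)."""
    X = np.column_stack([np.ones_like(y), g1, g2, g1 * g2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[3] / np.sqrt(cov[3, 3])

def two_stage_scan(genotypes, phenotype, keep):
    """Test all pairwise interactions among the SNPs that pass the marginal screen."""
    selected = marginal_screen(genotypes, phenotype, keep)
    results = []
    for i, j in itertools.combinations(sorted(selected), 2):
        t = interaction_t(genotypes[:, i], genotypes[:, j], phenotype)
        results.append((int(i), int(j), float(t)))
    # Strongest interaction evidence first.
    return sorted(results, key=lambda res: -abs(res[2]))

# Simulated data (an assumption for illustration): 2000 samples, 200 SNPs,
# allele frequency 0.3, with SNPs 3 and 7 having main effects and an interaction.
rng = np.random.default_rng(0)
n_samples, n_snps = 2000, 200
genotypes = rng.binomial(2, 0.3, size=(n_samples, n_snps)).astype(float)
phenotype = (0.3 * genotypes[:, 3] + 0.3 * genotypes[:, 7]
             + 0.5 * genotypes[:, 3] * genotypes[:, 7]
             + rng.normal(size=n_samples))

hits = two_stage_scan(genotypes, phenotype, keep=10)
print("pairs tested:", len(hits))
print("top pair:", hits[0][:2])
```

Keeping 10 of 200 SNPs reduces the pairwise search from 19,900 tests to 45, which is the multiple-testing benefit of selection. The cost is that an interaction between SNPs with little or no marginal effect would be missed by this screen.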