Hancock Dana B, Martin Eden R, Li Yi-Ju, Scott William K
Center for Human Genetics, Duke University Medical Center, Durham, NC, USA.
Genet Epidemiol. 2007 Dec;31(8):883-93. doi: 10.1002/gepi.20249.
A complex web of gene-gene and gene-environment interactions likely underlies the development of late-onset disease. We compared conditional logistic regression (CLR) and generalized estimating equations (GEE) for modeling such interactions in pedigrees with missing parents. Using the simulation of linkage and association (SIMLA) program, we generated family-based data sets containing disease genes, an environmental risk factor, a gene-gene interaction, and a gene-environment interaction. Four scenarios for the relationship between the marker and disease loci were examined: linkage and association, linkage without association, association without linkage, and absence of both linkage and association. Models were built for CLR and for GEE (with exchangeable and independence correlation matrices), and type I error, power, average odds ratio (OR), standard deviation, and 95% confidence intervals were estimated. CLR and GEE were valid tests of association in the presence of linkage, but type I error was inflated for association without linkage, particularly with GEE. CLR produced OR estimates with lower bias but often greater variability than those from GEE. Further, GEE was more powerful than CLR in detecting main and interaction effects. Although GEE had similar power with both matrices, the independence matrix yielded lower type I error and less biased OR estimates than the exchangeable matrix. Our findings support the use of GEE to maximize power to detect gene-gene and gene-environment interactions, but they call for caution in using GEE when association without linkage is possible (e.g., population stratification) and in interpreting its OR estimates.
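The summary measures named in the abstract (type I error, power, average OR, standard deviation, and 95% confidence intervals) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each simulated replicate yields one p-value and one OR estimate, that type I error and power are both the fraction of replicates rejected at a chosen alpha (under null and alternative scenarios, respectively), and that the interval is a normal-approximation interval around the mean OR. The function names `rejection_rate` and `summarize_or` and the parameter `true_or` are hypothetical.

```python
import math
import statistics


def rejection_rate(p_values, alpha=0.05):
    """Fraction of replicates with p < alpha.

    Read as type I error when the replicates come from a null scenario,
    and as power when they come from a scenario with a true effect.
    """
    if not p_values:
        raise ValueError("need at least one replicate")
    return sum(p < alpha for p in p_values) / len(p_values)


def summarize_or(or_estimates, true_or):
    """Mean, SD, bias, and an approximate 95% interval for the mean OR.

    Assumption: the interval is mean +/- 1.96 * SD / sqrt(n); the abstract
    does not state how its intervals were computed.
    """
    n = len(or_estimates)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean_or = statistics.mean(or_estimates)
    sd_or = statistics.stdev(or_estimates)  # sample standard deviation
    half_width = 1.96 * sd_or / math.sqrt(n)
    return {
        "mean_or": mean_or,
        "sd": sd_or,
        "bias": mean_or - true_or,  # simulated (true) OR is known by design
        "ci95": (mean_or - half_width, mean_or + half_width),
    }
```

Under these assumptions, comparing `rejection_rate` across methods on null replicates shows which test inflates type I error, and comparing `bias` and `sd` from `summarize_or` reflects the bias-versus-variability trade-off the abstract reports for CLR and GEE.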