Lobach Iryna, Sampson Joshua, Lobach Siarhei, Alekseyenko Alexander, Piryatinska Alexandra, He Tao, Zhang Li
Department of Epidemiology and Biostatistics, University of California, San Francisco, California.
Biostatistics Branch, Division of Cancer Epidemiology & Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.
Genet Epidemiol. 2019 Apr;43(3):292-299. doi: 10.1002/gepi.22186. Epub 2019 Jan 8.
One of the most important aims of case-control genome-wide association studies is to determine how the effect of a genotype varies across the environment, that is, to measure the gene-environment interaction (G × E). We consider the scenario in which some of the "healthy" controls actually have the disease and the frequency of these latent cases varies with the environmental variable of interest. In this scenario, logistic regression with the clinically diagnosed disease status as the outcome variable will yield biased estimates of the G × E interaction. Here, we derive a general theoretical approximation to the bias in the estimates of the G × E interaction and show, through extensive simulation, that this approximation is accurate in finite samples. Moreover, we apply this approximation to evaluate the bias in the effect estimates of genetic variants related to mitochondrial proteins in a large-scale prostate cancer study.
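The mechanism described in the abstract can be illustrated with a small numerical sketch. This is not the authors' approximation. It is a toy calculation under stated assumptions: a binary genotype G and a binary exposure E, a true disease model that is logistic in G, E, and G × E, and a hypothetical probability `sens[e]` that a true case with E = e is clinically diagnosed (undiagnosed cases are counted as controls). With binary G and E, the interaction coefficient of a saturated logistic model equals the log ratio of odds ratios, and sampling on diagnosed status leaves odds ratios unchanged, so the large-sample value of the naive estimate can be computed directly. All coefficient values and the function name `naive_gxe` are illustrative choices, not quantities from the study.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds."""
    return math.log(p / (1.0 - p))


def naive_gxe(b0, bG, bE, bGE, sens):
    """Large-sample G x E log odds ratio when diagnosed status replaces true status.

    sens[e] is the (hypothetical) probability that a true case with E = e
    is clinically diagnosed; undiagnosed cases are treated as controls.
    """
    def log_odds_diagnosed(g, e):
        p_true = expit(b0 + bG * g + bE * e + bGE * g * e)
        return logit(sens[e] * p_true)

    # Log ratio of odds ratios across the four (G, E) cells.
    return (log_odds_diagnosed(1, 1) - log_odds_diagnosed(1, 0)
            - log_odds_diagnosed(0, 1) + log_odds_diagnosed(0, 0))


true_bGE = 0.3  # assumed true interaction (illustrative value)

# Every true case is diagnosed: the naive analysis recovers the true interaction.
no_error = naive_gxe(-1.0, 0.4, 0.5, true_bGE, sens={0: 1.0, 1: 1.0})

# Latent cases are more frequent when E = 1: the naive interaction shifts.
with_error = naive_gxe(-1.0, 0.4, 0.5, true_bGE, sens={0: 0.9, 1: 0.5})
bias = with_error - true_bGE

print(f"no latent cases:          {no_error:.3f}")
print(f"latent cases vary with E: {with_error:.3f} (bias {bias:+.3f})")
```

In this toy setting the size of the shift depends on how common the disease is and on how strongly the diagnosis probability differs between exposure groups; it says nothing about the magnitude of bias in any particular study.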