Ratnasekera Pulindu, Graham Jinko, McNeney Brad
Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada.
Front Genet. 2023 Jan 4;13:1065568. doi: 10.3389/fgene.2022.1065568. eCollection 2022.
In genetic epidemiology, log-linear models of population risk may be used to study the effect of genotypes and exposures on the relative risk of a disease. Such models may also include gene-environment interaction terms that allow the genotypes to modify the effect of the exposure, or equivalently, the exposure to modify the effect of genotypes on the relative risk. When a measured test locus is in linkage disequilibrium with an unmeasured causal locus, exposure-related genetic structure in the population can lead to spurious gene-environment interaction; that is, to apparent gene-environment interaction at the test locus in the absence of true gene-environment interaction at the causal locus. Exposure-related genetic structure occurs when the distributions of exposures and of haplotypes at the test and causal loci both differ across population strata. A case-parent trio design can protect inference of genetic main effects from confounding bias due to genetic structure in the population. Unfortunately, when the genetic structure is exposure-related, this protection against confounding bias does not extend to the gene-environment interaction term. We show that current methods to reduce the bias in estimated gene-environment interactions from case-parent trio data can only account for simple population structure involving two strata. To fill this gap, we propose to directly accommodate multiple population strata by adjusting for genetic principal components (PCs). Through simulations, we show that our PC adjustment maintains the nominal type-1 error rate and has power to detect gene-environment interaction nearly identical to that of an oracle approach based directly on population strata. We also apply the PC-adjustment approach to data from a study of genetic modifiers of cleft palate composed primarily of case-parent trios of European and East Asian ancestry. Consistent with earlier analyses, our results suggest that the gene-environment interaction signal in these data is due to the self-reported European trios.