Huque Md Hamidul, Carroll Raymond J, Diao Nancy, Christiani David C, Ryan Louise M
School of Mathematical and Physical Sciences, University of Technology Sydney, New South Wales, Australia.
Department of Statistics, Texas A&M University, College Station, Texas, United States of America.
Genet Epidemiol. 2016 Nov;40(7):570-578. doi: 10.1002/gepi.21986. Epub 2016 Jun 17.
Genetic susceptibility and environmental exposure both play important roles in the aetiology of many diseases. Case-control studies are often the first choice for exploring the joint influence of genetic and environmental factors on the risk of developing a rare disease. In practice, however, such studies may have limited power, especially when susceptibility genes are rare and exposure distributions are highly skewed. We propose a variant of the classical case-control study, the exposure enriched case-control (EECC) design, in which not only cases but also high- (or low-) exposed individuals are oversampled, depending on the skewness of the exposure distribution. Under this sampling scheme, a traditional logistic regression model is no longer valid and yields biased parameter estimates. We show that adding a simple covariate to the regression model removes this bias and yields reliable estimates of the main and interaction effects of interest. We also discuss optimal design, showing that judicious oversampling of high/low exposed individuals can boost study power considerably. We illustrate our results using data from a study involving arsenic exposure and detoxification genes in Bangladesh.
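The abstract does not say which covariate is added, so the following is only a minimal simulation sketch of one way such a correction can work, not the authors' implementation. It assumes that the probability of being sampled depends only on case status D and on a high-exposure indicator S. Under that assumption, the log-odds of disease among sampled individuals equal the population log-odds plus log(pi(1,S)/pi(0,S)), which is a constant plus a multiple of S, so including S as a covariate absorbs the sampling distortion. The population size, effect sizes, 90th-percentile cutoff, sampling fractions, and the helper `fit_logistic` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical source population; every number here is an illustrative assumption.
n = 300_000
g = rng.binomial(1, 0.2, n).astype(float)        # carrier of a susceptibility variant
x = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # right-skewed exposure
true_beta = np.array([-4.0, 0.3, 0.4, 0.3])      # intercept, G, X, G*X
eta = true_beta[0] + true_beta[1] * g + true_beta[2] * x + true_beta[3] * g * x
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

# Assumed enrichment rule: S flags the top 10% of the exposure distribution.
s = (x > np.quantile(x, 0.9)).astype(float)

# Assumed sampling fractions: keep every case, 30% of high-exposure controls,
# and 3% of low-exposure controls.
sel_prob = np.where(d == 1, 1.0, np.where(s == 1, 0.30, 0.03))
keep = rng.random(n) < sel_prob


def fit_logistic(design, y, n_iter=30):
    """Maximum-likelihood logistic regression via plain Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(design @ beta)))
        grad = design.T @ (y - mu)
        hess = design.T @ (design * (mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Naive model: intercept, G, X, G*X fitted to the enriched sample.
base = np.column_stack([np.ones(keep.sum()), g[keep], x[keep], g[keep] * x[keep]])
beta_naive = fit_logistic(base, d[keep])

# Corrected model: the same terms plus the sampling-stratum indicator S.
beta_corrected = fit_logistic(np.column_stack([base, s[keep]]), d[keep])

print("true (G, X, G*X):     ", true_beta[1:])
print("naive (G, X, G*X):    ", beta_naive[1:4])
print("corrected (G, X, G*X):", beta_corrected[1:4])
print("coefficient on S:     ", beta_corrected[4])
```

In this setup the coefficient on S estimates log(0.03/0.30), the difference between the two strata in the log ratio of case to control sampling fractions, while the intercept remains shifted as in any case-control sample. The main and interaction effects are the quantities that the added indicator is meant to protect.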