Tao Yebin, Sánchez Brisa N, Mukherjee Bhramar
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, 48109, U.S.A.
Stat Med. 2015 Mar 30;34(7):1227-41. doi: 10.1002/sim.6401. Epub 2014 Dec 29.
Many existing cohort studies designed to investigate the health effects of environmental exposures also collect data on genetic markers. The Early Life Exposures in Mexico to Environmental Toxicants project, for instance, has been genotyping single nucleotide polymorphisms on candidate genes involved in metal and nutrient metabolism and in metabolic pathways potentially shared with the environmental exposures. Given the longitudinal nature of these cohort studies, rich exposure and outcome data are available to address novel questions regarding gene-environment interaction (G × E). Latent variable (LV) models have been effectively used for dimension reduction, helping with multiple testing and multicollinearity issues in the presence of correlated multivariate exposures and outcomes. In this paper, we first propose a modeling strategy, based on LV models, to examine the association between repeated outcome measures (e.g., child weight) and a set of correlated exposure biomarkers (e.g., prenatal lead exposure). We then construct novel tests for G × E effects within the LV framework to examine effect modification of the outcome-exposure association by genetic factors (e.g., the hemochromatosis gene). We consider two scenarios: one allowing the LV models to depend on genes and the other assuming independence between the LV models and genes. We combine the two sets of estimates by shrinkage estimation to trade off bias and efficiency in a data-adaptive way. Using simulations, we evaluate the properties of the shrinkage estimates and, in particular, demonstrate the need for this data-adaptive shrinkage in the presence of repeated outcome measures, possibly repeated exposure measures, and time-varying gene-environment association.
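The abstract does not give the form of the shrinkage estimator, so the following is only an illustrative sketch of how two estimates might be combined in a data-adaptive way. It assumes a scalar, empirical-Bayes-type weight that is not taken from this paper: the estimate from the model allowing gene dependence (`beta_u`, with variance `var_u`) is pulled toward the estimate from the model assuming independence (`beta_c`), and the pull weakens as the two estimates disagree. The function name and all argument names are hypothetical.

```python
def shrinkage_estimate(beta_u, var_u, beta_c):
    """Combine two scalar estimates with a data-adaptive weight.

    beta_u: estimate from the model that allows dependence on genes
            (assumed less biased but less efficient).
    var_u:  estimated variance of beta_u.
    beta_c: estimate from the model that assumes independence
            (assumed more efficient but possibly biased).

    Assumed weight (illustrative, not from the paper):
        w = var_u / (var_u + (beta_u - beta_c)**2)
    """
    diff = beta_u - beta_c
    denom = var_u + diff ** 2
    if denom == 0.0:
        # Zero variance and identical estimates: nothing to combine.
        return beta_u
    weight = var_u / denom
    # weight near 1 moves toward beta_c; weight near 0 stays at beta_u.
    return beta_u + weight * (beta_c - beta_u)


if __name__ == "__main__":
    # Estimates agree closely relative to var_u: result is near beta_c.
    print(shrinkage_estimate(beta_u=0.52, var_u=0.04, beta_c=0.50))
    # Estimates disagree strongly: result stays near beta_u.
    print(shrinkage_estimate(beta_u=1.50, var_u=0.04, beta_c=0.50))
```

Under this assumed weighting, a large gap between the two estimates is treated as evidence that the independence assumption is violated, so the combined estimate falls back on the more robust one; a small gap lets the more efficient estimate dominate.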