Chen Yi-Hau, Chatterjee Nilanjan, Carroll Raymond J
Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan, People's Republic of China.
Biostatistics. 2008 Jan;9(1):81-99. doi: 10.1093/biostatistics/kxm011. Epub 2007 May 8.
Genetic epidemiologic studies often involve investigation of the association of a disease with a genomic region in terms of the underlying haplotypes, that is, the combination of alleles at multiple loci along homologous chromosomes. In this article, we consider the problem of estimating haplotype-environment interactions from case-control studies when some of the environmental exposures themselves may be influenced by genetic susceptibility. We specify the distribution of the diplotypes (haplotype pairs) given environmental exposures for the underlying population based on a novel semiparametric model that allows haplotypes to be potentially related to environmental exposures, while allowing the marginal distribution of the diplotypes to maintain certain population genetics constraints such as Hardy-Weinberg equilibrium. The marginal distribution of the environmental exposures is allowed to remain completely nonparametric. We develop a semiparametric estimating equation methodology and related asymptotic theory for estimating the disease odds ratios associated with the haplotypes, environmental exposures, and their interactions, as well as the parameters that characterize haplotype-environment associations and the marginal haplotype frequencies. Phase ambiguity in the genotype data is handled using a suitable expectation-maximization algorithm. We study the finite-sample performance of the proposed methodology using simulated data. The methodology is illustrated with a case-control study of colorectal adenoma, designed to investigate how the smoking-related risk of colorectal adenoma can be modified by "NAT2," a smoking-metabolism gene that may potentially influence susceptibility to smoking itself.
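The abstract notes that phase ambiguity is handled with an expectation-maximization (EM) algorithm. The paper's EM operates inside its semiparametric case-control model; the sketch below is only a much simpler, illustrative instance of the underlying idea: the classical EM for haplotype frequencies from unphased genotypes in a single random sample, assuming Hardy-Weinberg equilibrium, two biallelic loci, and no disease or environment model. All function names are hypothetical. Only subjects heterozygous at both loci have more than one compatible diplotype, and the E-step splits them between those diplotypes in proportion to their Hardy-Weinberg probabilities.

```python
from itertools import product

# Haplotypes over two biallelic loci, coded as (allele at locus 1, allele at locus 2).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_diplotypes(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype.

    `genotype` holds the count (0, 1 or 2) of allele 1 at each locus.
    """
    pairs = set()
    for h1, h2 in product(HAPLOTYPES, repeat=2):
        if all(a + b == g for a, b, g in zip(h1, h2, genotype)):
            pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def diplotype_prob(pair, freqs):
    """Probability of an unordered diplotype under Hardy-Weinberg equilibrium."""
    h1, h2 = pair
    if h1 == h2:
        return freqs[h1] ** 2
    return 2 * freqs[h1] * freqs[h2]


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Hypothetical helper: EM estimate of haplotype frequencies (HWE assumed)."""
    freqs = {h: 1 / len(HAPLOTYPES) for h in HAPLOTYPES}  # uniform start
    candidates = [compatible_diplotypes(g) for g in genotypes]
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for pairs in candidates:
            # E-step: posterior weight of each compatible diplotype for this subject.
            probs = [diplotype_prob(p, freqs) for p in pairs]
            total = sum(probs)
            for (h1, h2), pr in zip(pairs, probs):
                w = pr / total
                counts[h1] += w  # a homozygous diplotype adds two copies of h1
                counts[h2] += w
        # M-step: expected haplotype counts divided by the 2n chromosomes.
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        diff = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new
        if diff < tol:
            break
    return freqs
```

For example, with only unambiguous genotypes the estimates reduce to simple haplotype counting; adding double heterozygotes to a sample dominated by haplotypes (0, 0) and (1, 1) pushes the EM toward resolving them as (0, 0)/(1, 1).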