Epidemiology, Department of Internal Medicine, University of Utah Health Sciences Center, Salt Lake City, Utah, USA.
BMC Med Genet. 2010 Dec 3;11:170. doi: 10.1186/1471-2350-11-170.
In candidate-gene association studies of single nucleotide polymorphisms (SNPs), multilocus analyses are frequently of high dimensionality when they consider haplotypes or haplotype pairs (diplotypes) and differing modes of expression. Candidate genes are often selected for their biological involvement in a given pathway, yet little is known about the functionality of individual SNPs that could guide association studies. Investigators therefore face the challenge of exploring multiple SNP models to elucidate which variants, independently or in combination, might be associated with a disease of interest. A data mining module, hapConstructor (freely available in the Genie software), performs systematic construction and association testing of multilocus genotype data in a Monte Carlo framework. Our objective was to assess its utility for guiding statistical analyses of haplotypes within a candidate region (or of combined genotypes across candidate genes), beyond what a standard logistic regression approach offers.
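The Monte Carlo association testing mentioned above can be illustrated with a minimal permutation test. This is a sketch under stated assumptions, not hapConstructor's implementation: it assumes unrelated subjects, a single binary carrier indicator, and the absolute difference in carrier frequency between cases and controls as the test statistic, and it ignores the case-control matching. The function names are hypothetical, and the exact statistic and resampling scheme used by hapConstructor are not described in this abstract.

```python
import random
from typing import Sequence


def carrier_freq_diff(carrier: Sequence[int], status: Sequence[int]) -> float:
    """Absolute difference in carrier frequency between cases (1) and controls (0)."""
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    case_carriers = sum(c for c, s in zip(carrier, status) if s == 1)
    ctrl_carriers = sum(c for c, s in zip(carrier, status) if s == 0)
    return abs(case_carriers / n_case - ctrl_carriers / n_ctrl)


def monte_carlo_p(carrier: Sequence[int], status: Sequence[int],
                  n_perm: int = 2000, seed: int = 1) -> float:
    """Hypothetical permutation p-value: shuffle case/control labels, recompute the statistic."""
    rng = random.Random(seed)
    observed = carrier_freq_diff(carrier, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-status link under the null
        if carrier_freq_diff(carrier, labels) >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

With toy data in which 70% of cases and 30% of controls are carriers, `monte_carlo_p` returns a small p-value; when carrier frequencies are identical in both groups, every permutation ties or exceeds the observed statistic and it returns 1.0.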
We applied the hapConstructor method to a multilocus investigation of three candidate genes involved in production of the pro-inflammatory cytokine IL6 (IKBKB, IL6, and NFKB1; 16 SNPs in total), which were hypothesized to operate together to alter colorectal cancer risk. Data came from two U.S. multicenter studies: one of colon cancer (1,556 cases and 1,956 matched controls) and one of rectal cancer (754 cases and 959 matched controls).
hapConstructor enabled us to identify important associations, which were then analyzed further in logistic regression models that simultaneously adjusted for confounders. The most significant finding (nominal P = 0.0004; false discovery rate q = 0.037) was a combined genotype association across IKBKB SNP rs5029748 (1 or 2 variant alleles), IL6 rs1800797 (1 or 2 variant alleles), and NFKB1 rs4648110 (2 variant alleles), which conferred an approximately 80% decreased risk of colon cancer.
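The combined genotype in the main finding mixes dominant coding (1 or 2 variant alleles at rs5029748 and rs1800797) with recessive coding (2 variant alleles at rs4648110). The sketch below, with hypothetical function names, shows one way to encode that indicator from per-SNP variant-allele counts and to compute a crude odds ratio; an approximately 80% decrease is conventionally read as an odds ratio of roughly 0.2. The published estimates came from confounder-adjusted logistic regression, which this unadjusted calculation does not reproduce.

```python
from typing import Dict, Sequence


# Hypothetical encoding: each genotype value is the count of variant alleles (0, 1 or 2).
def combined_genotype(g: Dict[str, int]) -> int:
    """1 if the subject carries the combined genotype described in the abstract, else 0."""
    return int(
        g["rs5029748"] >= 1      # IKBKB: 1 or 2 variant alleles (dominant coding)
        and g["rs1800797"] >= 1  # IL6: 1 or 2 variant alleles (dominant coding)
        and g["rs4648110"] == 2  # NFKB1: 2 variant alleles (recessive coding)
    )


def crude_odds_ratio(exposed: Sequence[int], status: Sequence[int]) -> float:
    """Unadjusted odds ratio (a*d)/(b*c) from a 2x2 exposure-by-status table."""
    pairs = list(zip(exposed, status))
    a = sum(1 for e, s in pairs if e == 1 and s == 1)  # exposed cases
    b = sum(1 for e, s in pairs if e == 1 and s == 0)  # exposed controls
    c = sum(1 for e, s in pairs if e == 0 and s == 1)  # unexposed cases
    d = sum(1 for e, s in pairs if e == 0 and s == 0)  # unexposed controls
    return (a * d) / (b * c)
```

For example, invented counts of 2 exposed cases, 10 exposed controls, 100 unexposed cases and 100 unexposed controls give an odds ratio of (2 × 100) / (10 × 100) = 0.2. These numbers are illustrative only and are not the study's data.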
Strengths of hapConstructor were systematic identification of multiple loci, within and across genes, that are important in colorectal cancer risk; false discovery rate assessment; and efficient guidance of subsequent logistic regression analyses.
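The abstract reports a false discovery rate q-value but does not state how it was computed. As an assumption, the sketch below uses the standard Benjamini-Hochberg step-up procedure to turn a list of nominal p-values into q-values; hapConstructor's own FDR calculation may differ, and the function name is hypothetical.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down so each q is the minimum over higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

For p-values `[0.01, 0.04, 0.03, 0.005]`, `bh_qvalues` returns approximately `[0.02, 0.04, 0.04, 0.02]`; the running minimum guarantees that q-values never decrease as p-values increase.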