Department of Statistics, The University of Chicago, 5734 S. University Ave., Chicago, IL 60637.
Department of Medicine, The University of Chicago, 5734 S. University Ave., Chicago, IL 60637.
Genet Epidemiol. 2010 Apr;34(3):222-231. doi: 10.1002/gepi.20452.
In the setting of genome-wide association studies, we propose a method for assigning a measure of significance to pre-defined sets of markers in the genome. The sets can be genes, conserved regions, or groups of genes such as pathways. With the proposed methods and algorithms, evidence for association between a particular functional unit and disease status can come not only from a strong signal at a single SNP within the unit, but also from the combination of several simultaneous weaker signals that are not strongly correlated. This approach has several advantages. First, moderately strong signals from different SNPs are combined into a much stronger signal for the set, thereby increasing power. Second, in combination with methods that provide information on untyped markers, it yields results that can be readily combined across studies and platforms that may use different SNPs. Third, the results are easy to interpret, since they refer to functional sets of markers that are likely to behave as a unit in their phenotypic effect. Finally, the availability of gene-level P-values for association is the first step in developing methods that integrate information from pathways and networks with genome-wide association data, and these can lead to a better understanding of the genetic architecture of complex traits. The power of the approach is investigated in simulated and real datasets. Novel Crohn's disease associations are found using the WTCCC data.
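The idea of combining several weaker SNP signals into one set-level P-value can be illustrated with a minimal sketch. This is not the method proposed in the paper, whose details are not given in the abstract; it is a generic permutation test, swapped in purely for illustration. The assumptions are: genotypes are coded 0/1/2, disease status is coded 0/1, the per-SNP statistic is n times the squared correlation between genotype and status (the usual trend-test form), and the set statistic is the sum of the per-SNP statistics. The names `trend_statistic`, `set_statistic`, and `set_p_value` are hypothetical.

```python
import random


def trend_statistic(genotypes, phenotype):
    """Per-SNP association statistic: n * r^2 between genotype and status."""
    n = len(phenotype)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    cov = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_y = sum((y - mean_y) ** 2 for y in phenotype)
    if var_g == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP or constant phenotype carries no signal
    return n * cov * cov / (var_g * var_y)


def set_statistic(snp_columns, phenotype):
    """Set-level statistic (illustrative choice): sum of per-SNP statistics."""
    return sum(trend_statistic(col, phenotype) for col in snp_columns)


def set_p_value(snp_columns, phenotype, n_perm=1000, seed=0):
    """Permutation P-value for a marker set.

    Shuffling the phenotype labels keeps the correlation between SNPs intact,
    so the null distribution of the summed statistic reflects that correlation.
    """
    rng = random.Random(seed)
    observed = set_statistic(snp_columns, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(snp_columns, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated P-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Each element of `snp_columns` holds the genotypes of one SNP across all individuals, in the same order as `phenotype`. Because the set statistic sums over SNPs, several moderate signals within a gene can yield a small set-level P-value even when no single SNP stands out.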