Biostatistics Branch, National Institute of Environmental Health Sciences, National Institutes of Health, North Carolina, USA.
Genet Epidemiol. 2010 Nov;34(7):725-38. doi: 10.1002/gepi.20536.
An appealing genome-wide association study design compares one large control group against several disease samples. A pioneering study by the Wellcome Trust Case Control Consortium that employed this design identified multiple susceptibility regions, many of which have since been independently replicated. While reusing a control sample makes efficient use of the data, it also induces correlation between association statistics across diseases. Observing a large association statistic for one disease may greatly increase the chance of observing a spuriously large association for a different disease. Accounting for this correlation is particularly important when screening for SNPs that might be involved in a set of diseases with overlapping etiology. We describe methods that correct association statistics for the dependency due to shared controls, along with ways to obtain a measure of overall evidence and to combine association signals across multiple diseases. These methods require no access to individual subject data; instead, they efficiently use the information contained in the P-values for association reported for individual diseases. P-value-based combined tests for association are flexible and essentially as powerful as an approach based on aggregating the individual subject data.
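To make the shared-control dependency concrete, the following is a minimal sketch and not the authors' published procedure. It assumes each disease's statistic behaves like a standardized difference in allele frequency between cases and the common controls, so that under the null hypothesis the correlation between two statistics is sqrt(n1·n2 / ((n1 + n0)(n2 + n0))), where n0 is the number of shared controls and n1, n2 are the case counts. It also assumes the direction of effect is known for each disease, so that two-sided P-values can be converted to signed Z-scores, and it combines them with a correlation-adjusted weighted sum. All function names and the sample sizes in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def shared_control_correlation(n_cases_a, n_cases_b, n_controls):
    """Null correlation between two case-control statistics that share controls.

    Assumption: each statistic is a standardized difference of proportions,
    so Cov is proportional to 1/n_controls and Var to 1/n_cases + 1/n_controls.
    """
    return sqrt((n_cases_a * n_cases_b)
                / ((n_cases_a + n_controls) * (n_cases_b + n_controls)))


def p_to_signed_z(p_two_sided, effect_sign):
    """Convert a two-sided P-value to a Z-score carrying the sign of the effect."""
    z = -_STD_NORMAL.inv_cdf(p_two_sided / 2.0)  # non-negative for p <= 1
    return z if effect_sign >= 0 else -z


def combined_z(z_scores, corr_matrix, weights=None):
    """Weighted sum of Z-scores, rescaled by its variance under correlation."""
    k = len(z_scores)
    if weights is None:
        weights = [1.0] * k
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    # Var(sum w_i Z_i) = sum_i sum_j w_i w_j rho_ij
    variance = sum(weights[i] * weights[j] * corr_matrix[i][j]
                   for i in range(k) for j in range(k))
    return numerator / sqrt(variance)


def two_sided_p(z):
    """Two-sided P-value for a standard normal Z-score."""
    return 2.0 * _STD_NORMAL.cdf(-abs(z))


# Hypothetical usage: two diseases with 2000 cases each and 3000 shared controls.
rho = shared_control_correlation(2000, 2000, 3000)
z_a = p_to_signed_z(1e-4, +1)
z_b = p_to_signed_z(1e-3, +1)
z_adjusted = combined_z([z_a, z_b], [[1.0, rho], [rho, 1.0]])
z_naive = combined_z([z_a, z_b], [[1.0, 0.0], [0.0, 1.0]])  # ignores sharing
p_adjusted = two_sided_p(z_adjusted)
```

With both effects in the same direction, ignoring the positive correlation understates the variance of the sum, so `z_naive` exceeds `z_adjusted` and the naive combined evidence is overstated. Working from reported P-values and sample sizes alone mirrors the summary-level setting the abstract describes, although the specific estimator used in the paper may differ.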