Meta-analysis of genome-wide association studies.

Author information

de Bakker Paul I W, Neale Benjamin M, Daly Mark J

Publication information

Cold Spring Harb Protoc. 2010 Jun;2010(6):pdb.top81. doi: 10.1101/pdb.top81.

PMID: 20516189

Abstract

Individual genome-wide association studies have only limited power to find novel loci underlying complex traits and common diseases. With relatively modest sample and effect sizes, a true association between genotype and phenotype may never meet genome-wide statistical significance (P < 5 × 10⁻⁸) in a single study. Through meta-analysis, novel susceptibility loci can be discovered by effectively summing the statistical evidence of individually underpowered studies. Most genetic discoveries for complex traits are now made through meta-analysis collaborations, which so far have been restricted to single-locus analyses, testing for main effects at a single polymorphism at a time. A key benefit of this approach is that individual-level genotype (and phenotype) data do not need to be exchanged between research groups. In this article, we focus on meta-analysis at individual single-nucleotide polymorphisms (SNPs), paying particular attention to how imputation uncertainty can be incorporated into the association analysis and subsequent meta-analysis. Probably the most important aspect of genome-wide association meta-analysis is harmonization of the study results. As studies differ in design, sample collection, genotyping platforms, and association analysis methods, it is important that the association results (per SNP) of each study can be formatted, exchanged, and analyzed in such a way that the statistical evidence can be combined appropriately and that no valuable information is lost. Without minimizing the importance of having a clear phenotype definition (and corresponding measurements), we will assume that investigators representing the various studies have made sensible agreements about phenotype definitions, necessary sample exclusions, and appropriate covariate modeling.
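
As an illustration of how summary statistics from underpowered studies might be combined at a single SNP, the sketch below uses fixed-effects inverse-variance weighting, one common approach that the abstract itself does not name. The `StudyResult` record, the `harmonize` helper, and the example numbers are all hypothetical. The sketch assumes every study reports a per-allele effect estimate and standard error on the same scale, and it does not handle strand flips or imputation uncertainty.

```python
import math
from dataclasses import dataclass


@dataclass
class StudyResult:
    """Per-SNP summary statistics from one study (hypothetical format)."""
    effect_allele: str
    other_allele: str
    beta: float  # per-allele effect estimate, e.g. a log odds ratio
    se: float    # standard error of beta


def harmonize(result, ref_effect, ref_other):
    """Return beta aligned to the reference effect allele, or None on mismatch."""
    alleles = (result.effect_allele, result.other_allele)
    if alleles == (ref_effect, ref_other):
        return result.beta
    if alleles == (ref_other, ref_effect):
        # Same SNP reported for the opposite allele: flip the sign.
        return -result.beta
    return None  # strand flips and other mismatches are not resolved here


def inverse_variance_meta(results, ref_effect, ref_other):
    """Fixed-effects inverse-variance meta-analysis for one SNP."""
    sum_w = 0.0
    sum_wb = 0.0
    for r in results:
        beta = harmonize(r, ref_effect, ref_other)
        if beta is None:
            continue  # skip studies whose alleles cannot be aligned
        w = 1.0 / (r.se ** 2)  # weight = inverse of the sampling variance
        sum_w += w
        sum_wb += w * beta
    if sum_w == 0.0:
        raise ValueError("no study could be harmonized to the reference alleles")
    beta = sum_wb / sum_w
    se = math.sqrt(1.0 / sum_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, z, p


# Made-up example: three studies, one reporting the opposite allele.
studies = [
    StudyResult("A", "G", 0.10, 0.03),
    StudyResult("A", "G", 0.10, 0.03),
    StudyResult("G", "A", -0.10, 0.03),
]
beta, se, z, p = inverse_variance_meta(studies, "A", "G")
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

With these made-up inputs, no single study reaches P < 5 × 10⁻⁸ on its own, but the combined estimate does, because the pooled standard error shrinks as studies are added. The sign flip in `harmonize` is a small instance of the harmonization step: a study that reports the opposite allele would otherwise cancel out the evidence from the others.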
