Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America.
PLoS Comput Biol. 2012;8(7):e1002600. doi: 10.1371/journal.pcbi.1002600. Epub 2012 Jul 5.
Genome-wide association studies (GWAS) have in recent years discovered thousands of associated markers for hundreds of phenotypes. However, associated loci often explain only a relatively small fraction of heritability, and the link between association and causality has yet to be uncovered for most loci. Rare causal variants have been suggested as one scenario that may partially explain these shortcomings. Specifically, Dickson et al. recently reported simulations in which rare causal variants lead to association signals at common tag single nucleotide polymorphisms (SNPs), dubbed "synthetic associations". However, an open question is what practical implications synthetic associations have for GWAS. Here, we explore the signatures exhibited by such synthetic associations and their implications based on patterns of genetic variation observed in human populations, thus accounting for human evolutionary history, a force disregarded in previous simulation studies. This is made possible by human population genetic data from HapMap 3, which consist of both resequencing and array-based genotyping data for the same set of individuals from multiple populations. We report that synthetic associations tend to be further away from the underlying risk alleles than "natural associations" (i.e. associations due to underlying common causal variants), but to a much lesser extent than previously predicted, with both the age and the effect size of the risk allele playing a part in this phenomenon. We find that while a synthetic association has a lower probability of capturing causal variants within its linkage disequilibrium block, sequencing around the associated variant need not extend substantially to have a high probability of capturing at least one causal variant. We also show that the minor allele frequency of synthetic associations is lower than that of natural associations for most, but not all, loci that we explored. Finally, we find the variance in associated allele frequency to be a potential indicator of synthetic associations.
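As a rough illustration of how a rare causal variant can produce a signal at a common tag SNP, the sketch below computes the expected allelic odds ratio at the tag under a deliberately simplified model. It is not the simulation procedure of Dickson et al. or of this study. The function name `tag_odds_ratio` and its parameters are hypothetical, and the model assumes that the rare risk allele occurs only on haplotypes carrying the tag allele, that risk acts multiplicatively per haplotype, and that the disease is rare enough for control frequencies to equal population frequencies.

```python
def tag_odds_ratio(p_tag, p_rare, rel_risk):
    """Expected allelic odds ratio at a common tag allele.

    Illustrative assumptions (not taken from the paper):
      - the rare causal allele (frequency p_rare) sits only on haplotypes
        that also carry the tag allele (frequency p_tag);
      - each causal haplotype multiplies disease risk by rel_risk;
      - the disease is rare, so controls match the general population.
    """
    if not 0 < p_rare <= p_tag < 1:
        raise ValueError("require 0 < p_rare <= p_tag < 1")
    if rel_risk <= 0:
        raise ValueError("rel_risk must be positive")

    # Population haplotype frequencies.
    h_tag_rare = p_rare            # tag allele + rare causal allele
    h_tag_only = p_tag - p_rare    # tag allele, no causal allele
    h_other = 1.0 - p_tag          # neither allele

    # Among cases, each haplotype is weighted by its relative risk.
    w_tag_rare = h_tag_rare * rel_risk
    total = w_tag_rare + h_tag_only + h_other
    f_case = (w_tag_rare + h_tag_only) / total

    # Rare-disease approximation: control frequency equals p_tag.
    f_ctrl = p_tag

    return (f_case / (1.0 - f_case)) / (f_ctrl / (1.0 - f_ctrl))
```

Under these assumptions the result simplifies to 1 + p_rare × (rel_risk − 1) / p_tag. For example, a risk allele at 1% frequency with a five-fold relative risk, tagged by an allele at 20% frequency, gives an odds ratio of about 1.2 at the tag, even though the tag itself has no effect.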