Bochdanovits Zoltán, Simón-Sánchez Javier, Jonker Marianne, Hoogendijk Witte J, van der Vaart Aad, Heutink Peter
Department of Clinical Genetics, VU University Medical Center, Amsterdam, The Netherlands.
Section Stochastics, Department of Mathematics, Faculty of Sciences, Vrije Universiteit, Amsterdam, The Netherlands.
Eur J Hum Genet. 2014 Feb;22(2):238-42. doi: 10.1038/ejhg.2013.115. Epub 2013 Jun 5.
In recent years, genome-wide association studies have been very successful in identifying loci for complex traits. However, these findings typically involve noncoding and/or intergenic SNPs that have no clear functional effect and do not directly point to a gene. Hence, the challenge is to identify the causal variant responsible for the association signal. The first step is usually to identify all genetic variation in the locus region by resequencing a large number of case chromosomes. The causal variant must then be identified among these variants in further functional studies. Because the experimental follow-up can be very laborious, restricting the number of variants to be scrutinized is a great advantage, and an objective method for choosing the size of the region to be followed up would be highly valuable. Here, we propose a simple method to call the minimal region around a significant association peak that is very likely to contain the causal variant. We model linkage disequilibrium (LD) in cases from the observed single-SNP association signals, and predict the location of the causal variant by quantifying how well this relationship fits the data. Simulations showed that our approach identifies genomic regions of on average ∼50 kb with up to 90% probability of containing the causal variant. We apply our method to two genome-wide association data sets and localize both the functional variant REP1 in the α-synuclein gene, which confers susceptibility to Parkinson's disease, and the APOE gene responsible for the association signal in the Alzheimer's disease data set.
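The localization idea in the abstract (fit a model of how the association signal relates to a candidate causal position, then keep the positions where the fit is good) can be illustrated with a toy sketch. This is not the authors' method: the abstract does not give the model. The exponential decay of signal with distance, the `tolerance` cutoff, and the function names `fit_rss` and `call_region` are all assumptions made for illustration only.

```python
import math


def fit_rss(positions, signals, center, decay_rates):
    """Smallest residual sum of squares for signal ~ a * exp(-b * |x - center|).

    ASSUMPTION: exponential decay with distance stands in for the LD model;
    the paper's actual model is not stated in the abstract.
    """
    best = math.inf
    for b in decay_rates:
        basis = [math.exp(-b * abs(x - center)) for x in positions]
        denom = sum(v * v for v in basis)
        # Closed-form least-squares amplitude for a fixed decay rate b.
        a = sum(v * s for v, s in zip(basis, signals)) / denom
        rss = sum((s - a * v) ** 2 for v, s in zip(basis, signals))
        best = min(best, rss)
    return best


def call_region(positions, signals, candidates, decay_rates, tolerance=1.5):
    """Return (start, end, best_candidate) for the called region.

    The region is the contiguous run of candidate positions around the
    best-fitting one whose RSS is within `tolerance` times the minimum RSS
    (a hypothetical cutoff, not taken from the paper).
    """
    scores = [fit_rss(positions, signals, c, decay_rates) for c in candidates]
    best_idx = min(range(len(candidates)), key=scores.__getitem__)
    cutoff = tolerance * scores[best_idx] + 1e-12
    lo = hi = best_idx
    while lo > 0 and scores[lo - 1] <= cutoff:
        lo -= 1
    while hi < len(candidates) - 1 and scores[hi + 1] <= cutoff:
        hi += 1
    return candidates[lo], candidates[hi], candidates[best_idx]


# Synthetic, noise-free example: SNPs every 5 kb, simulated causal site at 40 kb.
snp_kb = list(range(0, 101, 5))
signal = [10.0 * math.exp(-0.05 * abs(x - 40)) for x in snp_kb]
start, end, best = call_region(snp_kb, signal, snp_kb, [0.01, 0.02, 0.05, 0.1])
print(start, end, best)
```

With real, noisy association statistics the called interval would widen as the fit becomes less discriminating; calibrating the cutoff so that the interval contains the causal variant with a target probability would require simulations of the kind the abstract mentions.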