Minichiello Mark J, Durbin Richard
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, United Kingdom.
Am J Hum Genet. 2006 Nov;79(5):910-22. doi: 10.1086/508901. Epub 2006 Sep 27.
Large-scale association studies are being undertaken with the hope of uncovering the genetic determinants of complex disease. We describe a computationally efficient method for inferring genealogies from population genotype data and show how these genealogies can be used to fine map disease loci and interpret association signals. These genealogies take the form of the ancestral recombination graph (ARG). The ARG defines a genealogical tree for each locus, and, as one moves along the chromosome, the topologies of consecutive trees shift according to the impact of historical recombination events. There are two stages to our analysis. First, we infer plausible ARGs, using a heuristic algorithm, which can handle unphased and missing data and is fast enough to be applied to large-scale studies. Second, we test the genealogical tree at each locus for a clustering of the disease cases beneath a branch, suggesting that a causative mutation occurred on that branch. Since the true ARG is unknown, we average this analysis over an ensemble of inferred ARGs. We have characterized the performance of our method across a wide range of simulated disease models. Compared with simpler tests, our method gives increased accuracy in positioning untyped causative loci and can also be used to estimate the frequencies of untyped causative alleles. We have applied our method to Ueda et al.'s association study of CTLA4 and Graves disease, showing how it can be used to dissect the association signal, giving potentially interesting results of allelic heterogeneity and interaction. Similar approaches analyzing an ensemble of ARGs inferred using our method may be applicable to many other problems of inference from population genotype data.
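The second stage of the analysis, which tests the tree at a locus for a clustering of cases beneath one branch and then averages over an ensemble of inferred ARGs, can be illustrated with a toy sketch. This is not the authors' implementation. Several choices here are hypothetical and made only for illustration: the nested-tuple tree representation, scoring each branch with a 2×2 chi-square statistic (cases/controls below the branch versus elsewhere), taking the maximum over branches, and the function names. The ARG inference step and any assessment of significance are not shown.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical representation: a leaf is the int index of a sampled haplotype,
# an internal node is a tuple of subtrees. One tree = the genealogy at one locus.
Tree = Union[int, Tuple["Tree", ...]]


def leaves_under(node: Tree) -> List[int]:
    """Return the haplotype indices beneath a node."""
    if isinstance(node, int):
        return [node]
    out: List[int] = []
    for child in node:
        out.extend(leaves_under(child))
    return out


def branches(node: Tree):
    """Yield every non-root subtree; each stands for the branch directly above it."""
    if isinstance(node, int):
        return
    for child in node:
        yield child
        yield from branches(child)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def best_branch_statistic(tree: Tree, is_case: Sequence[bool]) -> float:
    """Max over branches of the case/control clustering statistic (assumed score)."""
    total_cases = sum(is_case)
    total_controls = len(is_case) - total_cases
    best = 0.0
    for sub in branches(tree):
        below = leaves_under(sub)
        a = sum(1 for i in below if is_case[i])   # cases beneath this branch
        b = len(below) - a                        # controls beneath this branch
        stat = chi_square_2x2(a, b, total_cases - a, total_controls - b)
        best = max(best, stat)
    return best


def ensemble_statistic(trees: Sequence[Tree], is_case: Sequence[bool]) -> float:
    """Average the per-tree statistic over an ensemble of inferred trees at one locus."""
    return sum(best_branch_statistic(t, is_case) for t in trees) / len(trees)


# Toy usage: haplotypes 0-3 are from cases, 4-7 from controls.
is_case = [True] * 4 + [False] * 4
tree1 = (((0, 1), (2, 3)), ((4, 5), (6, 7)))  # cases form one clade
tree2 = (((0, 4), (1, 5)), ((2, 6), (3, 7)))  # cases scattered across clades
print(best_branch_statistic(tree1, is_case))  # 8.0
print(ensemble_statistic([tree1, tree2], is_case))
```

In the first tree all cases sit beneath a single branch, so that branch gets the largest possible score. In the second tree no branch separates cases from controls well. Averaging over both trees reflects the uncertainty about which genealogy is true.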