Department of Computer Science, Columbia University, New York, NY 10027, USA.
Genome Res. 2012 Nov;22(11):2230-40. doi: 10.1101/gr.137885.112. Epub 2012 Jul 5.
Long-range gene-gene interactions are biologically compelling models for disease genetics and can provide insights into relevant mechanisms and pathways. Despite considerable effort, rigorous interaction mapping in humans has remained prohibitively difficult due to computational and statistical limitations. We introduce a novel algorithmic approach to find long-range interactions in common diseases using a standard two-locus test that contrasts the linkage disequilibrium between SNPs in cases and controls. Our ultrafast method overcomes the computational burden of a genome × genome scan by using a novel randomization technique that requires 10× to 100× fewer tests than a brute-force approach. By sampling small groups of cases and highlighting combinations of alleles carried by all individuals in the group, this algorithm drastically trims the universe of combinations while simultaneously guaranteeing that all statistically significant pairs are reported. Our implementation can comprehensively scan large data sets (2K cases, 3K controls, 500K SNPs) to find all candidate pairwise interactions (LD-contrast ) in a few hours, a task that typically took days or weeks with methods running on equivalent desktop computers. We applied our method to the Wellcome Trust bipolar disorder data and found a significant interaction between SNPs located within genes encoding two calcium channel subunits: RYR2 on chr1q43 and CACNA2D4 on chr12p13 (LD-contrast test, ). We replicated this pattern of interchromosomal LD between the genes in a separate bipolar data set from the GAIN project, demonstrating an example of gene-gene interaction that plays a role in the largely uncharted genetic landscape of bipolar disorder.
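The sampling idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each SNP is reduced to a 0/1 carrier indicator per case, stores the set of carriers for each SNP as an integer bitmask, and uses hypothetical function names (`carrier_masks`, `sample_candidate_pairs`). Each iteration draws a small random group of cases and keeps only SNPs carried by every sampled individual; all pairs among those SNPs become candidates. A pair jointly carried by a fraction f of cases survives an iteration with probability roughly f^k for group size k, so frequently co-carried pairs accumulate counts quickly while most of the genome × genome space is never enumerated. The sketch omits the control group, the LD-contrast test, and the paper's guarantee that all significant pairs are reported.

```python
import random
from collections import Counter
from itertools import combinations


def carrier_masks(carriers):
    """Encode each SNP as an int bitmask over cases.

    carriers: list of per-SNP lists of 0/1 flags, one flag per case.
    Bit i of a SNP's mask is set if case i carries the (assumed) risk allele.
    """
    masks = []
    for snp in carriers:
        mask = 0
        for i, carried in enumerate(snp):
            if carried:
                mask |= 1 << i
        masks.append(mask)
    return masks


def sample_candidate_pairs(carriers, group_size, iterations, seed=0):
    """Count how often each SNP pair is carried by an entire random case group.

    Hypothetical illustration of group sampling; not the published algorithm.
    """
    rng = random.Random(seed)
    n_cases = len(carriers[0])
    masks = carrier_masks(carriers)
    counts = Counter()
    for _ in range(iterations):
        # Draw a small group of cases without replacement and build its bitmask.
        group_mask = 0
        for i in rng.sample(range(n_cases), group_size):
            group_mask |= 1 << i
        # Keep SNPs carried by every individual in the group (subset test).
        shared = [s for s, m in enumerate(masks) if m & group_mask == group_mask]
        # Every pair of shared SNPs is a candidate for this iteration.
        for pair in combinations(shared, 2):
            counts[pair] += 1
    return counts
```

Pairs with the highest counts would then be passed to a proper two-locus test that compares cases against controls; the subset check via bitwise AND is what keeps each iteration cheap.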