Life Technologies, Foster City, CA, USA.
Bioinformatics. 2010 Nov 15;26(22):2856-62. doi: 10.1093/bioinformatics/btq529. Epub 2010 Sep 24.
In complex disorders, independently evolving locus pairs might interact to confer disease susceptibility, with only a modest effect at each locus. In genome-wide association studies on large cohorts, testing all pairs for interaction imposes a heavy computational burden and incurs a loss of power due to large Bonferroni-like corrections. Limiting the tests to pairs that show a marginal effect at either locus likewise reduces power. Here, we describe an algorithm that discovers interacting locus pairs without explicitly testing all pairs or requiring a marginal effect at each locus. The central idea is a mathematical transformation that maps 'statistical correlation between locus pairs' to 'distance between two points in a Euclidean space'. This enables the use of geometric properties to identify proximal points (correlated locus pairs) without testing each pair explicitly. For large datasets (~10^6 SNPs), this reduces the number of tests from 10^12 to 10^6, significantly reducing the computational burden without loss of power. The speed of the test allows for correction using permutation-based tests. The algorithm is implemented in a tool called RAPID (RApid Pair IDentification) for identifying paired interactions in case-control GWAS.
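The abstract does not spell out the transformation itself. As a rough illustration of how a correlation can be read as a Euclidean distance, the sketch below uses a standard identity: once two vectors are centered and scaled to unit length, their squared distance equals 2 - 2r, where r is their Pearson correlation. This is not taken from RAPID; the function names and the toy genotype vectors (`snp_a`, `snp_b`, `snp_c`) are hypothetical and exist only for this example.

```python
import math
import random


def standardize(values):
    """Center a vector and scale it to unit Euclidean length."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("a constant vector has no defined correlation")
    return [c / norm for c in centered]


def pearson(x, y):
    """Pearson correlation: the dot product of the standardized vectors."""
    xs, ys = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(xs, ys))


def squared_distance(x, y):
    """Squared Euclidean distance between the standardized vectors."""
    xs, ys = standardize(x), standardize(y)
    return sum((a - b) ** 2 for a, b in zip(xs, ys))


# Toy genotype vectors (0/1/2 minor-allele counts); illustrative data only.
random.seed(0)
n_samples = 200
snp_a = [random.choice([0, 1, 2]) for _ in range(n_samples)]
# snp_b copies snp_a in roughly 80% of samples, so the two are correlated.
snp_b = [a if random.random() < 0.8 else random.choice([0, 1, 2]) for a in snp_a]
# snp_c is drawn independently of snp_a.
snp_c = [random.choice([0, 1, 2]) for _ in range(n_samples)]

for name, other in (("snp_b", snp_b), ("snp_c", snp_c)):
    r = pearson(snp_a, other)
    d2 = squared_distance(snp_a, other)
    print(f"snp_a vs {name}: r = {r:.3f}, d^2 = {d2:.3f}, 2 - 2r = {2 - 2 * r:.3f}")
```

Under this identity, strongly correlated loci sit close together as points, so a nearest-neighbour style search over the points can stand in for enumerating every pair. How RAPID builds its points and searches them is described in the paper, not here.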
We validated RAPID with extensive tests on simulated and real datasets. On simulated models of interaction, RAPID easily identified pairs with small marginal effects. On the benchmark disease datasets from the Wellcome Trust Case Control Consortium, RAPID ran in about 1 CPU-hour per dataset and identified many significant interactions. In many cases, the interacting loci were known to be important for the disease but were not individually associated in the genome-wide scan.