Zhang Xiang, Zou Fei, Wang Wei
Department of Computer Science, University of North Carolina at Chapel Hill, USA.
Pac Symp Biocomput. 2009:528-39.
Recent advances in high-throughput genotyping have inspired growing research interest in genome-wide association studies of diseases. To understand the biological mechanisms underlying many diseases, we need to consider genetic effects across multiple loci simultaneously. The large number of SNPs makes multilocus association studies computationally challenging, because all possible SNP combinations must be enumerated explicitly at the genome-wide scale. Moreover, because many SNPs are correlated, a permutation procedure is often needed to properly control the family-wise error rate. This makes the problem even more computationally demanding, since the test procedure must be repeated for each permuted dataset. In this paper, we present FastChi, an exhaustive yet efficient algorithm for the genome-wide two-locus chi-square test. FastChi uses an upper bound on the two-locus chi-square statistic that can be expressed as the sum of two terms, both efficient to compute: the first term is based on the single-locus chi-square test for the given phenotype, and the second term depends only on the genotypes and is independent of the phenotype. This upper bound allows the algorithm to perform the two-locus chi-square test on only a small number of candidate SNP pairs without the risk of missing any significant ones. Since the second term of the upper bound needs to be precomputed only once and can be stored for subsequent use, the advantage is more pronounced in large permutation tests. Extensive experimental results demonstrate that our method is an order of magnitude faster than the brute-force alternative.
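The bound-and-prune idea in the abstract can be illustrated with a minimal Python sketch. It does not reproduce FastChi's own bound, which the abstract does not state; it uses a simpler, looser bound of the same two-term shape, derived here only for illustration. For a 0/1 phenotype with overall case fraction p, the Pearson chi-square statistic of a genotype grouping is sum_g n_g (p_g - p)^2 / (p(1 - p)), where n_g and p_g are the size and case fraction of class g. Splitting each class g of SNP X_i by the genotypes h of a second SNP X_j adds sum_g sum_h n_gh (p_gh - p_g)^2 to the numerator, and for each g that extra amount is at most the maximum, over subsets S of the X_j subgroups, of n_S (n_g - n_S) / n_g. This depends only on the genotypes, and because permutation keeps the number of cases fixed, p(1 - p) does not change either, so the genotype-only terms can be computed once and reused. The helper names `chi2_binary`, `genotype_term`, `precompute_genotype_terms` and `significant_pairs` are hypothetical.

```python
from itertools import combinations


def chi2_binary(groups, y):
    """Pearson chi-square of the 2 x k table formed by group labels and a 0/1 phenotype.

    Uses chi2 = sum_g n_g * (p_g - p)**2 / (p * (1 - p)), where p is the overall
    case fraction and p_g the case fraction in group g (assumes 0 < p < 1).
    """
    n = len(y)
    p = sum(y) / n
    counts, cases = {}, {}
    for g, yi in zip(groups, y):
        counts[g] = counts.get(g, 0) + 1
        cases[g] = cases.get(g, 0) + yi
    ssb = sum(c * (cases[g] / c - p) ** 2 for g, c in counts.items())
    return ssb / (p * (1 - p))


def genotype_term(xi, xj):
    """Genotype-only upper bound on the extra between-group sum of squares gained
    by splitting each genotype class of xi by the genotypes of xj (hypothetical helper)."""
    sub = {}  # sub[g][h] = number of samples with xi == g and xj == h
    for g, h in zip(xi, xj):
        row = sub.setdefault(g, {})
        row[h] = row.get(h, 0) + 1
    total = 0.0
    for row in sub.values():
        sizes = list(row.values())
        n_g = sum(sizes)
        # The weighted variance of subgroup case fractions is convex, so its maximum
        # over [0, 1] is reached when each subgroup is all cases or all controls.
        best = 0.0
        for r in range(len(sizes) + 1):
            for chosen in combinations(sizes, r):
                n_s = sum(chosen)
                best = max(best, n_s * (n_g - n_s) / n_g)
        total += best
    return total


def precompute_genotype_terms(snps):
    """Compute the genotype-only terms once; they are reused for every permutation."""
    return {
        (i, j): (genotype_term(snps[i], snps[j]), genotype_term(snps[j], snps[i]))
        for i, j in combinations(range(len(snps)), 2)
    }


def significant_pairs(snps, y, threshold, terms):
    """Return ([(i, j, chi2), ...], number of exact tests run) for pairs with chi2 >= threshold."""
    p = sum(y) / len(y)
    var = p * (1 - p)  # unchanged by permutation, since the case count is fixed
    single = [chi2_binary(x, y) for x in snps]
    hits, tested = [], 0
    for (i, j), (b_ij, b_ji) in terms.items():
        # Either SNP can play the role of X_i, so keep the smaller of the two bounds.
        bound = min(single[i] + b_ij / var, single[j] + b_ji / var)
        # The small tolerance guards against floating-point rounding near the threshold.
        if bound < threshold - 1e-9:
            continue  # the exact statistic cannot reach the threshold
        tested += 1
        stat = chi2_binary(list(zip(snps[i], snps[j])), y)
        if stat >= threshold:
            hits.append((i, j, stat))
    return hits, tested
```

In a permutation test, `precompute_genotype_terms` would run once, and `significant_pairs` would be called for each shuffled phenotype vector, recomputing only the cheap single-locus statistics. This illustrative bound is tight only when X_j leaves one dominant subgroup inside each X_i class (for example, under strong linkage disequilibrium or with rare genotypes), so it prunes far less than a purpose-built bound would; the sketch is meant to show the structure, not the speedup reported above.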