

FastANOVA: an Efficient Algorithm for Genome-Wide Association Study.

Authors

Zhang Xiang, Zou Fei, Wang Wei

Affiliation

Department of Computer Science, University of North Carolina at Chapel Hill.

Publication

KDD. 2008:821-829.

Abstract

Studying the association between quantitative phenotypes (such as height or weight) and single nucleotide polymorphisms (SNPs) is an important problem in biology. To understand the underlying mechanisms of complex phenotypes, it is often necessary to consider joint genetic effects across multiple SNPs. The ANOVA (analysis of variance) test is routinely used in association studies. Important findings from studying gene-gene (SNP-pair) interactions are appearing in the literature. However, the number of SNPs can reach the millions, so evaluating joint effects of SNPs is a challenging task even for SNP-pairs. Moreover, because a large number of SNPs are correlated, a permutation procedure is preferred over simple Bonferroni correction for properly controlling the family-wise error rate and retaining mapping power, which dramatically increases the computational cost of an association study. In this paper, we study the problem of finding SNP-pairs that have significant associations with a given quantitative phenotype. We propose an efficient algorithm, FastANOVA, for performing ANOVA tests on SNP-pairs in batch mode, which also supports large permutation tests. We derive an upper bound for the SNP-pair ANOVA test that can be expressed as the sum of two terms. The first term is based on the single-SNP ANOVA test. The second term is based on the SNPs alone and is independent of any phenotype permutation. Furthermore, SNP-pairs can be organized into groups, each of which shares a common upper bound. This allows for maximum reuse of intermediate computation, efficient upper bound estimation, and effective SNP-pair pruning. Consequently, FastANOVA needs to perform the ANOVA test on only a small number of candidate SNP-pairs, without the risk of missing any significant ones. Extensive experiments demonstrate that FastANOVA is orders of magnitude faster than the brute-force implementation of ANOVA tests on all SNP-pairs.
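To make the cost of the baseline concrete, the following is a minimal Python sketch of the brute-force approach the abstract compares against: an ANOVA F-statistic computed for every SNP-pair, with a permutation procedure that estimates a family-wise significance threshold from the maximum statistic over all pairs. It is not the FastANOVA algorithm and does not implement its upper bound or pruning. The function names, the choice to group individuals by joint genotype (a one-way ANOVA over genotype combinations), and the defaults for `n_perm` and `alpha` are all assumptions made for illustration.

```python
import itertools
import random


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of phenotype values."""
    groups = [g for g in groups if g]          # ignore empty groups
    n = sum(len(g) for g in groups)
    k = len(groups)
    if k < 2 or n <= k:
        return 0.0                             # statistic undefined; treat as no signal
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pair_f(snp_a, snp_b, phenotype):
    """F-statistic with individuals grouped by the joint genotype of two SNPs
    (an assumed form of the SNP-pair test, chosen for illustration)."""
    cells = {}
    for a, b, y in zip(snp_a, snp_b, phenotype):
        cells.setdefault((a, b), []).append(y)
    return anova_f(list(cells.values()))


def max_pair_f(snps, phenotype):
    """Brute force: evaluate every SNP-pair and return the largest statistic."""
    return max(
        pair_f(snps[i], snps[j], phenotype)
        for i, j in itertools.combinations(range(len(snps)), 2)
    )


def permutation_threshold(snps, phenotype, n_perm=100, alpha=0.05, seed=0):
    """Estimate a family-wise critical value from the maximum statistic
    observed under random shuffles of the phenotype."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max_pair_f(snps, shuffled))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]
```

Each permutation repeats the full scan over all pairs, so the cost grows with the number of permutations times the square of the number of SNPs. That repeated work is what the bound described in the abstract, whose second term does not depend on the phenotype permutation, is designed to avoid.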

