Guan Yongtao, Stephens Matthew
Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America.
PLoS Genet. 2008 Dec;4(12):e1000279. doi: 10.1371/journal.pgen.1000279. Epub 2008 Dec 5.
Imputation-based association methods provide a powerful framework for testing untyped variants for association with phenotypes and for combining results from multiple studies that use different genotyping platforms. Here, we consider several issues that arise when applying these methods in practice, including: (i) factors affecting imputation accuracy, including choice of reference panel; (ii) the effects of imputation accuracy on power to detect associations; (iii) the relative merits of Bayesian and frequentist approaches to testing imputed genotypes for association with phenotype; and (iv) how to quickly and accurately compute Bayes factors for testing imputed SNPs. We find that imputation-based methods can be robust to imputation accuracy and can improve power to detect associations, even when average imputation accuracy is poor. We explain how ranking SNPs for association by a standard likelihood ratio test gives the same results as a Bayesian procedure that uses an unnatural prior assumption--specifically, that difficult-to-impute SNPs tend to have larger effects--and assess the power gained from using a Bayesian approach that does not make this assumption. Within the Bayesian framework, we find that good approximations to a full analysis can be achieved by simply replacing unknown genotypes with a point estimate--their posterior mean. This approximation considerably reduces computational expense compared with published sampling-based approaches, and the methods we present are practical on a genome-wide scale with very modest computational resources (e.g., a single desktop computer). The approximation also facilitates combining information across studies, using only summary data for each SNP. Methods discussed here are implemented in the software package BIMBAM, which is available from http://stephenslab.uchicago.edu/software.html.
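The posterior-mean approximation described in the abstract can be illustrated with a minimal sketch. It assumes each imputed SNP comes with three genotype probabilities per individual, replaces each unknown genotype with its expected allele count, and regresses the phenotype on that value. The Bayes factor below is a generic normal-prior approximation computed from the effect estimate and its standard error. It is a stand-in for illustration, not the formula implemented in BIMBAM, and the function names and the `prior_sd` value are hypothetical.

```python
import math


def posterior_mean_genotype(probs):
    """Expected allele count from (P(g=0), P(g=1), P(g=2))."""
    p0, p1, p2 = probs
    total = p0 + p1 + p2  # renormalise in case of rounding in the input
    return (p1 + 2.0 * p2) / total


def simple_regression(x, y):
    """Least-squares slope and its standard error for y ~ a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    residuals = [yi - mean_y - beta * (xi - mean_x) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    return beta, math.sqrt(sigma2 / sxx)


def approx_log10_bf(beta, se, prior_sd=0.2):
    """log10 Bayes factor (association vs. null) under an assumed
    N(0, prior_sd^2) prior on the effect size. Illustrative only."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # prior variance of the effect (hypothetical value)
    z = beta / se
    log_bf = 0.5 * math.log(v / (v + w)) + 0.5 * z * z * w / (v + w)
    return log_bf / math.log(10)


# Usage: imputation probabilities for four individuals at one untyped SNP.
genotype_probs = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1), (0.0, 0.2, 0.8), (0.2, 0.6, 0.2)]
phenotypes = [0.1, 1.2, 1.9, 0.8]
dosages = [posterior_mean_genotype(p) for p in genotype_probs]
beta_hat, se_hat = simple_regression(dosages, phenotypes)
log10_bf = approx_log10_bf(beta_hat, se_hat)
```

Because this calculation needs only the effect estimate and its standard error for each SNP, it also suggests why a point-estimate approach lends itself to combining studies from summary data, although the combination rule itself is not shown here.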