Department of Biology, The Bioinformatics Centre, University of Copenhagen, 2200 Copenhagen N, Denmark.
Faculty of Health and Medical Sciences, Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, 2200 Copenhagen N, Denmark.
G3 (Bethesda). 2022 Jan 4;12(1). doi: 10.1093/g3journal/jkab385.
Genetic data from SNP-chip-based imputation or low-depth sequencing provide a cost-efficient design for large-scale association studies. We explore methods for performing association studies on such data and investigate how the choice of prior used when estimating genotype probabilities affects the association results. Our proposed method, ANGSD-asso's latent model, treats the unobserved genotype as a latent variable in a generalized linear model framework. The software is implemented in C/C++ and can be run multi-threaded. ANGSD-asso is based on genotype probabilities, which can be estimated using either the sample allele frequency or the individual allele frequencies as a prior. We explore through simulations how genotype probability-based methods compare with using genetic dosages. Our simulations show that in a structured population, using the individual allele frequency prior gives better power than using the sample allele frequency prior. In scenarios where sequencing depth and phenotype are correlated, ANGSD-asso's latent model has higher statistical power and less bias than using dosages. When additional covariates are added to the linear model, ANGSD-asso's latent model has higher statistical power and less bias than other methods that accommodate genotype uncertainty, while also being much faster. This is shown with imputed data from UK Biobank and simulations.
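The distinction between genotype probabilities, dosages, and a latent-genotype model can be illustrated with a minimal Python sketch. It is not ANGSD-asso's implementation: the function names, the Hardy-Weinberg form of the prior, and the Gaussian phenotype model without covariates are all assumptions made for illustration. Passing a single sample-wide frequency to `genotype_posteriors` corresponds to a sample allele frequency prior, while passing a frequency specific to each individual corresponds to an individual allele frequency prior.

```python
import math

def genotype_posteriors(likelihoods, allele_freq):
    """Posterior P(G = g | data) for g in (0, 1, 2).

    Assumes a Hardy-Weinberg prior built from allele_freq
    (an illustrative choice, not taken from the abstract).
    """
    f = allele_freq
    prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
    weighted = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(weighted)
    return [w / total for w in weighted]

def dosage(posteriors):
    """Expected genotype: collapses the three probabilities to one number."""
    return posteriors[1] + 2 * posteriors[2]

def latent_log_likelihood(y, posteriors, alpha, beta, sigma):
    """Log-likelihood of y with the unobserved genotype summed out.

    Hypothetical Gaussian model y_i ~ N(alpha + beta * g, sigma^2);
    each individual's density is a mixture over g weighted by P(G = g).
    """
    total = 0.0
    for y_i, post in zip(y, posteriors):
        mix = 0.0
        for g, p_g in enumerate(post):
            resid = y_i - (alpha + beta * g)
            density = math.exp(-0.5 * (resid / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
            mix += p_g * density
        total += math.log(mix)
    return total

# Made-up genotype likelihoods for one individual at one site.
post = genotype_posteriors([0.2, 0.7, 0.1], allele_freq=0.3)
d = dosage(post)
ll = latent_log_likelihood([1.0], [post], alpha=0.0, beta=1.0, sigma=1.0)
```

A dosage-based analysis would regress the phenotype on `d` alone, whereas the latent-model likelihood keeps all three genotype probabilities, so genotype uncertainty enters the fit directly.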