Mancin Enrico, Lourenco Daniela, Bermann Matias, Mantovani Roberto, Misztal Ignacy
Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Padua, Italy.
Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States.
Front Genet. 2021 Apr 29;12:642065. doi: 10.3389/fgene.2021.642065. eCollection 2021.
Population structure or genetic relatedness should be considered in genome association studies to avoid spurious associations. The most widely used methods for genome-wide association studies (GWAS) account for population structure but are limited to genotyped individuals with phenotypes. Single-step GWAS (ssGWAS) can use phenotypes from non-genotyped relatives; however, its ability to account for population structure has not been explored. Here we investigate the equivalence among ssGWAS, efficient mixed-model association expedited (EMMAX), and genomic best linear unbiased prediction GWAS (GBLUP-GWAS), and how they differ from single-SNP analysis without correction for population structure (SSA-NoCor). We used simulated, structured populations that mimicked fish, beef cattle, and dairy cattle populations with 1,040, 5,525, and 1,400 genotyped individuals, respectively. Larger populations with up to 10-fold more genotyped animals were also simulated. The genomes were composed of 29 chromosomes, each harboring one QTN, and the number of simulated SNPs was 35,000 for the fish and 65,000 for the beef and dairy cattle populations. Males and females were genotyped in the fish and beef cattle populations, whereas only males had genotypes in the dairy population. Phenotypes for a trait with heritability varying from 0.25 to 0.35 were available in both sexes for the fish population, but only for females in the beef and dairy cattle populations. In the latter, phenotypes of daughters were projected into genotyped sires (i.e., deregressed proofs) before applying EMMAX and SSA-NoCor. Although SSA-NoCor had the largest number of true positive SNPs among the four methods, the number of false negatives was two- to fivefold that of true positives. GBLUP-GWAS and EMMAX had a similar number of true positives, which was slightly smaller than in ssGWAS, although the difference was not significant.
Additionally, no significant differences were observed when deregressed proofs were used as pseudo-phenotypes in EMMAX compared to daughter phenotypes in ssGWAS for the dairy cattle population. Single-step GWAS accounts for population structure and is a straightforward method for association analysis when only a fraction of the population is genotyped and/or when phenotypes are available on non-genotyped relatives.
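The contrast between an uncorrected single-SNP analysis and a mixed-model analysis can be illustrated with a small NumPy sketch. It is not the authors' pipeline: the two-subpopulation toy data, the function names `vanraden_grm` and `single_snp_chi2`, and the fixed value `h2_assumed` are all assumptions made for illustration. In particular, EMMAX estimates the variance components once under the null model, whereas this sketch simply plugs in an assumed heritability. No QTN is simulated, so every SNP is null and any signal in the uncorrected test comes from population structure alone.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from an n x m matrix of 0/1/2 genotypes."""
    p = M.mean(axis=0) / 2.0            # sample allele frequencies
    Z = M - 2.0 * p                     # center each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def single_snp_chi2(y, M, V=None):
    """1-df Wald chi-square per SNP for the model: intercept + one SNP.

    OLS when V is None; otherwise GLS with covariance proportional to V.
    """
    n = len(y)
    ones = np.ones(n)
    if V is not None:
        L = np.linalg.cholesky(V)       # V = L L'
        # Whiten phenotype, genotypes, and intercept with L^{-1}.
        y, M, ones = (np.linalg.solve(L, a) for a in (y, M, ones))
    # Project the (possibly whitened) intercept out of y and every SNP.
    yr = y - ones * (ones @ y) / (ones @ ones)
    Xr = M - np.outer(ones, ones @ M) / (ones @ ones)
    sxx = np.sum(Xr ** 2, axis=0)
    sxy = Xr.T @ yr
    beta = sxy / sxx
    sigma2 = (yr @ yr - beta * sxy) / (n - 2)   # residual variance per SNP
    return beta ** 2 * sxx / sigma2

# Toy structured population (hypothetical sizes, far smaller than in the study).
rng = np.random.default_rng(1)
n_per, m = 100, 300
p_anc = rng.uniform(0.3, 0.7, m)
p_a = np.clip(p_anc + rng.normal(0.0, 0.15, m), 0.1, 0.9)
p_b = np.clip(p_anc + rng.normal(0.0, 0.15, m), 0.1, 0.9)
M = np.vstack([rng.binomial(2, p_a, (n_per, m)),
               rng.binomial(2, p_b, (n_per, m))]).astype(float)
M = M[:, M.std(axis=0) > 0]             # drop monomorphic SNPs, if any

# Phenotype differs between subpopulations but depends on no SNP.
pop = np.repeat([0.0, 1.0], n_per)
y = 2.0 * pop + rng.normal(0.0, 1.0, 2 * n_per)

G = vanraden_grm(M)
h2_assumed = 0.5                        # assumption: plugged in, not estimated
V = h2_assumed * G + (1.0 - h2_assumed) * np.eye(len(y))

chi2_naive = single_snp_chi2(y, M)          # no correction for structure
chi2_corrected = single_snp_chi2(y, M, V)   # mixed-model style correction

print("mean chi-square, uncorrected:", chi2_naive.mean())
print("mean chi-square, corrected:  ", chi2_corrected.mean())
```

Under the null, a well-calibrated 1-df test has a mean chi-square near 1. The uncorrected test should show a clearly larger mean here because allele-frequency differences between the two subpopulations are confounded with the phenotype difference. The relationship matrix absorbs most of that confounding in the corrected test.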