Agriculture Victoria Research, AgriBio, 5 Ring Road, Bundoora, VIC, 3083, Australia.
Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia.
Genet Sel Evol. 2020 Jul 7;52(1):37. doi: 10.1186/s12711-020-00556-4.
Sequence-based genome-wide association studies (GWAS) provide high statistical power to identify candidate causal mutations when a large number of individuals with both sequence variant genotypes and phenotypes are available. A meta-analysis combines summary statistics from multiple GWAS and increases the power to detect trait-associated variants without requiring access to individual-level data from the GWAS mapping cohorts. Because linkage disequilibrium between adjacent markers is conserved only over short distances across breeds, a multi-breed meta-analysis can improve mapping precision.
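As an illustration of how summary statistics from several GWAS can be combined without individual-level data, the following minimal sketch applies a fixed-effect, inverse-variance-weighted meta-analysis to a single variant. This is one common approach and is an assumption here: the abstract does not state which weighting scheme the study used. The function name and the input values are hypothetical.

```python
import math


def inverse_variance_meta(effects, std_errors):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    effects    -- per-cohort allele-substitution effect estimates (hypothetical inputs)
    std_errors -- matching standard errors of those estimates
    Returns (combined effect, combined standard error, z-score, two-sided p-value).
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("effects and std_errors must be non-empty and of equal length")

    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    combined_effect = sum(w * b for w, b in zip(weights, effects)) / total_weight
    combined_se = math.sqrt(1.0 / total_weight)
    z = combined_effect / combined_se

    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return combined_effect, combined_se, z, p_value


# Illustrative values only, not taken from the study.
beta, se, z, p = inverse_variance_meta([0.10, 0.12, 0.08], [0.05, 0.04, 0.06])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.3g}")
```

Because the combined standard error shrinks as cohorts are added, a variant with a consistent but modest effect in each cohort can reach significance in the meta-analysis even when it does not in any single GWAS.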
To maximise the power to identify quantitative trait loci (QTL), we combined the results of nine within-population GWAS that used imputed sequence variant genotypes of 94,321 cattle from eight breeds, to perform a large-scale meta-analysis for fat and protein percentage in milk. The meta-analysis detected (p ≤ 10⁻⁸) 138 QTL for fat percentage and 176 QTL for protein percentage. This was more than the number of QTL detected in all within-population GWAS together (124 QTL for fat percentage and 104 QTL for protein percentage). The lead variants of 100 QTL for fat percentage and 114 QTL for protein percentage had the same direction of effect in all within-population GWAS. This indicates either that the linkage phase between the causal variant and the lead variant persists across breeds, or that some of the lead variants are themselves causal or tightly linked with causal variants. The percentage of intergenic variants was substantially lower among significant variants than among non-significant variants, and significant variants had mostly moderate to high minor allele frequencies. Significant variants were also clustered in genes that are known to be relevant for fat and protein percentages in milk.
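The check for a consistent direction of effect across the within-population GWAS can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the example effect estimates are hypothetical, and treating an exact zero as uninformative is a choice made here rather than something described in the abstract.

```python
def same_direction(effects):
    """Return True if all non-zero effect estimates share the same sign.

    effects -- effect estimates of one lead variant, one per within-population GWAS
    """
    nonzero = [b for b in effects if b != 0]
    return all(b > 0 for b in nonzero) or all(b < 0 for b in nonzero)


# Hypothetical lead variants with their per-cohort effect estimates.
lead_variants = {
    "variant_A": [0.21, 0.18, 0.25],
    "variant_B": [0.05, -0.02, 0.04],
}
consistent = [name for name, betas in lead_variants.items() if same_direction(betas)]
print(consistent)
```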
Our study identified a large number of QTL associated with fat and protein percentage in dairy cattle. We demonstrated that a large-scale multi-breed meta-analysis reveals more QTL at nucleotide resolution than within-population GWAS. Significant variants were more often located in genic regions than non-significant variants, and a large proportion of them were located in potentially regulatory regions.