Fang Lingzhao, Sahana Goutam, Ma Peipei, Su Guosheng, Yu Ying, Zhang Shengli, Lund Mogens Sandø, Sørensen Peter
Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, 8830, Tjele, Denmark.
Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture & National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
BMC Genomics. 2017 Aug 10;18(1):604. doi: 10.1186/s12864-017-4004-z.
A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid genomic prediction. Here, we hypothesized that the genomic variants underlying complex traits might be enriched in a subset of genomic regions defined by genes grouped on the basis of "Gene Ontology" (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability.
Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased prediction (GBLUP) model to a genomic feature BLUP (GFBLUP) model, which includes an additional genomic effect quantifying the joint effect of a group of variants located in a genomic feature. The GBLUP model, which uses a single random effect, assumes that all genomic variants contribute equally to the genomic relationship, whereas GFBLUP assigns different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more strongly associated with mastitis than with milk production, and that several biologically meaningful GO terms improved the prediction accuracy of GFBLUP for the four traits compared with GBLUP. The improvement in genomic prediction was more apparent between breeds (average increase across the four traits of 0.161) than within HOL (average increase across the four traits of 0.020).
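The contrast between GBLUP and GFBLUP described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the genotypes and phenotypes are simulated, the `feature` mask stands in for the markers of a hypothetical GO term, the relationship matrices use a VanRaden-style scaling, the variance components are fixed to arbitrary values rather than estimated (e.g., by REML), and the mean is a plain training average.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from an (animals x markers) 0/1/2 genotype matrix.
    VanRaden-style scaling is an assumption; the abstract does not state the formula."""
    p = M.mean(axis=0) / 2.0                  # allele frequency per marker
    Z = M - 2.0 * p                           # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gfblup_predict(y, G_f, G_r, var_f, var_r, var_e, train, test):
    """Predict total genomic values of test animals from two genomic effects:
    one for markers inside the feature (G_f), one for the remaining markers (G_r).
    Variance components are taken as known here; in practice they are estimated."""
    K = var_f * G_f + var_r * G_r             # weighted sum of the two relationships
    V = K[np.ix_(train, train)] + var_e * np.eye(len(train))
    mu = y[train].mean()                      # simplification: no GLS estimate of the mean
    alpha = np.linalg.solve(V, y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

# Simulated toy data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
M = rng.binomial(2, 0.3, size=(30, 200)).astype(float)
feature = np.zeros(200, dtype=bool)
feature[:20] = True                           # markers in the hypothetical GO term
G_f = vanraden_grm(M[:, feature])
G_r = vanraden_grm(M[:, ~feature])
y = rng.normal(size=30)
train, test = np.arange(25), np.arange(25, 30)

# Arbitrary variance components; a larger var_f gives the feature markers more weight.
pred = gfblup_predict(y, G_f, G_r, 0.3, 0.2, 0.5, train, test)
```

In this sketch, GBLUP corresponds to the special case in which the two variance components are proportional to the scaling denominators of `G_f` and `G_r`, so that the weighted sum collapses to a single relationship matrix built from all markers. Any other ratio weights the feature markers differently from the rest, which is the additional flexibility that GFBLUP provides.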
Our genomic feature modelling approaches provide a framework to simultaneously explore the genetic architecture and genomic prediction of complex traits by taking advantage of independent biological knowledge.