Pernille Sarup, Just Jensen, Tage Ostersen, Mark Henryon, Peter Sørensen
Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Blichers Allé 20, 8830, Tjele, Denmark.
SEGES Danish Pig Research Centre, Axeltorv 3, 1609, Copenhagen V, Denmark.
BMC Genet. 2016 Jan 5;17:11. doi: 10.1186/s12863-015-0322-9.
In animal breeding, genetic variance for complex traits is often estimated using linear mixed models that incorporate information from single nucleotide polymorphism (SNP) markers through a realized genomic relationship matrix. In such models, all genetic markers are weighted equally and genomic variation is treated as a "black box." This approach is useful for selecting animals with high genetic potential, but it neither generates nor utilises knowledge of the biological mechanisms underlying trait variation. Here we propose a linear mixed-model approach that can evaluate the collective effects of sets of SNPs and thereby open the "black box." The described genomic feature best linear unbiased prediction (GFBLUP) model has two genomic components: one defined by the SNPs in a genomic feature (a prespecified set of SNPs) and one defined by all remaining SNPs.
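A sketch of the two-component model, assuming the usual GFBLUP formulation in which each genomic component has its own relationship matrix and variance ($\mathbf{G}_f$ built from the feature SNPs, $\mathbf{G}_r$ from the remaining SNPs); the symbols below are illustrative notation, not taken from this abstract. Standard GBLUP is the special case with a single genomic effect $\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma_g^2)$.

```latex
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{f} + \mathbf{r} + \mathbf{e},
\qquad
\mathbf{f} \sim N(\mathbf{0}, \mathbf{G}_f\,\sigma_f^2),
\quad
\mathbf{r} \sim N(\mathbf{0}, \mathbf{G}_r\,\sigma_r^2),
\quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\,\sigma_e^2)
```

Here $\mathbf{y}$ holds the phenotypes, $\mathbf{X}\mathbf{b}$ the fixed effects, and $\sigma_f^2$ and $\sigma_r^2$ are the genomic variances attributed to the feature and to the rest of the genome.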
We analysed data on average daily gain, feed efficiency, and lean meat percentage from 3,085 Duroc boars, along with genotypes from a 60K SNP chip. In addition, information on known quantitative trait loci (QTL) from the Animal QTL Database was integrated into the GFBLUP model as a genomic feature. Our results showed that the most significant QTL categories were indeed biologically meaningful. Additionally, for high-heritability traits, prediction accuracy was improved by incorporating biological knowledge into the prediction models. A simulation study using the real genotypes and simulated phenotypes demonstrated challenges in detecting causal variants for low- to medium-heritability traits.
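Integrating QTL information as a genomic feature amounts to splitting the SNPs into two sets and building one relationship matrix per set. The sketch below assumes column-standardized genotypes (VanRaden's second method) and uses hypothetical helper names (`standardize`, `grm`, `split_grm`); which SNPs count as "QTL-annotated" is an input the user supplies, not something derived here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def standardize(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """Center each SNP (0/1/2 allele counts) by 2p and scale by sqrt(2p(1-p))."""
    n, m = len(genotypes), len(genotypes[0])
    w = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(row[j] for row in genotypes) / (2 * n)  # allele frequency
        sd = math.sqrt(2 * p * (1 - p))
        for i in range(n):
            # Monomorphic SNPs (sd == 0) carry no information and stay at 0.
            w[i][j] = (genotypes[i][j] - 2 * p) / sd if sd > 0 else 0.0
    return w

def grm(w: Matrix, snp_idx: Sequence[int]) -> Matrix:
    """Realized relationship matrix W_s W_s' / m_s from the SNP columns in snp_idx."""
    m = len(snp_idx)
    if m == 0:
        raise ValueError("SNP set is empty")
    n = len(w)
    return [[sum(w[i][k] * w[l][k] for k in snp_idx) / m for l in range(n)]
            for i in range(n)]

def split_grm(genotypes: Sequence[Sequence[int]], feature_snps: Sequence[int]):
    """Return (G_f, G_r): one matrix for the feature SNPs, one for all other SNPs."""
    w = standardize(genotypes)
    feature = sorted(set(feature_snps))
    feature_set = set(feature)
    rest = [j for j in range(len(genotypes[0])) if j not in feature_set]
    return grm(w, feature), grm(w, rest)

# Toy data: 4 animals x 4 SNPs; SNPs 0 and 2 stand in for QTL-annotated markers.
geno = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 0], [0, 2, 1, 2]]
G_f, G_r = split_grm(geno, feature_snps=[0, 2])
```

`G_f` and `G_r` would then be passed to a mixed-model solver (for example REML) as the two covariance structures; this sketch stops before that step. Pure Python keeps the arithmetic visible, but at the scale of 3,085 animals and ~60K SNPs a matrix library would be the practical choice.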
The GFBLUP model showed increased predictive ability when the genomic feature contained enough causal variants to explain over 10% of the genomic variance, and when dilution by non-causal markers was minimal. In the observed data, including prior QTL information obtained independently of the training data increased predictive ability, but only for the trait with the highest heritability.
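The two quantities in this conclusion can be computed as below. This is a minimal sketch assuming that the feature's share of genomic variance is $\sigma_f^2 / (\sigma_f^2 + \sigma_r^2)$ and that predictive ability is measured, as is common, by the Pearson correlation between predicted and observed values in a validation set; the function names are hypothetical.

```python
import math
from typing import Sequence

def feature_variance_share(var_f: float, var_r: float) -> float:
    """Share of the total genomic variance attributed to the feature component."""
    total = var_f + var_r
    if total <= 0:
        raise ValueError("total genomic variance must be positive")
    return var_f / total

def predictive_ability(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation between predicted and observed values."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    return cov / (sp * so)

# Illustrative variance components (not estimates from this study).
share = feature_variance_share(1.0, 4.0)
print(share > 0.10)
```

The 10% figure comes from the simulation described above and should be read as an empirical observation for this data set, not a general threshold.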