Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark.
Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands.
Genet Sel Evol. 2018 Nov 20;50(1):62. doi: 10.1186/s12711-018-0432-8.
The availability of whole-genome sequence data for a large number of cattle, together with efficient imputation methods, opens a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction for dairy cattle. The objective of this study was to examine how including RLFV that lie within genes and are selected from whole-genome sequence variants affects the reliability of genomic prediction for fertility, health and longevity in dairy cattle.
All genic RLFV with a minor allele frequency (MAF) lower than 0.05 were extracted from imputed sequence data, and subsets were created using different selection strategies. These subsets were then combined with Illumina 50k single nucleotide polymorphism (SNP) data and used for genomic prediction. The reliability of prediction obtained with the 50k SNP data alone served as the reference, and absolute changes in reliability relative to it are reported in percentage points. Adding a second genomic component, comprising either all genic RLFV or a selected subset, to the model alongside the 50k component changed the reliability of prediction by −2.2 to 1.1 percentage points; that is, reliability hardly changed, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact on reliability of adding RLFV to the model. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV), randomly selected from different numbers of genes, were generated; in each set, the causal RLFV explained additional genetic variance equal to 10% of the estimated variance explained by the 50k SNPs. When genic RLFV selected on the basis of mapping results were included in the prediction model, reliability improved by up to 4.0 percentage points, and when the causal RLFV themselves were included, it improved by up to 6.8 percentage points.
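The first step described above, extracting genic variants with a MAF below 0.05 from the imputed sequence data, can be sketched as follows. This is a minimal illustration rather than the study's pipeline. It assumes that genotypes are held in an animals × variants NumPy array of alternative-allele counts (0/1/2; imputed dosages in the same range also work) and that a boolean `genic_mask` marking variants inside genes is already available. The function names are hypothetical, and dropping monomorphic variants is an assumption, not something the abstract states.

```python
import numpy as np


def minor_allele_frequency(genotypes: np.ndarray) -> np.ndarray:
    """Per-variant MAF from an animals x variants matrix of alternative-allele counts (0/1/2 or dosages)."""
    alt_freq = genotypes.mean(axis=0) / 2.0
    return np.minimum(alt_freq, 1.0 - alt_freq)


def select_rlfv(genotypes: np.ndarray, genic_mask: np.ndarray, maf_threshold: float = 0.05) -> np.ndarray:
    """Column indices of genic variants that are rare or low-frequency (0 < MAF < threshold).
    Excluding monomorphic variants (MAF == 0) is an assumption of this sketch."""
    maf = minor_allele_frequency(genotypes)
    keep = genic_mask & (maf > 0.0) & (maf < maf_threshold)
    return np.flatnonzero(keep)


# Toy example: 20 animals, 4 variants; only the first three lie within genes.
geno = np.zeros((20, 4))
geno[0, 0] = 1   # one heterozygote: MAF = 1/40 = 0.025
geno[:4, 1] = 1  # four heterozygotes: MAF = 4/40 = 0.10
geno[:, 2] = 2
geno[0, 2] = 1   # alternative allele almost fixed: MAF = 1/40 = 0.025
# Column 3 stays all 0 (monomorphic) and is also outside genes.
genic = np.array([True, True, True, False])

print(select_rlfv(geno, genic))  # → [0 2]
```

The MAF is taken as the smaller of the two allele frequencies, so a variant whose alternative allele is almost fixed (column 2) counts as rare just like one whose alternative allele is scarce (column 0).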
Including RLFV selected from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. Our simulations showed that the key to obtaining a benefit from sequence data is to identify the causal RLFV.
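The contrast between adding all genic RLFV and adding only the causal RLFV can be illustrated with a small toy simulation in the spirit of the study's, though not a reproduction of it. The sketch assumes simulated, unlinked genotypes for 600 unrelated animals, a heritability of 0.3, variance components treated as known, and reliability measured as the squared correlation between true and predicted genetic values in a validation set. All sizes, names and these modelling choices are hypothetical. As in the study, the causal RLFV are scaled to add 10% of the genetic variance carried by the 50k-like markers, and a second VanRaden genomic relationship matrix, built either from all RLFV or from the causal RLFV only, is added to a two-kernel GBLUP.

```python
import numpy as np

rng = np.random.default_rng(2018)  # hypothetical seed


def vanraden_g(M):
    """VanRaden (2008) method-1 genomic relationship matrix from an animals x variants 0/1/2 matrix."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p  # centre each column by twice its allele frequency
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup(y, train, kernels, variances, var_e):
    """BLUP of the total genetic value g = sum_k g_k for all animals, with g_k ~ N(0, variances[k] * kernels[k]).
    Only y[train] is used; variance components are treated as known (assumption)."""
    K = sum(s * G for s, G in zip(variances, kernels))  # Var(g) for independent components
    V = K[np.ix_(train, train)] + var_e * np.eye(len(train))  # Var(y_train)
    v1 = np.linalg.solve(V, np.ones(len(train)))
    mu = v1 @ y[train] / v1.sum()  # generalised least-squares estimate of the mean
    return K[:, train] @ np.linalg.solve(V, y[train] - mu)  # Cov(g, y_train) V^-1 (y_train - mu)


# Hypothetical toy dimensions: unrelated animals, unlinked variants.
n, n_snp, n_rlfv, n_causal, h2 = 600, 2000, 3000, 200, 0.3
M50k = rng.binomial(2, rng.uniform(0.05, 0.5, n_snp), size=(n, n_snp)).astype(float)
Mrlfv = rng.binomial(2, rng.uniform(0.005, 0.05, n_rlfv), size=(n, n_rlfv)).astype(float)
Mrlfv = Mrlfv[:, Mrlfv.sum(axis=0) > 0]  # drop variants that are monomorphic in this sample

# Polygenic background captured by the 50k-like panel.
g50 = M50k @ rng.normal(size=n_snp)
g50 -= g50.mean()
# Causal RLFV, rescaled to add 10% of the 50k genetic variance.
causal = rng.choice(Mrlfv.shape[1], size=n_causal, replace=False)
g_rare = Mrlfv[:, causal] @ rng.normal(size=n_causal)
g_rare -= g_rare.mean()
g_rare *= np.sqrt(0.10 * g50.var() / g_rare.var())

g = g50 + g_rare
var_g50, var_rare = g50.var(), g_rare.var()
var_e = g.var() * (1.0 - h2) / h2
y = g + rng.normal(scale=np.sqrt(var_e), size=n)

train, valid = np.arange(480), np.arange(480, n)
G50, G_all, G_causal = vanraden_g(M50k), vanraden_g(Mrlfv), vanraden_g(Mrlfv[:, causal])


def reliability(g_hat):
    """Squared correlation between true and predicted genetic values in the validation animals."""
    return np.corrcoef(g[valid], g_hat[valid])[0, 1] ** 2


r_base = reliability(gblup(y, train, [G50], [var_g50], var_e))
r_all = reliability(gblup(y, train, [G50, G_all], [var_g50, var_rare], var_e))
r_causal = reliability(gblup(y, train, [G50, G_causal], [var_g50, var_rare], var_e))

print(f"50k only: reliability = {r_base:.3f}")
for label, r in [("50k + all RLFV", r_all), ("50k + causal RLFV", r_causal)]:
    print(f"{label}: change = {100 * (r - r_base):+.1f} percentage points")
```

Because the genotypes are simulated without linkage disequilibrium or family structure, the size of the printed changes depends on the seed and the chosen dimensions, and should not be read as an estimate of the effects reported in the study.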