Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia.
School of Agriculture, Massey University, Palmerston North, New Zealand.
PLoS One. 2018 Mar 21;13(3):e0194683. doi: 10.1371/journal.pone.0194683. eCollection 2018.
Genomic prediction exploits single nucleotide polymorphisms (SNPs) across the whole genome to predict the genetic merit of selection candidates. Most models for genomic prediction, e.g. BayesA, B, C, R and GBLUP, assume that SNP effects are independent. However, SNP effects are expected to be locally dependent given the presence of a nearby QTL, because SNPs surrounding the QTL do not segregate independently. A consequence of ignoring this dependence is that SNPs with small effects may be overly shrunk, e.g. effects from markers with high minor allele frequencies (MAF) that flank QTL with low MAF. A nested mixture model (BayesN) is developed to account for the dependence of effects of SNPs that are closely linked: the effects of SNPs in every non-overlapping genomic window a priori follow either a point mass at zero for all SNPs, or a mixture in which some SNPs have nonzero effects and the others have zero effects. It can be regarded as a parsimonious alternative to the existing antedependence model, antiBayesB, which allows a nonstationary dependence of SNP effects. Illumina 777K BovineHD genotypes from 948 Angus cattle were used to simulate 5,000 offspring, with 4,000 used for training and 1,000 for validation. Scenarios with 300 common (MAF > 0.05) or rare (MAF < 0.05) QTL randomly selected from segregating SNPs were replicated 8 times. SNPs corresponding to QTL were masked from a 600k panel comprising SNPs with MAF > 0.05 or from a 50k evenly spaced subset of these. Compared with BayesB and a modified antiBayesB, BayesN improved the accuracy of prediction by up to 2.0% with 50k SNPs and by up to 7.0% with 600k SNPs, with most improvements occurring in the rare QTL scenario. Computing time was reduced by up to 60% with 50k SNPs and by up to 75% with 600k SNPs. BayesN is an accurate and computationally efficient method for genomic prediction with whole-genome SNPs, especially for traits with rare QTL.
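The nested mixture prior described above can be illustrated with a minimal sketch that draws SNP effects window by window. This is only an illustration of the two-level structure, not the authors' implementation: the function name `sample_bayesn_prior`, the parameters `p_window`, `p_snp` and `effect_sd`, the fixed window size in SNPs, and the choice of a normal distribution for nonzero effects are all assumptions that the abstract does not specify.

```python
import random


def sample_bayesn_prior(n_snps, window_size, p_window, p_snp, effect_sd, seed=0):
    """Draw SNP effects from a two-level (nested) mixture prior.

    Hypothetical parameters, chosen for illustration only:
      p_window  - prior probability that a window carries any nonzero effect
      p_snp     - prior probability that a SNP is nonzero, given its window does
      effect_sd - standard deviation of nonzero effects (normal is an assumption)
    """
    rng = random.Random(seed)
    effects = []
    # Walk the genome in non-overlapping windows of consecutive SNPs.
    for start in range(0, n_snps, window_size):
        size = min(window_size, n_snps - start)
        if rng.random() < p_window:
            # Window-level "on": a mixture of nonzero and zero SNP effects.
            window = [
                rng.gauss(0.0, effect_sd) if rng.random() < p_snp else 0.0
                for _ in range(size)
            ]
        else:
            # Window-level "off": point mass at zero for every SNP in the window.
            window = [0.0] * size
        effects.extend(window)
    return effects


effects = sample_bayesn_prior(
    n_snps=1000, window_size=10, p_window=0.05, p_snp=0.3, effect_sd=0.1
)
n_nonzero = sum(1 for e in effects if e != 0.0)
print(f"{n_nonzero} of {len(effects)} SNP effects are nonzero")
```

Because nonzero effects can only occur inside windows that are switched on, they cluster locally, which is the dependence among closely linked SNPs that the model is meant to capture.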