Weber Sven E, Frisch Matthias, Snowdon Rod J, Voss-Fels Kai P
Department of Plant Breeding, Justus Liebig University, Giessen, Germany.
Department of Biometry and Population Genetics, Justus Liebig University, Giessen, Germany.
Front Plant Sci. 2023 Sep 5;14:1217589. doi: 10.3389/fpls.2023.1217589. eCollection 2023.
In modern plant breeding, genomic selection is becoming the gold standard for selecting superior genotypes. Genomic prediction models are built on a set of phenotyped lines together with their genotypic profiles. With high marker density and linkage disequilibrium (LD) between markers, genotype data in breeding populations tend to exhibit considerable redundancy. Interest is therefore growing in the use of haplotype blocks, which reduce this redundancy by summarizing co-inherited features. Moreover, haplotype blocks can help to capture local epistasis caused by interacting loci. Here, we compared genomic prediction methods that used either single SNPs or haplotype blocks with regard to their prediction accuracy for important traits in crop datasets. We used four published datasets from canola, maize, wheat and soybean. Different approaches to constructing haplotype blocks were compared, including blocks based on LD, physical distance and number of adjacent markers, as well as the algorithms implemented in two software packages. The tested prediction methods included Genomic Best Linear Unbiased Prediction (GBLUP), Extended GBLUP accounting for additive-by-additive epistasis (EGBLUP), Bayesian LASSO and Reproducing Kernel Hilbert Space (RKHS) regression. For some traits, haplotype blocks improved prediction accuracy compared to SNP-based predictions; however, the magnitude of improvement was highly trait- and model-specific. Haplotype blocks can improve genomic prediction accuracy particularly in settings with low marker density. In most cases, physically large haplotype blocks yielded a strong decrease in prediction accuracy. When prediction accuracy varies greatly across prediction models, prediction based on haplotype blocks can improve the accuracy of underperforming models. However, there is no "best" method to build haplotype blocks, since prediction accuracy varied considerably across methods and traits. Hence, criteria used to define haplotype blocks should not be viewed as fixed biological parameters, but rather as hyperparameters that need to be adjusted for every dataset.
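As an illustration of treating the block definition as a hyperparameter, the simplest construction mentioned above (a fixed number of adjacent markers) can be sketched as follows. This is a minimal sketch and not the authors' pipeline: the function names, the phased 0/1 input format and the copy-number encoding of haplotype alleles are all assumptions made for the example.

```python
from collections import Counter


def build_blocks(n_markers, block_size):
    """Split marker indices into consecutive blocks of `block_size` adjacent markers.

    `block_size` is the hyperparameter; the last block may be shorter.
    """
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    return [range(start, min(start + block_size, n_markers))
            for start in range(0, n_markers, block_size)]


def haplotype_allele_counts(lines, block_size):
    """Encode each line by the number of copies (0, 1 or 2) of every haplotype allele.

    `lines` is an assumed input format: one pair of phased haplotypes per line,
    each haplotype a tuple of 0/1 marker alleles of equal length.
    Returns (columns, matrix), where each column is (block index, allele tuple).
    """
    n_markers = len(lines[0][0])
    columns = []
    matrix = [[] for _ in lines]
    for block_id, block in enumerate(build_blocks(n_markers, block_size)):
        # Count, per line, the haplotype alleles observed within this block.
        per_line = [Counter(tuple(hap[i] for i in block) for hap in pair)
                    for pair in lines]
        # One design-matrix column per distinct allele seen in the population.
        for allele in sorted(set().union(*per_line)):
            columns.append((block_id, allele))
            for row, counts in zip(matrix, per_line):
                row.append(counts[allele])
    return columns, matrix


# Hypothetical toy data: three diploid lines, four phased markers.
lines = [
    ((0, 0, 1, 1), (0, 0, 1, 1)),
    ((0, 1, 1, 0), (0, 0, 1, 1)),
    ((1, 1, 0, 0), (0, 1, 1, 0)),
]
columns, matrix = haplotype_allele_counts(lines, block_size=2)
```

In practice, `block_size` (or an LD or physical-distance threshold in the other block definitions) would be chosen per dataset, for example by comparing cross-validated prediction accuracy over a grid of candidate values, and the resulting matrix would replace the SNP matrix as input to the prediction model.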