Beaulieu Jean, Doerksen Trevor K, MacKay John, Rainville André, Bousquet Jean
Natural Resources Canada, Canadian Forest Service, Canadian Wood Fibre Centre, 1055 du P.E.P.S., Stn. Sainte-Foy, P.O. Box 10380, Quebec City, QC G1V 4C7, Canada.
BMC Genomics. 2014 Dec 2;15(1):1048. doi: 10.1186/1471-2164-15-1048.
Genomic selection (GS) may improve selection response over conventional pedigree-based selection if markers capture more detailed information than pedigrees in recently domesticated tree species and/or make selection more cost-effective. Genomic prediction accuracies using 1748 trees and 6932 SNPs representative of as many distinct gene loci were determined for growth and wood traits in white spruce, within and between environments and breeding groups (BG), each with an effective size of Ne ≈ 20. Marker subsets were also tested.
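One concrete way markers capture more detail than a pedigree is the realized genomic relationship matrix (GRM): a pedigree assigns every pair of full sibs the same expected relationship, whereas SNPs measure how much of the genome each pair actually shares. The sketch below is a minimal illustration, not the study's method (the abstract does not state which relationship measure was used). It computes VanRaden's (2008) method-1 GRM from SNPs coded 0/1/2 and the average relationship between a validation set and a training set, i.e. the relatedness between cross-validation sets that drives the accuracies reported in this study. Assumptions: simulated data with two unrelated full-sib families, SNPs segregating independently (no linkage), and allele frequencies estimated from the sample; all function names are hypothetical.

```python
import numpy as np


def vanraden_grm(genotypes: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix, VanRaden (2008) method 1.

    genotypes: n_trees x n_snps array of allele counts coded 0/1/2.
    """
    p = genotypes.mean(axis=0) / 2.0          # sample allele frequency per SNP
    z = genotypes - 2.0 * p                   # centre each SNP on its mean
    scale = 2.0 * np.sum(p * (1.0 - p))       # puts G on a pedigree-like scale
    return z @ z.T / scale


def mean_relatedness(grm: np.ndarray, train_idx, test_idx) -> float:
    """Average realized relationship between validation and training trees."""
    return float(grm[np.ix_(test_idx, train_idx)].mean())


# --- Simulated example (hypothetical data, not from the study) ---
rng = np.random.default_rng(1)
n_snps = 2000
freqs = rng.uniform(0.1, 0.9, n_snps)
# four unrelated parents, each with two haplotypes of 0/1 alleles
parents = (rng.random((4, 2, n_snps)) < freqs).astype(int)


def full_sibs(mother: np.ndarray, father: np.ndarray, n: int) -> np.ndarray:
    """n full sibs; at each SNP one random allele comes from each parent
    (independent segregation, so linkage is not modelled)."""
    cols = np.arange(n_snps)
    from_mother = mother[rng.integers(0, 2, (n, n_snps)), cols]
    from_father = father[rng.integers(0, 2, (n, n_snps)), cols]
    return from_mother + from_father


family_a = full_sibs(parents[0], parents[1], 20)
family_b = full_sibs(parents[2], parents[3], 20)
geno = np.vstack([family_a, family_b]).astype(float)  # trees 0-19: A, 20-39: B

grm = vanraden_grm(geno)
test_idx = np.arange(0, 5)            # validation trees from family A
train_sibs = np.arange(5, 20)         # remaining family A trees (full sibs)
train_unrelated = np.arange(20, 40)   # family B, no shared parents
within = mean_relatedness(grm, train_sibs, test_idx)
across = mean_relatedness(grm, train_unrelated, test_idx)
print(f"test vs. full sibs: {within:.2f}, test vs. unrelated family: {across:.2f}")
```

Because allele frequencies are estimated from these 40 trees, relationships are expressed relative to the sample, so the unrelated family comes out near or below zero rather than exactly zero; the contrast between the two values is what matters.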
Model fits and/or cross-validation (CV) prediction accuracies for ridge regression (RR) and the least absolute shrinkage and selection operator models approached those of pedigree-based models. With strong relatedness between CV sets, prediction accuracies for RR within environment and BG were high for wood (r = 0.71-0.79) and moderately high for growth (r = 0.52-0.69) traits, in line with trends in heritabilities. For both classes of traits, these accuracies achieved between 83% and 92% of those obtained with phenotypes and pedigree information. Prediction into untested environments remained moderately high for wood (r ≥ 0.61) but dropped significantly for growth (r ≥ 0.24) traits, emphasizing the need to phenotype in all test environments and model genotype-by-environment interactions for growth traits. Removing relatedness between CV sets sharply decreased prediction accuracies for all traits and subpopulations, falling near zero between BGs with no known shared ancestry. For marker subsets, similar patterns were observed but with lower prediction accuracies.
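The RR accuracies above are correlations (r) between predicted and observed phenotypes in cross-validation. The sketch below shows that workflow in its simplest form, assuming simulated, unrelated trees, normally distributed effects on every SNP, and random k-fold splits; it is not the authors' pipeline, and their CV design, software, and penalty estimation are not described here. The penalty is set to lambda = sigma_e^2 / sigma_beta^2 from the known simulated variances, whereas in practice it (or the variance components) would be estimated, for example by REML. Because the folds are random draws of unrelated trees, the sketch does not reproduce the strong effect of relatedness between CV sets reported above. Replacing the L2 penalty with an L1 penalty gives LASSO, which needs an iterative solver and is not shown. Function names are hypothetical.

```python
import numpy as np


def ridge_fit(z_train: np.ndarray, y_train: np.ndarray, lam: float):
    """Ridge-regression SNP effects: beta = (Zc'Zc + lam*I)^-1 Zc'(y - ybar),
    where Zc is the training genotype matrix centred column-wise."""
    col_means = z_train.mean(axis=0)
    zc = z_train - col_means
    y_mean = y_train.mean()
    lhs = zc.T @ zc + lam * np.eye(zc.shape[1])
    beta = np.linalg.solve(lhs, zc.T @ (y_train - y_mean))
    return col_means, y_mean, beta


def ridge_predict(model, z_new: np.ndarray) -> np.ndarray:
    """Genomic predictions for new trees, centred with the training means."""
    col_means, y_mean, beta = model
    return y_mean + (z_new - col_means) @ beta


def kfold_accuracy(z: np.ndarray, y: np.ndarray, lam: float,
                   k: int = 5, seed: int = 0) -> float:
    """Mean Pearson r between predicted and observed phenotypes over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    rs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        y_hat = ridge_predict(ridge_fit(z[train], y[train], lam), z[test])
        rs.append(np.corrcoef(y_hat, y[test])[0, 1])
    return float(np.mean(rs))


# --- Simulated example (hypothetical data, not from the study) ---
rng = np.random.default_rng(42)
n_trees, n_snps, h2 = 400, 200, 0.6
p = rng.uniform(0.1, 0.9, n_snps)
z = rng.binomial(2, p, size=(n_trees, n_snps)).astype(float)  # 0/1/2 genotypes
beta_true = rng.normal(0.0, 1.0, n_snps)   # every SNP has a small effect
g = (z - 2.0 * p) @ beta_true              # true genetic values
var_e = g.var() * (1.0 - h2) / h2          # residual variance giving h2
y = g + rng.normal(0.0, np.sqrt(var_e), n_trees)

lam = var_e / 1.0  # lambda = sigma_e^2 / sigma_beta^2, sigma_beta^2 = 1 here
accuracy = kfold_accuracy(z, y, lam)
print(f"5-fold CV accuracy r = {accuracy:.2f}")
```

Dividing such a genomic accuracy by the accuracy of a pedigree-based model fitted to the same folds gives the kind of relative figure quoted above (83% to 92%).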
Given the need for high relatedness between CV sets to obtain good prediction accuracies, we recommend building GS models for prediction within the same breeding population only. Breeding groups could be merged to build genomic prediction models, provided that the total effective population size does not exceed 50, in order to obtain prediction accuracies as high as those obtained in the present study. Limiting the number of markers to a few hundred would not reduce prediction accuracies, but these accuracies could decline more rapidly over generations. The most promising short-term approach for genomic selection would likely be the selection of superior individuals within large full-sib families that are vegetatively propagated to implement multiclonal forestry.
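The merging rule can be checked with a short calculation. The sketch below uses the status number, N = 1/(2Θ) with Θ the group coancestry, as a stand-in for effective population size; this is an assumption, since the abstract does not state how Ne was estimated. It further assumes that the merged groups share no ancestry and that a group of size N_i has coancestry 1/(2N_i). Under equal contributions the merged size is then the sum of the group sizes, so two breeding groups with Ne ≈ 20 stay under the threshold of 50 while three do not. The function name is hypothetical.

```python
def merged_status_number(group_sizes, contributions=None):
    """Status number N = 1 / (2 * theta) of a population formed by merging groups.

    Assumes no shared ancestry between groups (between-group coancestry = 0)
    and that a group with status number N_i has coancestry 1 / (2 * N_i).
    contributions are the fractions of the merged population from each group.
    """
    k = len(group_sizes)
    if contributions is None:
        contributions = [1.0 / k] * k  # equal representation of groups
    # group coancestry of the merged population: sum_i c_i^2 * theta_i
    theta = sum(c * c / (2.0 * n) for c, n in zip(contributions, group_sizes))
    return 1.0 / (2.0 * theta)


NE_PER_GROUP = 20  # Ne ≈ 20 per breeding group, as reported in this study
NE_LIMIT = 50      # upper bound recommended for high prediction accuracy

for k in range(1, 4):
    ne = merged_status_number([NE_PER_GROUP] * k)
    print(k, round(ne), ne <= NE_LIMIT)
# 1 20 True
# 2 40 True
# 3 60 False
```

Unequal contributions lower the merged size: merging two Ne ≈ 20 groups in an 80:20 ratio gives about 29 rather than 40.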