Institute of Quantitative Genetics and Genomics of Plants, Heinrich Heine University, Düsseldorf, 40225, Germany.
Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University, Düsseldorf, 40225, Germany.
BMC Genomics. 2022 Mar 12;23(1):200. doi: 10.1186/s12864-022-08337-7.
Genomic prediction (GP) based on single nucleotide polymorphisms (SNPs) has become a widely used tool for increasing selection gain in plant breeding. However, predictors that are biologically closer to the phenotype, such as the transcriptome and metabolome, may increase prediction ability in GP. The objectives of this study were to (i) assess the prediction ability for three yield-related phenotypic traits when different omic datasets are used as single predictors, compared with a SNP array, where the omic datasets comprised three types of sequence variants (full, SV; deleterious, dSV; and tolerant, tSV), three types of transcriptome data (expression presence/absence variation, ePAV; gene expression, GE; and transcript expression, TE) sampled from two tissues, leaf and seedling, and metabolites (M); (ii) investigate the improvement in prediction ability when information from multiple omic datasets is combined to predict phenotypic variation in barley breeding programs; and (iii) explore the predictive performance of SV, GE, and ePAV derived from simulated 3' end mRNA sequencing of different lengths.
The prediction ability of genomic best linear unbiased prediction (GBLUP) for the three traits was higher with dSV information than with tSV, all SV information, or the SNP array. On average across the three traits, every predictor from the transcriptome (GE, TE, and ePAV) and the metabolome provided higher prediction ability than the SNP array and SV. In addition, the omic datasets were only partly similar to one another and therefore provided complementary biological perspectives on phenotypic variation. Optimally combining the information from dSV, TE, ePAV, and metabolites in GP models could improve prediction ability over that of any single predictor.
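The general idea of GBLUP with one or several omic predictors can be illustrated with a minimal sketch. It is not the authors' code: the function names, the fixed shrinkage parameter `lam` (in practice derived from variance components estimated, for example, by REML), and the fixed kernel weights (the study searched for optimal weights) are all assumptions made for illustration.

```python
import numpy as np

def relationship_matrix(X):
    """Linear kernel from a (genotypes x features) matrix.

    Features are centered and scaled to unit variance, so the mean
    diagonal of the resulting matrix is 1.
    """
    Z = X - X.mean(axis=0)
    sd = Z.std(axis=0)
    keep = sd > 0                      # drop features without variation
    Z = Z[:, keep] / sd[keep]
    return Z @ Z.T / Z.shape[1]

def combine_kernels(kernels, weights):
    """Weighted sum of relationship matrices (weights rescaled to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

def gblup_predict(K, y, train, test, lam=1.0):
    """Predict test genotypes from training phenotypes.

    lam plays the role of the residual-to-genetic variance ratio;
    here it is a fixed, hypothetical value.
    """
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    K_st = K[np.ix_(test, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K_st @ alpha

def prediction_ability(K, y, n_folds=5, lam=1.0, seed=0):
    """Correlation between observed and cross-validated predicted values."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    pred = np.empty(len(y))
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        pred[fold] = gblup_predict(K, y, train, fold, lam)
    return np.corrcoef(pred, y)[0, 1]
```

With hypothetical feature matrices `X_dsv` and `X_te` and a phenotype vector `y`, a combined predictor would be evaluated as `prediction_ability(combine_kernels([relationship_matrix(X_dsv), relationship_matrix(X_te)], [0.5, 0.5]), y)` and compared with the value obtained from each kernel alone.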
The use of integrated omic datasets in GP models is highly recommended. Furthermore, we evaluated a cost-effective approach, 3' end mRNA sequencing of transcriptome data from seedlings, which did not lose prediction ability compared with full-length mRNA sequencing, paving the way for the use of such prediction methods in commercial breeding programs.