Arlt Christopher, van Inghelandt Delphine, Li Jinquan, Stich Benjamin
Res Sq. 2025 Mar 31:rs.3.rs-6145169. doi: 10.21203/rs.3.rs-6145169/v1.
The field of genomic selection (GS) is advancing rapidly on many fronts, including the use of multi-omics datasets, with the goals of increasing prediction ability (PA) and of making GS an integral part of a growing number of breeding programs to help ensure future food security. In this study, we used RNA sequencing (RNA-Seq) data to perform genomic prediction (GP) on three related barley recombinant inbred line (RIL) populations. We investigated whether PA can be increased by combining genomic and transcriptomic datasets, adding whole genome sequencing (WGS) SNP data, functional parameter filtering, and empirical quality filtering. Our RNA-Seq data were generated cost-efficiently using small-footprint plant cultivation, high-throughput RNA extraction, and miniaturized library preparation. We also examined reduced sequencing depth as an additional cost-saving measure. We used five-fold cross-validation to evaluate the PA of the gene expression dataset, the RNA-Seq SNP dataset, and the consensus SNP dataset between the RNA-Seq and parental WGS data, resulting in PAs between 0.73 and 0.78. The consensus SNP dataset performed best: for five of the eight traits, it performed significantly better than a 50K SNP array, which served as the benchmark. The advantage of the consensus SNP dataset was most prominent in the inter-population predictions, in which the training and validation sets originated from different RIL sub-populations. We could therefore show not only that RNA-Seq data alone are able to predict various complex traits in barley RILs, but also that performance can be further increased by adding WGS data, the public availability of which will steadily increase.
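As a minimal sketch of how PA can be estimated with five-fold cross-validation, the Python example below takes PA to be the mean Pearson correlation between predicted and observed phenotypes across the five folds. It is not the pipeline used in this study. Ridge regression stands in for the prediction model, which the abstract does not name. The marker matrix and phenotype are simulated. All names and settings (`ridge_fit_predict`, `five_fold_pa`, `lam`, the population size, marker count, and heritability) are hypothetical choices made for illustration.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, lam=100.0):
    """Fit ridge regression on the training lines and predict the test lines.

    Assumption: ridge regression is only a stand-in prediction model;
    lam is an arbitrary shrinkage value, not a tuned one.
    """
    mu_x = X_train.mean(axis=0)
    mu_y = y_train.mean()
    Xc = X_train - mu_x
    # Dual form: solve an n x n system, which is cheaper when markers >> lines.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_train - mu_y)
    marker_effects = Xc.T @ alpha
    return (X_test - mu_x) @ marker_effects + mu_y


def five_fold_pa(X, y, lam=100.0, k=5, seed=0):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    correlations = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = ridge_fit_predict(X[train], y[train], X[test], lam)
        correlations.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(correlations))


# Simulated data (hypothetical): 200 inbred lines, 500 biallelic markers coded 0/2.
rng = np.random.default_rng(42)
n_lines, n_markers, n_qtl = 200, 500, 20
X = rng.choice([0.0, 2.0], size=(n_lines, n_markers))
effects = np.zeros(n_markers)
effects[rng.choice(n_markers, size=n_qtl, replace=False)] = rng.normal(size=n_qtl)
genetic_value = X @ effects
# Scale the noise so that roughly 70% of phenotypic variance is genetic.
noise_sd = np.sqrt(genetic_value.var() * 0.3 / 0.7)
y = genetic_value + rng.normal(scale=noise_sd, size=n_lines)

pa = five_fold_pa(X, y)
print(f"Five-fold cross-validated PA: {pa:.2f}")
```

An inter-population prediction of the kind described above could be sketched with the same `ridge_fit_predict` helper by taking the training rows from one sub-population and the test rows from another instead of from random folds.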