
Optimized Use of Low-Depth Genotyping-by-Sequencing for Genomic Prediction Among Multi-Parental Family Pools and Single Plants in Perennial Ryegrass (Lolium perenne L.).

Author information

Cericola Fabio, Lenk Ingo, Fè Dario, Byrne Stephen, Jensen Christian S, Pedersen Morten G, Asp Torben, Jensen Just, Janss Luc

Affiliations

Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark.

DLF Seeds A/S, Research Division, Store Heddinge, Denmark.

Publication information

Front Plant Sci. 2018 Mar 21;9:369. doi: 10.3389/fpls.2018.00369. eCollection 2018.

Abstract

Ryegrass single plants, bi-parental family pools, and multi-parental family pools are often genotyped based on allele frequencies obtained from genotyping-by-sequencing (GBS) assays. GBS assays can be performed at low coverage depth to reduce costs. However, reducing the coverage depth increases the proportion of missing data and reduces the accuracy with which the allele frequency at each locus is identified. As a consequence of the latter, genomic relationship matrices (GRMs) will be biased. This bias in GRMs affects variance estimates and the accuracy of GBLUP for genomic prediction (GBLUP-GP). We derived equations that describe the bias from low-coverage sequencing as an effect of binomial sampling of sequence reads, and allowed for any ploidy level of the sample considered. This allowed us to combine individual and pool genotypes in one GRM, treating each pool genotype as a polyploid genotype with ploidy equal to the total ploidy level of the parents of the pool. Using simulated data, we verified the magnitude of the GRM bias at different coverage depths for three kinds of ryegrass breeding material: individual genotypes from single plants, pool genotypes from F2 families, and pool genotypes from synthetic varieties. To better handle missing data, we also tested imputation procedures that are suited to allele-frequency genomic data. The relative advantages of bias correction and imputation of missing data were evaluated using real data. We examined a large dataset that included single plants, F2 families, and synthetic varieties genotyped in three GBS assays, each with a different coverage depth, and evaluated for heading date, crown rust resistance, and seed yield. Cross-validations were used to test the accuracy of GBLUP approaches, demonstrating the feasibility of predicting among different types of breeding material. Bias-corrected GRMs increased predictive accuracies compared with standard approaches to constructing GRMs. Among the imputation methods we tested, the random forest method yielded the highest predictive accuracy. Combining these two methods resulted in a meaningful increase in predictive ability (up to 0.09). The possibility of predicting across individuals and pools provides new opportunities for improving ryegrass breeding schemes.
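The binomial read-sampling effect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the equations derived in the paper: it assumes a sample of a given ploidy whose true allele frequency x is itself drawn binomially from a population frequency q, and a read-based estimate y drawn binomially at a fixed depth. Under those assumptions, the extra squared deviation contributed by read sampling is E[x(1 - x)] / depth = q(1 - q)(1 - 1/ploidy) / depth, which is the kind of inflation that would show up on the diagonal of an uncorrected GRM. All function and parameter names (`simulate_sq_deviation`, `expected_inflation`, `ploidy`, `depth`) are hypothetical and chosen for illustration only.

```python
import random


def binomial(n, p, rng):
    """Draw one Binomial(n, p) variate by summing n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_sq_deviation(ploidy, depth, q, n_loci, seed=0):
    """Mean squared deviation from q for true and read-based frequencies.

    Assumption (illustrative only): every locus has the same population
    frequency q and the same read depth.
    """
    rng = random.Random(seed)
    true_ss = 0.0
    obs_ss = 0.0
    for _ in range(n_loci):
        # True allele frequency carried by the sample (individual or pool).
        x = binomial(ploidy, q, rng) / ploidy
        # Frequency estimated from `depth` reads sampled from that sample.
        y = binomial(depth, x, rng) / depth
        true_ss += (x - q) ** 2
        obs_ss += (y - q) ** 2
    return true_ss / n_loci, obs_ss / n_loci


def expected_inflation(ploidy, depth, q):
    """Expected extra variance from read sampling: E[x(1 - x)] / depth."""
    return q * (1 - q) * (1 - 1 / ploidy) / depth


if __name__ == "__main__":
    q = 0.3
    # Ploidy 2 mimics a single diploid plant; 4 mimics a pool whose two
    # diploid parents are treated as one tetraploid genotype.
    for ploidy in (2, 4):
        for depth in (2, 10):
            true_v, obs_v = simulate_sq_deviation(ploidy, depth, q, 5000)
            print(ploidy, depth, obs_v - true_v,
                  expected_inflation(ploidy, depth, q))
```

In this toy setting the simulated inflation shrinks in proportion to 1/depth, which is why subtracting a depth-dependent term from the observed squared deviations is a natural form for a correction. How the paper's correction handles varying depth per locus and off-diagonal elements is not specified in the abstract and is not modeled here.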


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7487/5871745/cf5ea2ca124c/fpls-09-00369-g0001.jpg
