Agriculture and Food, CSIRO, St Lucia, QLD, 4067, Australia.
Tasmanian Institute of Agriculture, University of Tasmania, Prospect, TAS, 7250, Australia.
Theor Appl Genet. 2020 Sep;133(9):2535-2544. doi: 10.1007/s00122-020-03615-y. Epub 2020 May 24.
We identified 1.844 million barley pan-genome sequence anchors from 12,306 genotypes using genetic mapping and machine learning. There is increasing evidence that the genes of any single crop genotype are far from covering all genes in that species; thus, building more comprehensive pan-genomes is of great importance in genetic research and breeding. Obtaining a thousand-genotype-scale pan-genome from deep-sequencing data is currently impractical for species like barley, which has a huge and highly repetitive genome. To this end, we attempted to identify barley pan-genome sequence anchors from a large quantity of genotyping-by-sequencing (GBS) datasets by combining genetic mapping and machine learning algorithms. Based on the GBS sequences from 11,166 domesticated and 1,140 wild barley genotypes, we identified 1.844 million pan-genome sequence anchors. Of these, 532,253 were identified as presence/absence variation (PAV) tags. By aligning these PAV tags to the genome of the hulless barley genotype Zangqing320, we validated 83.6% of those from the domesticated genotypes and 88.6% of those from the wild barley genotypes. Association analyses against flowering time, plant height and kernel size showed that the relative importance of the PAV and non-PAV tags varied among traits. The pan-genome sequence anchors based on GBS tags can facilitate the construction of a comprehensive pan-genome and greatly assist various genetic studies in barley, including identification of structural variation, genetic mapping and breeding.