Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA.
Chibas and Department of Agriculture and Environmental Sciences, Quisqueya University, Port-au-Prince, Haiti.
Plant Genome. 2020 Mar;13(1):e20009. doi: 10.1002/tpg2.20009. Epub 2020 Mar 25.
Successful management and utilization of increasingly large genomic datasets is essential for breeding programs to accelerate cultivar development. To support this, we developed a Sorghum bicolor Practical Haplotype Graph (PHG) pangenome database that stores haplotypes and variant information. We developed two PHGs in sorghum that were used to identify genome-wide variants for 24 founders of the Chibas sorghum breeding program from 0.01x sequence coverage. The PHG called single nucleotide polymorphisms (SNPs) with 5.9% error at 0.01x coverage, only 3% higher than the PHG error when calling SNPs from 8x coverage sequence. Additionally, 207 progenies from the Chibas genomic selection (GS) training population were sequenced and processed through the PHG. Missing genotypes were imputed from PHG parental haplotypes and used for genomic prediction. Mean prediction accuracies with PHG SNP calls range from 0.57 to 0.73 and are similar to prediction accuracies obtained with genotyping-by-sequencing or targeted amplicon sequencing (rhAmpSeq) markers. This study demonstrates the use of a sorghum PHG to impute SNPs from low-coverage sequence data and shows that the PHG can unify genotype calls across multiple sequencing platforms. By reducing input sequence requirements, the PHG can decrease the cost of genotyping, make GS more feasible, and facilitate larger breeding populations. Our results demonstrate that the PHG is a useful research and breeding tool that maintains variant information from a diverse group of taxa, stores sequence data in a condensed but readily accessible format, unifies genotypes across genotyping platforms, and provides a cost-effective option for genomic selection.
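As an illustration of what a SNP calling error rate such as the 5.9% figure might measure, the sketch below computes genotype discordance between low-coverage calls and a higher-coverage reference set. It is a minimal sketch under stated assumptions, not the study's method: the abstract does not define its error metric, and the function name `snp_error_rate`, the 0/1/2 genotype coding, and the toy data are all hypothetical.

```python
from typing import Optional, Sequence


def snp_error_rate(
    called: Sequence[Optional[int]],
    truth: Sequence[Optional[int]],
) -> float:
    """Fraction of sites where the called genotype differs from the truth.

    Sites missing (None) in either set are skipped. This discordance
    definition is an assumption made for illustration only.
    """
    if len(called) != len(truth):
        raise ValueError("genotype vectors must have the same length")
    compared = 0
    mismatches = 0
    for c, t in zip(called, truth):
        if c is None or t is None:
            continue  # no comparison possible at this site
        compared += 1
        if c != t:
            mismatches += 1
    if compared == 0:
        raise ValueError("no sites with calls in both sets")
    return mismatches / compared


# Toy genotypes coded as alternate-allele counts (0, 1, 2); None = missing.
low_cov = [0, 2, 2, None, 0, 1, 2, 0]
high_cov = [0, 2, 0, 2, 0, 1, None, 0]

rate = snp_error_rate(low_cov, high_cov)
print(f"{rate:.3f}")  # 6 comparable sites, 1 mismatch -> 0.167
```

Skipping sites that are missing in either set keeps the denominator limited to sites where a comparison is possible; a different convention, such as counting missing calls as errors, would give a different rate.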