Plant Genome. 2019 Mar;12(1). doi: 10.3835/plantgenome2018.06.0044.
Mining crop genomic variation can facilitate the genetic research of complex traits and molecular breeding. In sorghum [Sorghum bicolor L. (Moench)], several large-scale single nucleotide polymorphism (SNP) datasets have been generated using genotyping-by-sequencing of ApeKI reduced representation libraries. However, data reuse has been impeded by differences in reference genome coordinates among datasets. To facilitate reuse of these data, we constructed and characterized an integrated 459,304-SNP dataset for 10,323 sorghum genotypes on the version 3.1 reference genome. The SNP distribution showed high enrichment in subtelomeric chromosome arms and in genic regions (48% of SNPs) and was highly correlated (r = 0.82) with the distribution of ApeKI restriction sites. The genetic structure reflected population differences by botanical race, as well as familial structure among recombinant inbred lines (RILs). Faster linkage disequilibrium decay was observed in the diversity panel than in the RILs, as expected, given the greater opportunity for recombination in diverse populations. To validate the quality and utility of the integrated SNP dataset, we used genome-wide association studies (GWAS) of genebank phenotype data, precisely mapping several known genes (e.g and ) and identifying novel associations for other traits. We further validated the dataset with GWAS of new and published plant height and flowering time data in a nested association mapping population, precisely mapping known genes and identifying epistatic interactions underlying both traits. These findings validate this integrated SNP dataset as a useful genomics resource for sorghum genetics and breeding.