Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, K1A 0C6, Canada.
Department of Biology, University of Ottawa, Ottawa, ON, K1N 6N5, Canada.
BMC Res Notes. 2023 Sep 14;16(1):220. doi: 10.1186/s13104-023-06496-8.
The 1,000 wheat exome project captured the single nucleotide variants in the coding regions of a diverse set of 890 wheat accessions to analyse the contribution of introgression to the adaptation of wheat. However, this highly useful single nucleotide polymorphism (SNP) dataset is based on RefSeq v1.0, the International Wheat Genome Sequencing Consortium (IWGSC) assembly of the bread wheat genome of Chinese Spring. This reference sequence has recently been updated using optical maps and long-read sequencing to produce the improved RefSeq v2.1. Our objective was to develop a reliable high-density SNP dataset positioned on RefSeq v2.1, because it is the current standard reference sequence used by wheat researchers.
The 3,039,822 SNPs originally positioned on RefSeq v1.0 were projected to v2.1 using Liftoff with four different flanking region lengths, and 2,946,536 SNPs were consistently lifted to the same location irrespective of the flanking region length. Of these, 2,799,166 were located on the positive ('+') strand. The distribution of the SNPs across the 21 chromosomes on RefSeq v2.1 was similar to that on RefSeq v1.0. Among the SNPs that were located on unanchored scaffolds in RefSeq v1.0, 11,938 were projected to one of the 21 pseudomolecules in the upgraded assembly. This SNP dataset constitutes a much-needed standardized resource for the wheat research community.
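The consistency filter described above (keeping only SNPs lifted to the same location for every flanking region length) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `consistent_snps`, the flank lengths, the SNP identifiers, and the coordinates are all hypothetical, and the sketch assumes the lifted positions from each run have already been parsed into dictionaries.

```python
from typing import Dict, Tuple

# (chromosome, position, strand) of a SNP on the new assembly
Location = Tuple[str, int, str]


def consistent_snps(runs: Dict[int, Dict[str, Location]]) -> Dict[str, Location]:
    """Return SNPs lifted to the identical location in every run.

    `runs` maps a flanking-region length to the lifted locations obtained
    with that length. A SNP missing from any run is dropped.
    """
    run_list = list(runs.values())
    if not run_list:
        return {}
    first, rest = run_list[0], run_list[1:]
    return {
        snp_id: loc
        for snp_id, loc in first.items()
        if all(other.get(snp_id) == loc for other in rest)
    }


# Toy input: four hypothetical flank lengths (the abstract does not state them).
runs = {
    50: {"snp1": ("Chr1A", 1000, "+"), "snp2": ("Chr2B", 500, "-"),
         "snp3": ("Chr3D", 42, "+"), "snp4": ("Chr4A", 7, "+")},
    100: {"snp1": ("Chr1A", 1000, "+"), "snp2": ("Chr2B", 500, "-"),
          "snp3": ("Chr3D", 42, "+"), "snp4": ("Chr4A", 7, "+")},
    200: {"snp1": ("Chr1A", 1000, "+"), "snp2": ("Chr2B", 500, "-"),
          "snp3": ("Chr3D", 99, "+"),  # lifted to a different position
          "snp4": ("Chr4A", 7, "+")},
    500: {"snp1": ("Chr1A", 1000, "+"), "snp2": ("Chr2B", 500, "-"),
          "snp3": ("Chr3D", 42, "+")},  # snp4 failed to lift in this run
}

kept = consistent_snps(runs)

# Subset of the consistent SNPs that sit on the '+' strand.
plus_strand = {snp_id: loc for snp_id, loc in kept.items() if loc[2] == "+"}
```

In this toy input, `snp3` is dropped because one run places it at a different position and `snp4` is dropped because it is absent from one run, so only `snp1` and `snp2` remain in `kept`.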