Department of Ecology and Environmental Science, Umeå University, Umeå, Sweden
Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
G3 (Bethesda). 2019 May 7;9(5):1623-1632. doi: 10.1534/g3.118.200840.
Norway spruce (Picea abies (L.) Karst.) is a conifer species of substantial economic and ecological importance. In common with most conifers, the genome is very large (∼20 Gbp) and contains a high fraction of repetitive DNA. The current genome assembly (v1.0) covers approximately 60% of the total genome size but is highly fragmented, consisting of >10 million scaffolds. The genome annotation contains 66,632 gene models that are at least partially validated (www.congenie.org); however, the fragmented nature of the assembly means that there is currently little information available on how these genes are physically distributed over the 12 chromosomes. By creating an ultra-dense genetic linkage map, we anchored and ordered scaffolds into linkage groups, which complements the fine-scale information available in assembly contigs. Our ultra-dense haploid consensus genetic map consists of 21,056 markers derived from 14,336 scaffolds that contain 17,079 gene models (25.6% of the validated gene models) that we have anchored to the 12 linkage groups. We used data from three independent component maps, as well as comparisons with previously published maps, to evaluate the accuracy and marker ordering of the linkage groups. We demonstrate that approximately 3.8% of the anchored scaffolds and 1.6% of the gene models covered by the consensus map likely contain assembly errors, as they carry genetic markers that map to different regions within or between linkage groups. We further evaluate the utility of the genetic map for the conifer research community by using an independent data set of unrelated individuals to assess genome-wide variation in genetic diversity across the genomic regions anchored to linkage groups. The results show that our map is sufficiently dense to enable detailed evolutionary analyses across the genome.
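The check for likely assembly errors described above (scaffolds whose markers map to different regions within or between linkage groups) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the input layout of `(scaffold, linkage_group, position_cM)` tuples, the 5 cM within-group threshold, and the example data are all assumptions made for illustration.

```python
from collections import defaultdict

def flag_split_scaffolds(markers, max_span_cm=5.0):
    """Flag scaffolds whose markers disagree on map location.

    markers: iterable of (scaffold_id, linkage_group, position_cm) tuples.
    max_span_cm: hypothetical threshold; markers on one scaffold that lie
    farther apart than this on the same linkage group are treated as a
    within-group split.
    Returns a dict mapping scaffold_id to "between" or "within".
    """
    by_scaffold = defaultdict(list)
    for scaffold, group, position in markers:
        by_scaffold[scaffold].append((group, position))

    flagged = {}
    for scaffold, hits in by_scaffold.items():
        groups = {group for group, _ in hits}
        if len(groups) > 1:
            # Markers from one scaffold land on different linkage groups.
            flagged[scaffold] = "between"
        else:
            positions = [position for _, position in hits]
            if max(positions) - min(positions) > max_span_cm:
                # Same linkage group, but the markers are far apart.
                flagged[scaffold] = "within"
    return flagged

# Made-up example data, not taken from the study.
example = [
    ("scaffold_1", 1, 10.0), ("scaffold_1", 1, 10.4),  # consistent
    ("scaffold_2", 3, 55.0), ("scaffold_2", 7, 12.0),  # two linkage groups
    ("scaffold_3", 5, 2.0), ("scaffold_3", 5, 40.0),   # one group, far apart
]
flagged = flag_split_scaffolds(example)
```

Only scaffolds carrying at least two markers can be tested this way, so a percentage of flagged scaffolds computed from such a check depends on how many scaffolds have multiple markers.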