School of Biological Sciences, The University of Western Australia, Crawley, WA, Australia.
School of Agriculture and Food Sciences, University of Queensland, St. Lucia, Qld, Australia.
Plant Biotechnol J. 2017 Dec;15(12):1602-1610. doi: 10.1111/pbi.12742. Epub 2017 Jun 14.
As an increasing number of plant genome sequences become available, it is clear that gene content varies between individuals, and the challenge arises to predict the gene content of a species. However, genome comparison is often confounded by variation in assembly and annotation. Differentiating between true gene absence and variation in assembly or annotation is essential for the accurate identification of conserved and variable genes in a species. Here, we present the de novo assembly of the Brassica napus cultivar Tapidor and a comparison with an improved assembly of the B. napus cultivar Darmor-bzh. Both cultivars were annotated using the same method to allow comparison of gene content. We identified genes unique to each cultivar and differentiated these from artefacts due to variation in the assembly and annotation. We demonstrate that using a common annotation pipeline can result in different gene predictions, even for closely related cultivars, and that repeat regions which collapse during assembly affect whole-genome comparison. After accounting for differences in assembly and annotation, we demonstrate that the genome of Darmor-bzh contains a greater number of genes than the genome of Tapidor. Our results are the first step towards comparison of the true differences between B. napus genomes and highlight the potential sources of error in future production of a B. napus pangenome.