Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia.
Department of Clinical Pathology, University of Melbourne, Melbourne, Victoria, Australia.
Nat Genet. 2020 Nov;52(11):1256-1264. doi: 10.1038/s41588-020-00717-7. Epub 2020 Oct 30.
Despite advances in sequencing technologies, assembly of complex plant genomes remains difficult because of polyploidy and high repeat content. Here we report PolyGembler, a method for grouping and ordering contigs into pseudomolecules by genetic linkage analysis. Our approach also provides an accurate way to detect and correct assembly errors. Using simulated data, we demonstrate that our approach is highly accurate and outperforms three existing state-of-the-art genetic mapping tools. In particular, it is more robust to missing genotype data and genotyping errors. We used our method to construct pseudomolecules for three species: allotetraploid lawn grass, using PacBio long reads in combination with restriction site-associated DNA sequencing; diploid Ipomoea trifida, using contigs assembled from Illumina reads in combination with genotype data from single-nucleotide polymorphism arrays; and autotetraploid potato, using contigs assembled from Illumina reads in combination with genotyping by sequencing. We resolved 13 assembly errors in a published I. trifida genome assembly and anchored eight unplaced scaffolds in the published potato genome.
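To illustrate what grouping contigs by genetic linkage can mean in the simplest case, here is a minimal Python sketch. It is not PolyGembler's algorithm, which the abstract does not describe in detail. It assumes a far simpler setting than polyploid outbred data: each contig is represented by one marker with phased, haploid-like calls (0 or 1, or `None` when missing) per offspring. The function names, the toy data, and the `max_r` threshold of 0.2 are all hypothetical choices made for illustration.

```python
from itertools import combinations


def recomb_fraction(a, b):
    """Estimate the recombination fraction between two markers.

    a, b: per-offspring haplotype calls (0, 1, or None if missing).
    Offspring with a missing call at either marker are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.5  # no shared information: treat as unlinked
    r = sum(x != y for x, y in pairs) / len(pairs)
    return min(r, 1 - r)  # marker phase is arbitrary, so fold around 0.5


def group_contigs(calls, max_r=0.2):
    """Single-linkage grouping: join contigs whose estimated r is below max_r."""
    parent = {name: name for name in calls}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(calls, 2):
        if recomb_fraction(calls[u], calls[v]) < max_r:
            parent[find(u)] = find(v)

    groups = {}
    for name in calls:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy data (invented for illustration): eight offspring, four contigs.
calls = {
    "ctg1": [0, 0, 1, 1, 0, 1, 0, 1],
    "ctg2": [0, 0, 1, 1, 0, 1, 1, 1],
    "ctg3": [1, 1, 0, 0, None, 0, 0, 0],
    "ctg4": [0, 1, 0, 1, 0, 1, 0, 0],
}
print(group_contigs(calls))
```

In this toy input, `ctg1`, `ctg2`, and `ctg3` fall into one linkage group (`ctg3` is in the opposite phase and has one missing call), while `ctg4` stays on its own. Ordering contigs within a group and handling polyploid dosage genotypes are deliberately left out of the sketch.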