Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA.
Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA.
Genome Res. 2018 Nov;28(11):1720-1732. doi: 10.1101/gr.236273.118. Epub 2018 Oct 19.
Despite the rapid development of sequencing technologies, the assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout 2, a reference-assisted assembly tool that works for large and complex genomes. Taking one or more target assemblies (generated by an NGS assembler) and one or more related reference genomes as input, Ragout 2 infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. Using Ragout 2, we transformed NGS assemblies of 16 laboratory mouse strains into sets of complete chromosomes, leaving <5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realignment of long Pacific Biosciences (PacBio) reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. We also applied Ragout 2 to the Mus caroli and Mus pahari genomes, which exhibit karyotype-scale variations compared with other genomes from the Muridae family. Chromosome painting maps confirmed most large-scale rearrangements that Ragout 2 detected. Finally, we applied Ragout 2 to improve draft sequences of three recently published ape genomes. Ragout 2 transformed three sets of contigs (generated using PacBio reads only) into chromosome-scale assemblies with accuracy comparable to that of the chromosome assemblies generated in the original study using BioNano maps, Hi-C, BAC clones, and FISH.
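To make the general idea of reference-assisted assembly concrete, the following toy sketch orders and orients contigs along a single reference chromosome. It is not Ragout 2's algorithm, which combines multiple references with inferred evolutionary relationships and a genome rearrangement model. The function name `scaffold_by_reference`, the signed synteny-block encoding, and the majority-vote orientation rule are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def scaffold_by_reference(
    reference: List[int],
    contigs: Dict[str, List[int]],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Order and orient contigs along one reference chromosome (toy model).

    reference: synteny block ids in reference order, e.g. [1, 2, 3].
    contigs:   contig name -> signed block ids (negative = reverse strand).
    Returns (scaffold, unplaced), where scaffold is a list of
    (contig name, '+' or '-') in reference order.
    """
    # Position of every block on the reference chromosome.
    position = {block: i for i, block in enumerate(reference)}

    placed = []
    unplaced = []
    for name, blocks in contigs.items():
        # Keep only blocks that also occur in the reference.
        hits = [b for b in blocks if abs(b) in position]
        if not hits:
            # No shared block: the contig stays unlocalized.
            unplaced.append(name)
            continue
        # Hypothetical rule: orient by majority vote over block strands.
        forward = sum(1 for b in hits if b > 0)
        orientation = "+" if 2 * forward >= len(hits) else "-"
        # Anchor the contig at its leftmost block on the reference.
        anchor = min(position[abs(b)] for b in hits)
        placed.append((anchor, name, orientation))

    placed.sort()
    scaffold = [(name, orientation) for _, name, orientation in placed]
    return scaffold, unplaced


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5, 6]
    contigs = {"c1": [3, 4], "c2": [-2, -1], "c3": [5, 6], "c4": [9]}
    print(scaffold_by_reference(reference, contigs))
```

In this toy input, contig `c4` shares no block with the reference and is therefore reported as unplaced, loosely analogous to the unlocalized sequence mentioned in the abstract. A single-reference ordering like this would also copy any reference-specific rearrangement into the target, which is the kind of error that using several references is meant to reduce.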