Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
Proc Natl Acad Sci U S A. 2013 Jan 29;110(5):1785-90. doi: 10.1073/pnas.1220349110. Epub 2013 Jan 10.
One of the most difficult problems in modern genomics is the assembly of full-length chromosomes using next-generation sequencing (NGS) data. To address this problem, we developed "reference-assisted chromosome assembly" (RACA), an algorithm to reliably order and orient sequence scaffolds generated by NGS and assemblers into longer chromosomal fragments using comparative genome information and paired-end reads. Evaluation of results using simulated and real genome assemblies indicates that our approach can substantially improve genomes generated by a wide variety of de novo assemblers if a good reference assembly of a closely related species and outgroup genomes are available. We used RACA to reconstruct 60 Tibetan antelope (Pantholops hodgsonii) chromosome fragments from 1,434 SOAPdenovo sequence scaffolds, of which 16 chromosome fragments were homologous to complete cattle chromosomes. Experimental validation by PCR showed that predictions made by RACA are highly accurate. Our results indicate that RACA will significantly facilitate the study of chromosome evolution and genome rearrangements for the large number of genomes being sequenced by NGS that do not have a genetic or physical map.