Putnam Nicholas H, O'Connell Brendan L, Stites Jonathan C, Rice Brandon J, Blanchette Marco, Calef Robert, Troll Christopher J, Fields Andrew, Hartley Paul D, Sugnet Charles W, Haussler David, Rokhsar Daniel S, Green Richard E
Dovetail Genomics LLC, Santa Cruz, California 95060, USA;
Dovetail Genomics LLC, Santa Cruz, California 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, California 95066, USA;
Genome Res. 2016 Mar;26(3):342-50. doi: 10.1101/gr.193474.115. Epub 2016 Feb 4.
Long-range and highly accurate de novo assembly from short-read data is one of the most pressing challenges in genomics. Recently, it has been shown that read pairs generated by proximity ligation of DNA in chromatin of living tissue can address this problem, dramatically increasing the scaffold contiguity of assemblies. Here, we describe a simpler approach ("Chicago") based on in vitro reconstituted chromatin. We generated two Chicago data sets with human DNA and developed a statistical model and a new software pipeline ("HiRise") that can identify poor quality joins and produce accurate, long-range sequence scaffolds. We used these to construct a highly accurate de novo assembly and scaffolding of a human genome with scaffold N50 of 20 Mbp. We also demonstrated the utility of Chicago for improving existing assemblies by reassembling and scaffolding the genome of the American alligator. With a single library and one lane of Illumina HiSeq sequencing, we increased the scaffold N50 of the American alligator from 508 kbp to 10 Mbp.
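The abstract reports its results as scaffold N50: 20 Mbp for the human assembly, and an increase from 508 kbp to 10 Mbp for the American alligator. N50 is the length L such that scaffolds of length L or longer together contain at least half of the total assembled sequence. The sketch below computes it from a list of scaffold lengths. The function name `n50` and the toy lengths are hypothetical illustrations, not data or code from the paper. The toy "after scaffolding" case also ignores the gap (N) sequence that real scaffolders insert between joined contigs.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Sort lengths from longest to shortest and accumulate them. The N50 is the
    length at which the running total first reaches half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running total always reaches total")


# Hypothetical toy assembly: four contigs before scaffolding.
before = [40, 30, 20, 10]
# Hypothetical result after long-range links join the 30, 20 and 10 pieces
# into one scaffold (gap lengths ignored for simplicity).
after = [60, 40]

print(n50(before))  # 30
print(n50(after))   # 60
```

The total amount of sequence is unchanged between the two toy cases. Only the contiguity differs, which is why joining pieces with long-range read pairs raises N50 without adding any new sequence.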