Tigmint: correcting assembly errors using linked reads from large molecules.

Affiliations

BC Cancer Genome Sciences Centre, Vancouver, V5Z 4S6, BC, Canada.

University of British Columbia, Michael Smith Laboratories, Vancouver, V6T 1Z4, BC, Canada.

Publication information

BMC Bioinformatics. 2018 Oct 26;19(1):393. doi: 10.1186/s12859-018-2425-6.

Abstract

BACKGROUND

Genome sequencing yields the sequences of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult because of gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity. As a result, assembly errors are common. In the absence of a reference genome, these misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembled sequence. Although tools exist to identify and correct misassemblies using Illumina paired-end and mate-pair sequencing, no such tool yet makes use of the long-distance information of the large molecules provided by linked reads, such as those offered by the 10x Genomics Chromium platform. We have developed the tool Tigmint to address this gap.
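The idea of comparing large-molecule data with the assembly can be illustrated with a minimal sketch. It assumes, hypothetically, that each large molecule has already been reduced to a (start, end) extent on a contig, and it flags stretches of the contig spanned by fewer than a chosen number of molecules as candidate misassemblies. The function name, the `min_molecules` threshold, and the interval representation are illustrative assumptions and are not taken from the abstract; this is not presented as Tigmint's actual implementation.

```python
from typing import List, Tuple


def low_support_regions(
    contig_length: int,
    molecules: List[Tuple[int, int]],
    min_molecules: int = 2,
) -> List[Tuple[int, int]]:
    """Return half-open intervals [start, end) of the contig that are
    covered by fewer than `min_molecules` molecule extents.

    `molecules` holds hypothetical (start, end) half-open extents of
    large molecules inferred from linked reads (an assumption of this sketch).
    """
    # Difference array: +1 where a molecule starts, -1 where it ends.
    delta = [0] * (contig_length + 1)
    for start, end in molecules:
        start = max(0, start)
        end = min(contig_length, end)
        if start < end:
            delta[start] += 1
            delta[end] -= 1

    regions = []
    depth = 0
    region_start = None
    for pos in range(contig_length):
        depth += delta[pos]  # number of molecules covering this position
        if depth < min_molecules:
            if region_start is None:
                region_start = pos  # a poorly supported stretch begins
        elif region_start is not None:
            regions.append((region_start, pos))  # the stretch ends
            region_start = None
    if region_start is not None:
        regions.append((region_start, contig_length))
    return regions


# Two molecules cover the left half and two cover the right half,
# but fewer than two molecules cover the junction between them.
suspect = low_support_regions(100, [(0, 50), (0, 48), (52, 100), (55, 100)])
print(suspect)
```

In this toy input the only stretch with fewer than two covering molecules is the junction in the middle of the contig, which is the kind of discrepancy that could mark a place to cut the sequence. A practical tool would also need to handle contig ends, where molecule coverage drops naturally.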

RESULTS

To demonstrate the effectiveness of Tigmint, we applied it to assemblies of a human genome using short reads assembled with ABySS 2.0 and other assemblers. Tigmint reduced the number of misassemblies identified by QUAST in the ABySS assembly by 216 (27%). While scaffolding with ARCS alone more than doubled the scaffold NGA50 of the assembly from 3 Mbp to 8 Mbp, the combination of Tigmint and ARCS improved the scaffold NGA50 of the assembly more than five-fold to 16.4 Mbp. This notable improvement in contiguity highlights the utility of assembly correction in refining assemblies. We demonstrate the utility of Tigmint in correcting the assemblies of multiple tools, as well as in using Chromium reads to correct and scaffold assemblies of long single-molecule sequencing.
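For readers unfamiliar with the contiguity metric quoted above, the following sketch computes an NG50-style statistic: the length of the block at which the cumulative length of blocks, taken longest first, reaches half the genome size. NGA50 as reported by QUAST applies the same calculation to aligned blocks, after scaffolds are broken at misassemblies. The block lengths and genome size below are made-up toy values, not data from the study.

```python
from typing import Iterable


def ng50(block_lengths: Iterable[int], genome_size: int) -> int:
    """Length of the block at which the cumulative length of blocks,
    sorted longest first, reaches at least half of `genome_size`.
    Returns 0 if the blocks never cover half the genome."""
    total = 0
    for length in sorted(block_lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return 0


# Toy example: breaking the longest scaffold at a misassembly
# (40 -> 25 + 15) lowers the statistic computed on the pieces.
before = ng50([40, 30, 20, 10], genome_size=100)
after = ng50([30, 25, 20, 15, 10], genome_size=100)
print(before, after)
```

Because the metric is computed on blocks that are broken at misassemblies, a scaffold that joins sequence incorrectly does not add to it, which is one reason why correcting an assembly before scaffolding can raise the final value.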

CONCLUSIONS

Scaffolding an assembly that has been corrected with Tigmint yields a final assembly that is both more correct and substantially more contiguous than an assembly that has not been corrected. Using single-molecule sequencing in combination with linked reads enables a genome sequence assembly that achieves both high sequence contiguity and high scaffold contiguity, a feat not currently achievable with either technology alone.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7a7/6204047/cfec8d4ec7d6/12859_2018_2425_Fig1_HTML.jpg
