SpLitteR: diploid genome assembly using TELL-Seq linked-reads and assembly graphs.

Affiliations

Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden.

Universal Sequencing Technology Corporation, Carlsbad, California, United States.

Publication information

PeerJ. 2024 Sep 27;12:e18050. doi: 10.7717/peerj.18050. eCollection 2024.

Abstract

BACKGROUND

Recent advances in long-read sequencing technologies have enabled accurate and contiguous assemblies of large genomes and metagenomes. However, even long and accurate high-fidelity (HiFi) reads cannot resolve repeats that are longer than the reads themselves. This limitation reduces the contiguity of diploid genome assemblies, since the two haplomes share many long identical regions. To generate telomere-to-telomere assemblies of diploid genomes, biologists now construct HiFi-based phased assemblies and use additional experimental technologies to transform them into more contiguous diploid assemblies. Barcoded linked-reads, generated with the inexpensive TELL-Seq technology, offer an attractive way to bridge unresolved repeats in phased assemblies of diploid genomes.
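To illustrate how barcodes can bridge a repeat, the sketch below assumes a simplified model: a repeat vertex in the assembly graph with equal numbers of incoming and outgoing edges, each labeled with the set of TELL-Seq barcodes whose reads align to it. Edges that come from the same long DNA molecule should share many barcodes, so the sketch pairs incoming with outgoing edges by summed Jaccard similarity and declines to resolve the repeat when the best pairing is not clearly better than the runner-up. This is a generic, hypothetical illustration of the principle; the function names, the Jaccard score, and the `min_ratio` threshold are assumptions, not SpLitteR's actual algorithm or parameters.

```python
from __future__ import annotations

from itertools import permutations


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of barcodes shared by two edges (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve_repeat(incoming: dict[str, set[str]],
                   outgoing: dict[str, set[str]],
                   min_ratio: float = 2.0) -> list[tuple[str, str]] | None:
    """Hypothetical repeat resolution: pair each incoming edge with one outgoing edge.

    Scores every one-to-one pairing by the summed Jaccard similarity of the
    edges' barcode sets. Returns the best pairing only if its score is
    positive and at least `min_ratio` times the runner-up; otherwise returns
    None and the repeat is left unresolved.
    """
    ins, outs = list(incoming), list(outgoing)
    if len(ins) != len(outs):
        return None  # this simplified model only handles balanced vertices
    scored = []
    for perm in permutations(outs):
        pairs = list(zip(ins, perm))
        score = sum(jaccard(incoming[i], outgoing[o]) for i, o in pairs)
        scored.append((score, pairs))
    scored.sort(key=lambda s: s[0], reverse=True)
    best_score, best_pairs = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best_score == 0.0 or best_score < min_ratio * runner_up:
        return None
    return best_pairs


# Hypothetical barcode sets for a repeat with two entries and two exits.
incoming = {"A1": {"bc1", "bc2", "bc3"}, "A2": {"bc7", "bc8"}}
outgoing = {"B1": {"bc7", "bc8", "bc9"}, "B2": {"bc2", "bc3", "bc4"}}
print(resolve_repeat(incoming, outgoing))  # [('A1', 'B2'), ('A2', 'B1')]
```

Enumerating all pairings is only reasonable for low-degree vertices. The ratio test trades some resolvable repeats for fewer false joins, which matters when the goal is to avoid introducing misassemblies.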

RESULTS

We developed the SpLitteR tool for diploid genome assembly using linked-reads and assembly graphs, and benchmarked it against the state-of-the-art linked-read scaffolders ARKS and SLR-superscaffolder using the human HG002 genome and sheep gut microbiome datasets. The benchmark showed that SpLitteR scaffolding results in a 1.5-fold increase in NGA50 compared to the baseline LJA assembly and the other scaffolders, while introducing no additional misassemblies on the human dataset.
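NGA50 is the contiguity metric behind the 1.5-fold figure above. A minimal sketch of its computation, assuming the aligned block lengths are already available (for example, from aligning the assembly to the reference with a tool such as QUAST, which splits contigs at misassembly breakpoints): sort the blocks in decreasing order and return the length of the block at which the running total first reaches half the reference length. The function name and the example numbers are hypothetical.

```python
from __future__ import annotations


def nga50(aligned_block_lengths: list[int], reference_length: int) -> int | None:
    """Return NGA50, or None if the blocks cover less than half the reference.

    aligned_block_lengths: lengths of contig pieces aligned to the reference,
    with contigs already split at misassembly breakpoints (assumed input).
    """
    half = reference_length / 2
    covered = 0
    for length in sorted(aligned_block_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return None  # reported as undefined when half the reference is not covered


# Hypothetical numbers: a 200 bp "reference" and five aligned blocks.
print(nga50([50, 30, 20, 10, 5], reference_length=200))  # 20
```

Because NGA50 uses aligned blocks rather than raw contig lengths, a scaffolder gains little from incorrect joins: a false join is split back into separate blocks before the metric is computed.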

CONCLUSION

We developed the SpLitteR tool for assembly graph phasing and scaffolding using barcoded linked-reads. We benchmarked SpLitteR on assembly graphs produced by various long-read assemblers and demonstrated that TELL-Seq reads facilitate phasing and scaffolding in these graphs. The benchmark shows that SpLitteR improves upon state-of-the-art linked-read scaffolders in both accuracy and contiguity metrics. SpLitteR is implemented in C++ as part of the freely available SPAdes package and is available at https://github.com/ablab/spades/releases/tag/splitter-preprint.
