The SAMBA tool uses long reads to improve the contiguity of genome assemblies.

Affiliations

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America.

Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, United States of America.

Publication information

PLoS Comput Biol. 2022 Feb 4;18(2):e1009860. doi: 10.1371/journal.pcbi.1009860. eCollection 2022 Feb.

Abstract

Third-generation sequencing technologies can generate very long reads with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using shorter reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective. One strategy to upgrade existing assemblies is to generate additional coverage using long-read data, and add that to the previously assembled contigs. SAMBA is a tool that is designed to scaffold and gap-fill existing genome assemblies with additional long-read data, resulting in substantially greater contiguity. SAMBA is the only tool of its kind that also computes and fills in the sequence for all spanned gaps in the scaffolds, yielding much longer contigs. Here we compare SAMBA to several similar tools capable of re-scaffolding assemblies using long-read data, and we show that SAMBA yields better contiguity and introduces fewer errors than competing methods. SAMBA is open-source software that is distributed at https://github.com/alekseyzimin/masurca.
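The key idea behind gap-filling is that a long read spanning the junction between two contigs can both order them and supply the missing sequence between them. The toy sketch below shows only that idea. It is not SAMBA's algorithm. The `Hit` record and the `fill_gap` function are hypothetical names introduced here for illustration. The sketch assumes forward-strand alignments in which the left hit ends exactly at the left contig's last base and the right hit starts exactly at the right contig's first base.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment record: where a contig end lands on a long read."""
    contig: str
    read_start: int  # 0-based, inclusive
    read_end: int    # 0-based, exclusive


def fill_gap(read: str, left: Hit, right: Hit, contigs: dict[str, str]) -> str:
    """Join two contigs using one long read that spans the gap between them.

    Toy model (assumption): both contigs align to the read on the forward strand,
    `left.read_end` corresponds to the left contig's last base, and
    `right.read_start` corresponds to the right contig's first base. Real tools
    must also handle reverse complements, partial alignments, and building a
    consensus from several spanning reads.
    """
    if right.read_start >= left.read_end:
        # Gap of zero or more bases: the read sequence between the two hits fills it.
        filler = read[left.read_end:right.read_start]
        return contigs[left.contig] + filler + contigs[right.contig]
    # Negative gap: the contigs overlap on the read, so trim the shared
    # bases from the start of the right contig before concatenating.
    overlap = left.read_end - right.read_start
    return contigs[left.contig] + contigs[right.contig][overlap:]
```

For example, with contigs `A = "ACGTAC"` and `B = "GGTTAA"` and a read `"GTACTTTGGTT"` whose first four bases match the end of `A` and whose last four bases match the start of `B`, the three read bases `"TTT"` between the hits become the gap sequence. The result is the single contig `"ACGTACTTTGGTTAA"`.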