Extending rnaSPAdes functionality for hybrid transcriptome assembly.

Author information

Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia.

Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo, Catania, Italy.

Publication information

BMC Bioinformatics. 2020 Jul 24;21(Suppl 12):302. doi: 10.1186/s12859-020-03614-2.

Abstract

BACKGROUND

De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when a reference genome is unavailable or poorly annotated. However, due to the short length of Illumina reads, it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. The recently emerged ability to generate long RNA reads, for example with PacBio and Oxford Nanopore technologies, may dramatically improve assembly quality and thus the subsequent analysis. While reference-based tools for analysing long RNA reads have recently been developed, there is no established pipeline for de novo assembly of such data.

RESULTS

In this work we present a novel method that enables high-quality de novo transcriptome assembly by combining the accuracy and reliability of short reads with the exon structure information extracted from long, error-prone reads. The algorithm incorporates the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapts it to transcriptomic data.

CONCLUSION

To evaluate the benefit of using long RNA reads, we selected several datasets containing both Illumina reads and either Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies produced by rnaSPAdes contain more full-length genes and alternative isoforms than assemblies built from short-read data alone.
