Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, 199004, 6 linia V.O. 11d, Russia.
Gigascience. 2019 Sep 1;8(9). doi: 10.1093/gigascience/giz100.
The possibility of generating large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers, each with its own strengths and limitations. While reference-based tools are widely used in transcriptomic studies, their application is limited to organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, complicated by varying expression levels across genes, alternative splicing, and paralogous genes.
Herein we describe rnaSPAdes, a novel transcriptome assembler that has been developed on top of the SPAdes genome assembler and exploits computational parallels between the assembly of transcriptomes and the assembly of single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare rnaSPAdes with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight the strengths and weaknesses of the different assemblers.
Based on our comparison of different assembly methods, we conclude that no single assembler is the absolute leader across all quality metrics and all datasets used. However, rnaSPAdes typically outperforms other assemblers on important properties such as the number of assembled genes and isoforms, while also achieving higher accuracy statistics on average than its closest competitors.