Department of Life Sciences, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences (LUMS), DHA, Lahore, 54792, Pakistan.
BMC Bioinformatics. 2024 Jan 2;25(1):2. doi: 10.1186/s12859-023-05614-4.
Transcriptomic studies of organisms that lack a reference genome typically start by generating a de novo transcriptome or supertranscriptome assembly from the raw RNA-seq reads. Assembling a supertranscriptome is, however, challenging because of the widely varying abundance of mRNA transcripts, alternative splicing, and sequencing errors. As a result, popular de novo supertranscriptome assembly tools produce assemblies containing contigs that are partially assembled, fragmented, or falsely chimeric, or that carry local mis-assemblies, all of which reduce assembly accuracy. Commonly available tools for assembly improvement rely primarily on BLAST searches against closely related species, which makes their accuracy and reliability contingent on data being available for such organisms.
We present ROAST, a tool for optimizing supertranscriptome assemblies using paired-end RNA-seq data from the Illumina sequencing platform. ROAST iteratively identifies and fixes various supertranscriptome assembly errors using only the error signatures produced by RNA-seq alignment tools: soft-clips, unexpected expression coverage, and reads whose mates are unmapped or mapped to a different contig. It performs no BLAST searches against other organisms. Evaluation on simulated as well as real datasets shows that ROAST significantly improves assembly quality by detecting and correcting a range of assembly errors.
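To make the alignment-derived error signatures concrete, the following is a minimal sketch, not ROAST's implementation, of how soft-clip pile-ups and discordant mates can be tallied per contig from SAM text records. It assumes plain SAM lines as input and uses the standard SAM FLAG bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped) and the RNEXT convention that `=` means "same contig". The names `collect_signals`, `flag_breakpoints`, `ContigSignals` and the `min_clips` threshold are hypothetical, chosen only for illustration. The coverage signal is not modeled here.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR ops that advance along the contig

FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8


@dataclass
class ContigSignals:
    # Hypothetical container: 1-based contig position -> number of reads clipped there.
    clip_sites: Counter = field(default_factory=Counter)
    mate_unmapped: int = 0
    # Other contig name -> number of reads whose mate maps to it.
    mate_other_contig: Counter = field(default_factory=Counter)


def parse_cigar(cigar):
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def collect_signals(sam_lines):
    """Tally soft-clip boundaries and discordant mates per contig (sketch)."""
    signals = {}
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip header lines
        f = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar, rnext = int(f[1]), f[2], int(f[3]), f[5], f[6]
        if flag & FLAG_UNMAPPED or rname == "*" or cigar == "*":
            continue  # unaligned reads carry no positional signal
        sig = signals.setdefault(rname, ContigSignals())
        ops = parse_cigar(cigar)
        core = [(n, op) for n, op in ops if op != "H"]  # ignore hard clips
        if core and core[0][1] == "S":
            sig.clip_sites[pos] += 1  # left clip: first aligned base
        if core and core[-1][1] == "S":
            ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
            sig.clip_sites[pos + ref_len - 1] += 1  # right clip: last aligned base
        if flag & FLAG_PAIRED:
            if flag & FLAG_MATE_UNMAPPED:
                sig.mate_unmapped += 1
            elif rnext not in ("=", "*", rname):
                sig.mate_other_contig[rnext] += 1
    return signals


def flag_breakpoints(signals, min_clips=3):
    """Return contig -> sorted positions where at least min_clips reads are clipped."""
    flagged = {}
    for contig, sig in signals.items():
        sites = sorted(p for p, n in sig.clip_sites.items() if n >= min_clips)
        if sites:
            flagged[contig] = sites
    return flagged


# Toy records (invented for illustration): three reads clipped at ctg1:100.
TOY_SAM = [
    "@SQ\tSN:ctg1\tLN:1000",
    "r1\t99\tctg1\t100\t60\t20S80M\t=\t300\t0\t*\t*",
    "r2\t97\tctg1\t100\t60\t15S85M\tctg2\t50\t0\t*\t*",
    "r3\t73\tctg1\t100\t60\t30S70M\t=\t100\t0\t*\t*",
    "r4\t99\tctg1\t150\t60\t90M10S\t=\t400\t0\t*\t*",
    "r5\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

signals = collect_signals(TOY_SAM)
print(flag_breakpoints(signals, min_clips=3))  # → {'ctg1': [100]}
```

A cluster of clip boundaries at one position, together with mates landing on another contig, is the kind of evidence that can point to a chimeric join or local mis-assembly. How such signals are weighed and acted upon in ROAST is described in the paper, not in this sketch.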
ROAST provides a reference-free approach to optimizing supertranscriptome assemblies, making it useful for refining de novo supertranscriptome assemblies of non-model organisms.