Roast: a tool for reference-free optimization of supertranscriptome assemblies.

Affiliation

Department of Life Sciences, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences (LUMS), DHA, Lahore, 54792, Pakistan.

Publication information

BMC Bioinformatics. 2024 Jan 2;25(1):2. doi: 10.1186/s12859-023-05614-4.

DOI:10.1186/s12859-023-05614-4

PMID:38166712

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10763045/

Abstract

BACKGROUND

Transcriptomic studies of organisms that lack a reference genome typically start by generating a de novo transcriptome or supertranscriptome assembly from the raw RNA-seq reads. Assembling a supertranscriptome is challenging, however, because mRNA transcript abundance varies widely and because of alternative splicing and sequencing errors. As a result, popular de novo supertranscriptome assembly tools produce assemblies containing contigs that are partially assembled, fragmented, falsely chimeric, or locally mis-assembled, which decreases assembly accuracy. Commonly available assembly-improvement tools rely primarily on BLAST searches against closely related species, so their accuracy and reliability depend on the availability of data for closely related organisms.

RESULTS

We present ROAST, a tool for optimizing supertranscriptome assemblies that uses paired-end RNA-seq data from the Illumina sequencing platform to iteratively identify and fix assembly errors. It relies solely on the error signatures generated by RNA-seq alignment tools, including soft-clips, unexpected expression coverage, and reads whose mates are unmapped or mapped to a different contig, and performs no BLAST searches against other organisms. Evaluation on simulated and real datasets shows that ROAST significantly improves assembly quality by identifying and fixing various assembly errors.
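
To make the alignment-based error signatures more concrete, the following is a minimal sketch of how soft-clips and mate-placement problems can be read off SAM records. It is not ROAST's implementation: the function names, the `min_clip` threshold, and the signature labels are assumptions made for illustration, and the coverage-based signature is omitted. Only the FLAG bits and CIGAR conventions come from the SAM format itself.

```python
import re
from collections import Counter

# SAM FLAG bits, as defined by the SAM format.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8

# A soft-clip at the start or end of a CIGAR string, e.g. "12S88M" or "90M10S".
# A hard clip (H) may sit outside the soft clip.
SOFT_CLIP = re.compile(r"^(?:\d+H)?(\d+)S|(\d+)S(?:\d+H)?$")


def classify_record(sam_line, min_clip=10):
    """Return the set of error signatures carried by one SAM alignment line.

    min_clip is a hypothetical threshold: shorter soft-clips are ignored.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, cigar, rnext = int(fields[1]), fields[2], fields[5], fields[6]
    signatures = set()
    if flag & FLAG_UNMAPPED:
        return signatures  # an unmapped read cannot be attributed to a contig
    clips = [int(a or b) for a, b in SOFT_CLIP.findall(cigar)]
    if any(c >= min_clip for c in clips):
        signatures.add("soft_clip")
    if flag & FLAG_PAIRED:
        if flag & FLAG_MATE_UNMAPPED:
            signatures.add("mate_unmapped")
        elif rnext not in ("=", "*", rname):
            signatures.add("mate_on_other_contig")
    return signatures


def signature_counts(sam_lines, min_clip=10):
    """Tally signatures per contig, skipping SAM header lines."""
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue
        rname = line.split("\t")[2]
        for sig in classify_record(line, min_clip):
            counts.setdefault(rname, Counter())[sig] += 1
    return counts
```

Contigs that accumulate many such signatures at one position would be candidates for closer inspection; how ROAST weighs and acts on them is described in the full paper, not in this sketch.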

CONCLUSION

ROAST provides a reference-free approach to optimizing supertranscriptome assemblies, which makes it useful for refining de novo supertranscriptome assemblies of non-model organisms.
