Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study.

Affiliation

Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China.

Publication information

BMC Bioinformatics. 2011 Dec 14;12 Suppl 14(Suppl 14):S2. doi: 10.1186/1471-2105-12-S14-S2.

PMID: 22373417

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3287467/

Abstract

BACKGROUND

With the rapid advances in next-generation sequencing technology, high-throughput RNA sequencing has emerged as a powerful and cost-effective approach to transcriptome studies. De novo assembly of transcripts provides an important solution for transcriptome analysis in organisms without a reference genome. However, how different variables affect assembly outcomes was poorly understood, and there was no consensus on how to reach an optimal solution by choosing software tools and strategies suited to the properties of the RNA-Seq data.

RESULTS

To reveal the performance of different programs for transcriptome assembly, this work analyzed several important factors, including k-mer values, genome complexity, coverage depth and directional reads. Seven program conditions were tested: four single k-mer assemblers (SK: SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer methods (MK: SOAPdenovo-MK, trans-ABySS and Oases-MK). Small k-mer values performed better for reconstructing lowly expressed transcripts and large k-mer values for highly expressed ones, whereas the MK strategy worked well across almost all expression quintiles. Among the SK tools, Trinity performed well under various conditions but had the longest running time. Oases consumed the most memory, whereas SOAPdenovo had the shortest runtime but performed poorly at reconstructing full-length CDS. ABySS struck a good balance between resource usage and assembly quality.
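The k-mer trade-off summarized above can be illustrated with a small sketch. It is not taken from the paper or from any of the tested assemblers. All function names (`kmers`, `repeated_kmers`, `kmer_coverage`, `expression_quintiles`, `merge_multi_k`) are hypothetical. The coverage conversion uses the standard relation C_k = C(L - k + 1)/L between per-base coverage C and k-mer coverage for reads of length L. The multi-k merge is a deliberately simplified containment filter and should not be read as what SOAPdenovo-MK, trans-ABySS or Oases-MK actually do.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq; a read of length L yields L - k + 1 of them."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def repeated_kmers(seq: str, k: int) -> set[str]:
    """k-mers seen more than once; each is a potential branch in a de Bruijn graph."""
    counts = Counter(kmers(seq, k))
    return {km for km, n in counts.items() if n > 1}


def kmer_coverage(base_cov: float, read_len: int, k: int) -> float:
    """Convert per-base coverage C to k-mer coverage: C_k = C * (L - k + 1) / L."""
    return base_cov * (read_len - k + 1) / read_len


def expression_quintiles(expr: list[float]) -> list[int]:
    """Assign each transcript a quintile by rank (1 = lowest, 5 = highest expression)."""
    order = sorted(range(len(expr)), key=lambda i: expr[i])
    quintile = [0] * len(expr)
    for rank, i in enumerate(order):
        quintile[i] = rank * 5 // len(expr) + 1
    return quintile


def merge_multi_k(contig_sets: list[list[str]]) -> list[str]:
    """Toy multi-k merge (assumption, not any tool's algorithm): pool contigs from
    every k, longest first, and drop any contig that is an exact substring of a
    contig already kept. Reverse complements and inexact overlaps are ignored."""
    pooled = sorted((c for s in contig_sets for c in s), key=len, reverse=True)
    kept: list[str] = []
    for contig in pooled:
        if not any(contig in other for other in kept):
            kept.append(contig)
    return kept


if __name__ == "__main__":
    seq = "GATTACAGGGATTACATT"  # contains the 7-bp repeat GATTACA twice
    print(repeated_kmers(seq, 7))  # {'GATTACA'}: k too small to span the repeat
    print(repeated_kmers(seq, 8))  # set(): k = 8 resolves it
    # A lowly expressed transcript at 3x base coverage with 75-bp reads
    for k in (25, 63):
        print(k, round(kmer_coverage(3.0, 75, k), 2))  # 25 2.04, then 63 0.52
    print(expression_quintiles(list(range(10))))  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
    print(merge_multi_k([["ACGTAC", "TTT"], ["ACGTACGG", "TTT", "GGA"]]))  # ['ACGTACGG', 'TTT', 'GGA']
```

One plausible reading of the results, under these simplifications: a small k keeps k-mer coverage above 1 for lowly expressed transcripts but cannot span repeats longer than k, while a large k resolves such repeats at the cost of k-mer coverage. Pooling assemblies built with several k values is one way to obtain both benefits.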

CONCLUSIONS

Our work compared the performance of publicly available transcriptome assemblers and analyzed important factors affecting de novo assembly. Practical guidelines for transcript reconstruction from short-read RNA-Seq data were proposed. De novo assembly of the C. sinensis (Camellia sinensis, tea plant) transcriptome was greatly improved using optimized methods.

Similar articles

Comparison of De Novo Transcriptome Assemblers and k-mer Strategies Using the Killifish, Fundulus heteroclitus.

PLoS One. 2016 Apr 7;11(4):e0153104. doi: 10.1371/journal.pone.0153104. eCollection 2016.

Comprehensive evaluation of de novo transcriptome assembly programs and their effects on differential gene expression analysis.

Bioinformatics. 2017 Feb 1;33(3):327-333. doi: 10.1093/bioinformatics/btw625.

Optimizing de novo assembly of short-read RNA-seq data for phylogenomics.

BMC Genomics. 2013 May 14;14:328. doi: 10.1186/1471-2164-14-328.

Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data.

BMC Genomics. 2012 Aug 14;13:392. doi: 10.1186/1471-2164-13-392.

De novo transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers.

Gigascience. 2019 May 1;8(5). doi: 10.1093/gigascience/giz039.

Optimized sequencing depth and de novo assembler for deeply reconstructing the transcriptome of the tea plant, an economically important plant species.

BMC Bioinformatics. 2019 Nov 6;20(1):553. doi: 10.1186/s12859-019-3166-x.

Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms.

BMC Bioinformatics. 2015 Feb 21;16(1):58. doi: 10.1186/s12859-015-0492-5.

Comparative analysis of de novo transcriptome assembly.

Sci China Life Sci. 2013 Feb;56(2):156-62. doi: 10.1007/s11427-013-4444-x. Epub 2013 Feb 8.

Combining transcriptome assemblies from multiple de novo assemblers in the allo-tetraploid plant Nicotiana benthamiana.

PLoS One. 2014 Mar 10;9(3):e91776. doi: 10.1371/journal.pone.0091776. eCollection 2014.

Cited by

A Potent Antibacterial Peptide (P6) from the De Novo Transcriptome of the Microalga .

Int J Mol Sci. 2024 Dec 23;25(24):13736. doi: 10.3390/ijms252413736.

Comprehensive Analysis of the Influence of Technical and Biological Variations on De Novo Assembly of RNA-Seq Datasets.

Bioinform Biol Insights. 2024 Dec 5;18:11779322241274957. doi: 10.1177/11779322241274957. eCollection 2024.

De novo transcriptome assembly and differential gene expression analysis in different developmental stages of Agriotes sputator (click beetle).

Sci Rep. 2024 Oct 18;14(1):24451. doi: 10.1038/s41598-024-74495-1.

Effects of cell morphology, physiology, biochemistry and genes on four flower colors of .

Front Plant Sci. 2024 Mar 1;15:1343830. doi: 10.3389/fpls.2024.1343830. eCollection 2024.

Comparative Analysis and Phylogenetic Study of and Mitochondrial Genomes.

Int J Mol Sci. 2024 Mar 5;25(5):3004. doi: 10.3390/ijms25053004.

Improved meta-analysis pipeline ameliorates distinctive gene regulators of diabetic vasculopathy in human endothelial cell (hECs) RNA-Seq data.

PLoS One. 2023 Nov 9;18(11):e0293939. doi: 10.1371/journal.pone.0293939. eCollection 2023.

Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis.

Sci Rep. 2023 Jul 31;13(1):12415. doi: 10.1038/s41598-023-39620-6.

Comparative Transcriptomics of Multi-Stress Responses in and .

Int J Mol Sci. 2023 Jul 11;24(14):11323. doi: 10.3390/ijms241411323.

Development of EST-SSRs based on the transcriptome of Castanopsis carlesii and cross-species transferability in other Castanopsis species.

PLoS One. 2023 Jul 20;18(7):e0288999. doi: 10.1371/journal.pone.0288999. eCollection 2023.

Construction of a de novo assembly pipeline using multiple transcriptome data sets from Cypripedium macranthos (Orchidaceae).

PLoS One. 2023 Jun 6;18(6):e0286804. doi: 10.1371/journal.pone.0286804. eCollection 2023.
