Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study.

Affiliation

Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China.

Publication information

BMC Bioinformatics. 2011 Dec 14;12 Suppl 14(Suppl 14):S2. doi: 10.1186/1471-2105-12-S14-S2.

Abstract

BACKGROUND

With rapid advances in next-generation sequencing technology, high-throughput RNA sequencing (RNA-Seq) has emerged as a powerful and cost-effective approach to transcriptome studies. De novo assembly of transcripts is an important solution for transcriptome analysis in organisms that lack a reference genome. However, it was poorly understood how different variables affect assembly outcomes, and there was no consensus on how to reach an optimal solution by selecting a software tool and a suitable strategy based on the properties of the RNA-Seq data.

RESULTS

To assess the performance of different programs for transcriptome assembly, this work analyzed several important factors, including k-mer value, genome complexity, coverage depth, and directional reads. Seven program conditions were tested: four single k-mer (SK) assemblers (SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer (MK) methods (SOAPdenovo-MK, trans-ABySS and Oases-MK). Small k-mer values performed better for reconstructing lowly expressed transcripts and large k-mer values for highly expressed ones, whereas the MK strategy worked well across almost all expression quintiles. Among the SK tools, Trinity performed well across various conditions but had the longest running time. Oases consumed the most memory, whereas SOAPdenovo had the shortest runtime but reconstructed full-length CDS poorly. ABySS offered a good balance between resource usage and assembly quality.
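The trade-off between small and large k-mer values, and the idea behind an MK strategy, can be illustrated with a minimal Python sketch. It is not the procedure used by any of the assemblers named above. The relation C_k = C(L - k + 1)/L is the standard expected k-mer coverage for reads of length L and is not taken from this abstract. The function names `kmer_coverage`, `reverse_complement` and `merge_multi_k`, and the rule of dropping contigs contained in longer ones, are assumptions made here for illustration only.

```python
from typing import Dict, List


def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage of a transcript sequenced at `base_coverage`.

    A read of length L contains L - k + 1 k-mers, so C_k = C * (L - k + 1) / L.
    Larger k therefore leaves fewer k-mers to connect a lowly expressed
    transcript, which is one reason small k tends to suit low expression.
    """
    if not 0 < k <= read_length:
        raise ValueError("k must satisfy 0 < k <= read_length")
    return base_coverage * (read_length - k + 1) / read_length


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_multi_k(assemblies: Dict[int, List[str]]) -> List[str]:
    """Pool contigs from several single-k assemblies and drop redundant ones.

    Hypothetical merge rule: a contig is dropped when it, or its reverse
    complement, is an exact substring of a contig that was already kept.
    Contigs are visited longest first, so fragments collapse into longer ones.
    """
    pooled = sorted(
        {c.upper() for contigs in assemblies.values() for c in contigs},
        key=lambda s: (-len(s), s),
    )
    kept: List[str] = []
    for contig in pooled:
        rc = reverse_complement(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue
        kept.append(contig)
    return kept


if __name__ == "__main__":
    # Same transcript at 10x base coverage, 75 bp reads, two k values.
    for k in (21, 51):
        print(k, round(kmer_coverage(10.0, 75, k), 2))

    # Toy contigs keyed by the k value of the assembly that produced them.
    toy = {21: ["AAACCCGGGT", "TTTTGA"], 31: ["AAACCCGGGTAC", "TCAAAA"]}
    print(merge_multi_k(toy))
```

Exact-substring containment is the simplest possible redundancy filter. A practical pipeline would more likely cluster contigs at a sequence-identity threshold, since contigs assembled at different k values rarely match base for base.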

CONCLUSIONS

Our work compared the performance of publicly available transcriptome assemblers and analyzed important factors affecting de novo assembly. Practical guidelines for transcript reconstruction from short-read RNA-Seq data were proposed. De novo assembly of the C. sinensis transcriptome was greatly improved by applying optimized methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2703/3287467/6ee280a42337/1471-2105-12-S14-S2-1.jpg
