RNA sequencing read depth requirement for optimal transcriptome coverage in Hevea brasiliensis.

Author information

Chow Keng-See, Ghazali Ahmad-Kamal, Hoh Chee-Choong, Mohd-Zainuddin Zainorlina

Affiliations

Biotechnology Unit, Malaysian Rubber Board, Rubber Research Institute of Malaysia, Experiment Station, Kuala Lumpur 47000, Sungai Buloh, Selangor, Malaysia.

Publication information

BMC Res Notes. 2014 Feb 1;7:69. doi: 10.1186/1756-0500-7-69.

Abstract

BACKGROUND

One of the concerns in assembling de novo transcriptomes is determining the amount of read sequence required to ensure comprehensive coverage of the genes expressed in a particular sample. In this report, we describe the use of Illumina paired-end RNA-Seq (PE RNA-Seq) reads from Hevea brasiliensis (rubber tree) bark to devise a transcript mapping approach for estimating the read amount needed for deep transcriptome coverage.

FINDINGS

We optimized the assembly of a Hevea bark transcriptome based on 16 Gb of Illumina PE RNA-Seq reads using the Oases assembler across a range of k-mer sizes. We then assessed assembly quality based on transcript N50 length and transcript mapping statistics in relation to (a) known Hevea cDNAs with complete open reading frames, (b) a set of core eukaryotic genes and (c) Hevea genome scaffolds. This was followed by a systematic transcript mapping process in which sub-assemblies built from a series of incremental amounts of bark reads were aligned to transcripts from the entire bark transcriptome assembly. The exercise served to relate read amounts to the level of transcript mapping, the latter being an indicator of the coverage of gene transcripts expressed in the sample. As the read amount (data size) increased toward 16 Gb, the number of transcripts mapped to the entire bark assembly approached saturation. A colour matrix was subsequently generated to illustrate the sequencing depth requirement in relation to the degree of coverage of total sample transcripts.
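For readers unfamiliar with the N50 statistic used above as an assembly quality measure, the following is a minimal sketch of how it is commonly computed from a list of transcript lengths. The function name and the example lengths are illustrative assumptions and are not taken from the paper; the authors' own tooling is not described in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of transcript lengths.

    N50 is the length L such that transcripts of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one transcript length")
    ordered = sorted(lengths, reverse=True)  # longest transcripts first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative lengths only (not data from the study).
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

A higher N50 generally indicates that more of the assembly sits in longer transcripts, which is why it is often compared across k-mer sizes when tuning an assembler.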

CONCLUSIONS

We devised a procedure, the "transcript mapping saturation test", to estimate the amount of RNA-Seq reads needed for deep coverage of transcriptomes. For Hevea de novo assembly, we propose generating between 5 and 8 Gb of reads, with which around 90% transcript coverage could be achieved using optimized k-mers and transcript N50 length. The principle behind this methodology may also be applied to other non-model plants, or to reads from other second-generation sequencing platforms.
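The idea behind the saturation test can be sketched in a few lines: for each sub-assembly, compute the fraction of full-assembly transcripts that it maps to, then find the smallest data size whose fraction reaches a chosen target. This is only a sketch under stated assumptions: the function names, the use of transcript identifiers as the mapping result, and the numbers in the example curve are all hypothetical and are not values or code from the paper.

```python
def mapping_fraction(mapped_ids, full_ids):
    """Fraction of full-assembly transcripts hit by a sub-assembly.

    Assumption: an upstream aligner has already produced `mapped_ids`,
    the IDs of full-assembly transcripts matched by the sub-assembly.
    """
    full = set(full_ids)
    if not full:
        raise ValueError("full assembly has no transcripts")
    return len(full & set(mapped_ids)) / len(full)


def smallest_depth_reaching(curve, target):
    """Return the smallest data size whose mapping fraction >= target.

    `curve` is an iterable of (data_size_gb, fraction) pairs.
    Returns None if no point on the curve reaches the target.
    """
    for depth, fraction in sorted(curve):
        if fraction >= target:
            return depth
    return None


# Hypothetical saturation curve (made-up values for illustration only).
curve = [(1, 0.50), (2, 0.70), (4, 0.85), (6, 0.91), (8, 0.94), (16, 1.00)]
print(smallest_depth_reaching(curve, 0.90))
```

Plotting such a curve makes the plateau visible, and reading off the data size at a target such as 90% is one simple way to turn it into a sequencing-depth recommendation.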

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e840/3926681/667d664b614d/1756-0500-7-69-1.jpg
