Department of Poultry Science, Texas A&M University, College Station, TX 77843-2472, USA.
BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S5. doi: 10.1186/1471-2105-12-S10-S5.
RNA-Seq is a recently developed high-throughput sequencing technology for profiling the entire transcriptome of any organism. It has several major advantages over current hybridization-based approaches such as microarrays. However, the cost per sample of RNA-Seq is still prohibitive for most laboratories. With continued improvement in sequence output, it would be cost-effective to multiplex several samples and sequence them in a single lane, provided that transcriptome coverage remains sufficient. The objective of this analysis is to evaluate what sequencing depth is sufficient for gene expression profiling in the chicken by RNA-Seq.
Two cDNA libraries from chicken lungs were sequenced initially, generating 4.9 million (M) and 1.6 M (60 bp) reads, respectively. After significant improvements in sequencing technology, two technical replicate cDNA libraries were re-sequenced, yielding 29.6 M and 28.7 M (75 bp) reads for the two samples. More than 90% of annotated genes were detected in the data sets with 28.7-29.6 M reads, while only 68% of genes were detected in the data set with 1.6 M reads. The correlation coefficients of gene expression between technical replicates within the same sample were 0.9458 and 0.8442. To evaluate the depth needed for mRNA profiling, a random sampling method was used to generate different numbers of reads from each sample. Correlation coefficients increased significantly from a sequencing depth of 1.6 M to 10 M for all genes except highly abundant genes. No significant improvement was observed from a depth of 10 M to 20 M (75 bp) reads.
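The random sampling step can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the sampling scheme, so sampling without replacement is an assumption here, and the toy read pool, the gene count, the abundance weights, and the helper names (`subsample_counts`, `detected_fraction`, `pearson`) are all hypothetical. The sketch draws a fixed number of reads from a pool, counts reads per gene, and reports the fraction of genes detected and the correlation between two independent subsamples at each depth.

```python
import random
from collections import Counter
from math import sqrt


def subsample_counts(reads, depth, rng):
    """Draw `depth` reads without replacement and count reads per gene."""
    return Counter(rng.sample(reads, depth))


def detected_fraction(counts, n_genes):
    """Fraction of annotated genes with at least one sampled read."""
    return len(counts) / n_genes


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy library (made-up data, not from the study): each read is represented
# only by the ID of the gene it maps to; abundances are strongly skewed.
rng = random.Random(0)
genes = list(range(2000))
weights = [1.0 / (rank + 1) for rank in genes]
reads = rng.choices(genes, weights=weights, k=200_000)

for depth in (2_000, 20_000, 100_000):
    a = subsample_counts(reads, depth, rng)
    b = subsample_counts(reads, depth, rng)
    r = pearson([a[g] for g in genes], [b[g] for g in genes])
    print(depth, round(detected_fraction(a, len(genes)), 3), round(r, 4))
```

With real data, the read pool would come from aligned reads assigned to annotated genes, and the correlation could be computed separately for genes grouped by abundance, as the comparison of highly abundant and other genes above suggests.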
The analysis in the current study demonstrated that 30 M (75 bp) reads is sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes, and RNA-Seq at this depth can serve as a replacement for microarray technology. Furthermore, the depth of sequencing had a significant impact on measuring the expression of low-abundance genes. Finally, combining experimental and simulation approaches is a powerful way to address the relationship between sequencing depth and transcriptome coverage.