Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, Florida, USA.
BMC Genomics. 2011 Jun 6;12:293. doi: 10.1186/1471-2164-12-293.
RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcripts and previously unknown exons are being reported. Initial reports of differences between samples in exon usage and splicing, as well as quantitative differences among samples, are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on its magnitude. The size of the technical variance and the role of sampling are examined in this manuscript.
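This sketch shows what "in line with expectations due to random sampling" means in practice. It assumes that a gene's read counts in two technical replicates are independent Poisson draws with the same mean. That is a common simplifying model and not necessarily the one used in this study. For each mean count, the code reports how often the two counts differ by more than two-fold. The function names, the pseudocount of 1, and the two-fold cutoff are illustrative choices. The model covers sampling variation only and says nothing about other sources of technical variation.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate by Knuth's multiplication method (suitable for modest lam)."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fraction_over_twofold(lam: float, n_genes: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated genes whose two replicate counts differ more than two-fold.

    Both replicates are independent Poisson(lam) draws (pure sampling, no other
    technical noise). A pseudocount of 1 avoids division by zero at low counts.
    """
    rng = random.Random(seed)
    over = 0
    for _ in range(n_genes):
        a = poisson_draw(lam, rng) + 1  # replicate 1 count + pseudocount
        b = poisson_draw(lam, rng) + 1  # replicate 2 count + pseudocount
        if max(a, b) > 2 * min(a, b):
            over += 1
    return over / n_genes


if __name__ == "__main__":
    for lam in (1, 5, 20, 100):
        print(f"mean count {lam:>3}: fraction >2-fold apart = {fraction_over_twofold(lam):.3f}")
```

Under this model, two-fold disagreements are common when the mean count is near 1 and become rare as the mean count grows. This gives a sampling-only baseline against which observed replicate disagreement can be compared.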
In this study, three independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when coverage is less than 5 reads per nucleotide, and estimates of gene expression are more likely to disagree when coverage is low. However, large disagreements in the estimates of expression are observed at all levels of coverage.
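A simple model shows why exon detection is inconsistent at low coverage. This sketch is our assumption, not the authors' method. It treats an exon's read count in each technical replicate as an independent Poisson draw with the same mean, using the mean count as a stand-in for coverage. An exon counts as "detected" when its count reaches a hypothetical threshold. If q is the per-replicate detection probability, the two replicates disagree with probability 2q(1 - q). The function names and the threshold are hypothetical.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Poisson(lam); 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        total += term
    return total


def detection_disagreement(lam: float, threshold: int = 1) -> float:
    """Probability that exactly one of two technical replicates 'detects' an exon.

    Assumes each replicate's count is an independent Poisson(lam) draw and that
    an exon is detected when its count >= threshold (a hypothetical rule).
    """
    q = 1.0 - poisson_cdf(threshold - 1, lam)  # P(detected in one replicate)
    return 2.0 * q * (1.0 - q)


if __name__ == "__main__":
    # Hypothetical detection threshold of 5 reads; vary the mean count.
    for lam in (1, 2, 5, 10, 20, 50):
        p = detection_disagreement(lam, threshold=5)
        print(f"mean count {lam:>3}: P(replicates disagree) = {p:.3f}")
```

Disagreement is highest when the mean count is close to the detection threshold, where q is near 0.5. It falls quickly once the mean count is well above the threshold.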
Technical variability is too high to ignore. It results in inconsistent detection of exons at low levels of coverage. Further, estimates of the relative abundance of a transcript can disagree substantially between technical replicates, even when coverage levels are high. This may be due to the low sampling fraction. If so, it will remain an issue that experimental design must address, even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with technical variability without dramatic cost increases.