Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota, United States of America.
PLoS One. 2013 Aug 19;8(8):e71745. doi: 10.1371/journal.pone.0071745. eCollection 2013.
PolyA selection is the most common library preparation approach for RNA sequencing. For limited or degraded RNA, alternative protocols such as NuGEN have been developed. However, it is not yet clear how the different library preparations affect downstream analyses across the broad range of RNA sequencing applications.
Eight human mammary epithelial cell (HMEC) lines with high-quality RNA were sequenced using Illumina's mRNA-Seq PolyA selection and NuGEN ENCORE library preparations. The following analyses and comparisons were conducted: 1) the number of genes captured by each protocol; 2) the impact of protocol on the detection of differentially expressed genes between biological replicates; 3) expressed single nucleotide variant (SNV) detection; 4) detection of non-coding RNAs, particularly lincRNAs; and 5) intragenic gene expression.
Sequences from the NuGEN protocol had a lower alignment rate (75%) than those from the PolyA protocol (over 90%). The NuGEN protocol detected 12-20% fewer genes, with a significant portion of reads mapping to non-coding regions. A large number of genes were differentially detected between the two protocols. About 17-20% of the genes differentially expressed between biological replicates were detected by both protocols. Significantly more SNVs (5-6 times as many) were detected in the NuGEN samples, largely from intragenic and intergenic regions. The NuGEN protocol captured 25% fewer exons and had higher base-level coverage variance. While 6.3% of reads mapped to intragenic regions in the PolyA samples, the percentage was much higher (20-25%) in the NuGEN samples. The NuGEN protocol did not detect more known non-coding RNAs such as lincRNAs, but targeted small and "novel" lincRNAs.
Different library preparations can have significant impacts on the downstream analysis and interpretation of RNA-seq data. The NuGEN protocol provides an alternative for limited or degraded RNA, but it has limitations for some RNA-seq applications.
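The 17-20% overlap figure for differentially expressed genes depends on which gene list serves as the denominator, which the abstract does not specify. The sketch below is only an illustration of how such an overlap could be tallied from two lists of gene identifiers; the function name `overlap_percentages`, the result keys, and the toy gene IDs are all hypothetical and are not taken from the study.

```python
def overlap_percentages(de_polya, de_nugen):
    """Count DE genes shared by two protocols and express that count as a
    percentage of each protocol's list and of their union."""
    polya, nugen = set(de_polya), set(de_nugen)
    shared = polya & nugen   # genes called DE under both protocols
    union = polya | nugen    # genes called DE under at least one protocol

    def pct(part, whole):
        # Guard against an empty denominator
        return 100.0 * len(part) / len(whole) if whole else 0.0

    return {
        "shared": len(shared),
        "pct_of_polya": pct(shared, polya),
        "pct_of_nugen": pct(shared, nugen),
        "pct_of_union": pct(shared, union),
    }


# Toy gene IDs, made up for illustration only (not data from the paper)
polya_hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
nugen_hits = ["GENE_D", "GENE_E", "GENE_F", "GENE_G"]
result = overlap_percentages(polya_hits, nugen_hits)
```

Reporting all three percentages side by side makes it explicit which denominator a quoted overlap refers to.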