Kawaji Hideya, Lizio Marina, Itoh Masayoshi, Kanamori-Katayama Mutsumi, Kaiho Ai, Nishiyori-Sueki Hiromi, Shin Jay W, Kojima-Ishiyama Miki, Kawano Mitsuoki, Murata Mitsuyoshi, Ninomiya-Fukuda Noriko, Ishikawa-Kato Sachi, Nagao-Sato Sayaka, Noma Shohei, Hayashizaki Yoshihide, Forrest Alistair R R, Carninci Piero
RIKEN Preventive Medicine and Diagnosis Innovation Program, Saitama 351-0198, Japan.
Genome Res. 2014 Apr;24(4):708-17. doi: 10.1101/gr.156232.113. Epub 2014 Mar 27.
CAGE (cap analysis gene expression) and RNA-seq are two major technologies used to identify transcript abundances as well as structures. They measure expression by sequencing from either the 5' end of capped molecules (CAGE) or tags randomly distributed along the length of a transcript (RNA-seq). Library protocols for clonally amplified, second-generation sequencing platforms (Illumina, SOLiD, 454 Life Sciences [Roche], Ion Torrent) typically employ PCR preamplification prior to clonal amplification, while third-generation, single-molecule sequencers can sequence unamplified libraries. Although these transcriptome profiling platforms have been demonstrated to be individually reproducible, no systematic comparison has been carried out between them. Here we compare CAGE, using both second- and third-generation sequencers, and RNA-seq, using a second-generation sequencer, based on a panel of RNA mixtures from two human cell lines, to examine power in the discrimination of biological states, detection of differentially expressed genes, linearity of measurements, and quantification reproducibility. We found that the quantified levels of gene expression are largely comparable across platforms and conclude that CAGE and RNA-seq are complementary technologies that can be used to improve incomplete gene models. We also found systematic bias in the second- and third-generation platforms, which is likely due to steps such as linker ligation, cleavage by restriction enzymes, and PCR amplification. This study provides a perspective on the performance of these platforms, which will be a baseline in the design of further experiments to tackle complex transcriptomes uncovered in a wide range of cell types.
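The abstract mentions assessing the linearity of measurements on a panel of RNA mixtures but does not describe the calculation. As a rough illustration only, one way such a check could be set up is to predict each gene's expression in a mixture as the proportion-weighted sum of its expression in the two pure samples, then compare the prediction with the observed values on a log scale. Everything in the sketch below is an assumption made for illustration: the function names, the 25% mixing fraction, and the expression values are hypothetical and are not taken from the study.

```python
import math


def expected_mixture(expr_a, expr_b, fraction_a):
    """Per-gene expression expected if mixing is perfectly linear.

    expr_a, expr_b: normalized expression in the two pure samples.
    fraction_a: share of sample A in the mixture (0.0 to 1.0).
    """
    return [fraction_a * a + (1.0 - fraction_a) * b
            for a, b in zip(expr_a, expr_b)]


def log2p1(values):
    """log2(x + 1) transform, so that zero expression stays finite."""
    return [math.log2(v + 1.0) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical normalized expression for four genes (not data from the study).
pure_a = [100.0, 10.0, 0.0, 50.0]
pure_b = [20.0, 10.0, 40.0, 50.0]
observed_mix = [38.0, 11.0, 33.0, 49.0]  # hypothetical 25% A / 75% B mixture

expected_mix = expected_mixture(pure_a, pure_b, fraction_a=0.25)
linearity_r = pearson(log2p1(expected_mix), log2p1(observed_mix))
print("expected:", expected_mix)
print("correlation with observed (log scale):", round(linearity_r, 3))
```

A correlation close to 1 would suggest that the platform responds roughly linearly to the mixing proportion. A real analysis would use thousands of genes, several mixing ratios, and replicate libraries.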