OMICs Science Center, RIKEN Yokohama Institute, Tsurumi-ku, Yokohama, Japan.
Genome Res. 2011 Jul;21(7):1150-9. doi: 10.1101/gr.115469.110. Epub 2011 May 19.
We report the development of a simplified cap analysis of gene expression (CAGE) protocol adapted for single-molecule sequencers that avoids second-strand synthesis, ligation, digestion, and PCR. HeliScopeCAGE directly sequences the 3' end of cap-trapped first-strand cDNAs. As with previous versions of CAGE, we better define transcription start sites (TSS) than known models, identify novel regions of transcription and alternative promoters, and find two major classes of TSS signal, sharp peaks and broad regions. However, using this protocol, we observe reproducible evidence of regulation at the much finer level of individual TSS positions. The libraries are quantitative over 5 orders of magnitude and highly reproducible (Pearson's correlation coefficient of 0.987). We have also scaled down the sample requirement to 5 μg of total RNA for a standard HeliScopeCAGE library and 100 ng for a low-quantity version. When the same RNA was run as 5-μg and 100-ng versions, the 100-ng library still detected expression for ∼60% of the 13,468 loci detected by the 5-μg library using the same threshold, allowing comparative analysis of even rare cell populations. Testing the protocol for differential gene expression measurements on triplicate HeLa and THP-1 samples, we find that the log fold changes are highly correlated with those from Illumina microarray measurements (0.871). In addition, HeliScopeCAGE finds differential expression for thousands more loci, including those with probes on the array. Finally, although the majority of tags are 5'-associated, we also observe a low level of signal on exons that is useful for defining gene structures.
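The three summary statistics quoted above (replicate correlation, the fraction of loci still detected at a shared threshold, and log fold change) can be illustrated with a minimal Python sketch. The locus names, tag counts, threshold, and pseudocount below are invented toy values, and the function names are hypothetical; the abstract does not state the normalization or threshold actually used, so this shows only the arithmetic, not the authors' pipeline.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def detected(counts, threshold):
    """Set of loci whose tag count meets the threshold."""
    return {locus for locus, count in counts.items() if count >= threshold}


def detection_overlap(reference, test, threshold):
    """Fraction of loci detected in `reference` that are also detected in `test`."""
    ref_loci = detected(reference, threshold)
    if not ref_loci:
        return 0.0
    return len(ref_loci & detected(test, threshold)) / len(ref_loci)


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((a + pseudocount) / (b + pseudocount))


# Toy tag counts per locus (invented values, not data from the paper).
standard_library = {"locusA": 120, "locusB": 15, "locusC": 3, "locusD": 40, "locusE": 0}
low_input_library = {"locusA": 95, "locusB": 2, "locusC": 0, "locusD": 31, "locusE": 1}

overlap = detection_overlap(standard_library, low_input_library, threshold=5)
loci = sorted(standard_library)
r = pearson(
    [math.log2(standard_library[k] + 1) for k in loci],
    [math.log2(low_input_library[k] + 1) for k in loci],
)
print(f"fraction of reference loci detected: {overlap:.2f}")
print(f"Pearson r on log2(count + 1): {r:.3f}")
```

Applying the same threshold to both libraries, as the abstract describes, makes the overlap a fraction of the reference library's detected loci rather than a symmetric similarity measure.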