Faulkner Geoffrey J, Forrest Alistair R R, Chalk Alistair M, Schroder Kate, Hayashizaki Yoshihide, Carninci Piero, Hume David A, Grimmond Sean M
The Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia.
Genomics. 2008 Mar;91(3):281-8. doi: 10.1016/j.ygeno.2007.11.003.
Cap analysis gene expression (CAGE) is a high-throughput, tag-based method designed to survey the 5' end of capped full-length cDNAs. CAGE has previously been used to define global transcription start site usage and monitor gene activity in mammals. A drawback of the CAGE approach thus far has been the removal of as many as 40% of CAGE sequence tags due to their mapping to multiple genomic locations. Here, we address the origins of multimap tags and present a novel strategy to assign CAGE tags to their most likely source promoter region. When this approach was applied to the FANTOM3 CAGE libraries, the percentage of protein-coding mouse transcriptional frameworks detected by CAGE improved from 42.9 to 57.8% (an increase of 5516 frameworks) with no reduction in CAGE to microarray correlation. These results suggest that the multimap tags produced by high-throughput, short sequence tag-based approaches can be rescued to augment greatly the transcriptome coverage provided by single-map tags alone.
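The abstract does not spell out how multimap tags are assigned, so the following is only a minimal sketch of one plausible rescue scheme, not the authors' published algorithm. It assumes that each multimap tag can be shared among its candidate promoter regions in proportion to the single-map tag counts already observed there, and that a tag with no single-map evidence at any candidate is split evenly. The function name `rescue_multimap_tags`, the region labels, and the input layout are all hypothetical.

```python
from collections import defaultdict


def rescue_multimap_tags(single_counts, multimap_tags):
    """Share multimap tag counts among candidate regions (illustrative only).

    single_counts: dict mapping region name -> single-map tag count
                   (hypothetical input layout).
    multimap_tags: list of (tag_count, [candidate region names]).
    Returns a dict mapping region name -> fractional rescued tag count.
    """
    rescued = defaultdict(float)
    for tag_count, candidates in multimap_tags:
        # Weight each candidate by the single-map evidence already seen there.
        weights = [single_counts.get(region, 0) for region in candidates]
        total = sum(weights)
        if total == 0:
            # Assumption: with no single-map evidence, split the tag evenly.
            weights = [1] * len(candidates)
            total = len(candidates)
        for region, weight in zip(candidates, weights):
            rescued[region] += tag_count * weight / total
    return dict(rescued)


# Hypothetical example: two regions with single-map support, two without.
single = {"promA": 90, "promB": 10}
tags = [(20, ["promA", "promB"]), (4, ["promC", "promD"])]
result = rescue_multimap_tags(single, tags)
```

In this toy input, most of the 20 copies of the first tag go to `promA` because it carries most of the single-map signal, while the second tag is split evenly between `promC` and `promD`. A stricter variant could assign each tag wholly to its highest-weight candidate, which is closer to the "most likely source promoter region" wording but discards the ambiguity.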