Division of Genomic Technologies, RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan.
RIKEN Omics Science Center, Yokohama, Kanagawa 230-0045, Japan.
Sci Data. 2017 Aug 29;4:170112. doi: 10.1038/sdata.2017.112.
In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at single base-pair resolution, and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time series samples during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The pipeline started by measuring RNA extracts to assess their quality, and continued with CAGE library production using a robotic or a manual workflow, single-molecule sequencing, and computational processing to generate frequencies of transcription initiation. The resulting data represent the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 and 150,000 for the human and mouse genomes, respectively, were identified and annotated to provide the precise locations of known promoters as well as novel ones, and to quantify their activities.
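As a rough illustration of what "frequencies of transcription initiation" at single base-pair resolution can mean computationally, the following minimal sketch counts the 5' ends of mapped CAGE tags per strand-specific position and rescales the counts to tags per million. It is a toy example under stated assumptions, not the FANTOM5 pipeline: the read tuples, the function names `tss_counts` and `to_tpm`, and the tags-per-million normalization are all hypothetical choices made for illustration.

```python
from collections import Counter


def tss_counts(reads):
    """Count CAGE tag 5' ends per (chrom, strand, position).

    Assumption: each read is (chrom, start, end, strand) in 0-based,
    half-open coordinates (BED-like). This input format is hypothetical.
    """
    counts = Counter()
    for chrom, start, end, strand in reads:
        # The 5' end is the leftmost base on '+' and the rightmost base on '-'.
        five_prime = start if strand == "+" else end - 1
        counts[(chrom, strand, five_prime)] += 1
    return counts


def to_tpm(counts):
    """Rescale raw counts to tags per million (an assumed normalization)."""
    total = sum(counts.values())
    return {site: n * 1_000_000 / total for site, n in counts.items()}


# Four made-up mapped tags: three on the '+' strand, one on the '-' strand.
reads = [
    ("chr1", 100, 127, "+"),
    ("chr1", 100, 125, "+"),
    ("chr1", 102, 130, "+"),
    ("chr1", 500, 527, "-"),
]

counts = tss_counts(reads)
tpm = to_tpm(counts)

for site in sorted(counts):
    print(site, counts[site], tpm[site])
```

Keying the counts by strand as well as position keeps initiation events on opposite strands separate, which matters because promoters are strand-specific. Peak identification over such profiles would be a further step that this sketch does not attempt.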