RIKEN Center for Life Science Technologies, Division of Genomic Technologies, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
Sci Data. 2017 Aug 29;4:170107. doi: 10.1038/sdata.2017.107.
The FANTOM5 consortium described the promoter-level expression atlas of human and mouse by using CAGE (Cap Analysis of Gene Expression) with single-molecule sequencing. In the original publications, the GRCh37/hg19 and NCBI37/mm9 assemblies were used as the reference genomes of human and mouse, respectively; later, the Genome Reference Consortium released the newer genome assemblies GRCh38/hg38 and GRCm38/mm10. To increase the utility of the atlas in future research, we reprocessed the data to make them available on the recent genome assemblies. The data include observed frequencies of transcription start sites (TSSs) based on the realignment of CAGE reads, and TSS peaks converted from those defined on the previous reference. Annotations of the peak names were also updated based on the latest public databases. The reprocessed results enable us to examine frequencies of transcription initiation on the recent genome assemblies and to refer to promoters with updated information consistently across the genome assemblies.