Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.

Authors

Maeda Norihiro, Kasukawa Takeya, Oyama Rieko, Gough Julian, Frith Martin, Engström Pär G, Lenhard Boris, Aturaliya Rajith N, Batalov Serge, Beisel Kirk W, Bult Carol J, Fletcher Colin F, Forrest Alistair R R, Furuno Masaaki, Hill David, Itoh Masayoshi, Kanamori-Katayama Mutsumi, Katayama Shintaro, Katoh Masaru, Kawashima Tsugumi, Quackenbush John, Ravasi Timothy, Ring Brian Z, Shibata Kazuhiro, Sugiura Koji, Takenaka Yoichi, Teasdale Rohan D, Wells Christine A, Zhu Yunxia, Kai Chikatoshi, Kawai Jun, Hume David A, Carninci Piero, Hayashizaki Yoshihide

Affiliation

Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, Wako, Japan.

Publication

PLoS Genet. 2006 Apr;2(4):e62. doi: 10.1371/journal.pgen.0020062.

Abstract

The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5867/1449903/916553ce933f/pgen.0020062.g001.jpg
