Transcriptome analyses of human genes and applications for proteome analyses.

Author information

Suzuki Yutaka, Sugano Sumio

Affiliation

Laboratory of Functional Genomics, Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639, Japan.

Publication information

Curr Protein Pept Sci. 2006 Apr;7(2):147-63. doi: 10.2174/138920306776359795.

Abstract

By utilizing recently developed full-length cDNA technologies, large-scale cDNA sequencing was carried out by several cDNA projects. Now full-length cDNA resources cover the major part of the protein-coding human genes. Comprehensive analyses of the collected full-length cDNA data revealed not only the complete sequences of thousands of novel gene transcripts but also novel alternatively spliced isoforms of hitherto identified genes. However, it was not as easy as expected to deduce their encoded amino acid sequences based solely on the full-length cDNA sequences. It was neither always the case that the longest open reading frame corresponded to the real protein coding region nor that the first ATG was the translation initiator codon. Also, proteome-wide mass-spectrometry analysis has shown that there is an unexpectedly large population of small proteins, encoded by so-called upstream open reading frames, within the cell. Since sound manual annotations by experts were still indispensable to address these problems, an international meeting to make transcriptome-wide functional annotations of cDNAs was held, namely the H-invitational. In this meeting, functional annotations were made both manually and computationally for most of the pre-existing full-length cDNAs collected from world-wide cDNA projects. The achieved integrated information for each of the cDNAs was published as a database. It was also shown that the full-length cDNA data were useful for identifying alternative splicing variants, exact transcriptional start sites of the mRNAs and the adjacent promoter regions. Rapidly accumulating genome data as well as versatile use of the transcriptome information will shortly lay a firm foundation for proteome-level understanding of human gene networks.
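The difficulty of deducing a coding region from a cDNA sequence alone can be illustrated with a minimal Python sketch. It is not the annotation procedure used by the authors or by the H-invitational; the function names and the toy sequence are hypothetical and chosen only for illustration. The sketch lists every ATG-initiated open reading frame on the forward strand and shows that the "first ATG" rule and the "longest ORF" rule can select different candidates when a short upstream ORF precedes a longer one.

```python
# Illustrative sketch only: names and the toy sequence are assumptions,
# not taken from the paper or from any annotation pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq):
    """Return (start, end) pairs for each ATG that has an in-frame stop codon.

    Coordinates are 0-based on the forward strand; `end` is exclusive and
    includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the ATG until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs


def first_atg_orf(orfs):
    """ORF beginning at the most 5' ATG."""
    return min(orfs, key=lambda orf: orf[0])


def longest_orf(orfs):
    """ORF spanning the most nucleotides."""
    return max(orfs, key=lambda orf: orf[1] - orf[0])


# Toy cDNA (invented): 5' leader, a short upstream ORF, then a longer ORF.
toy_cdna = "GGC" + "ATGGCTTGA" + "CC" + "ATGAAACCCGGGTTTTAA" + "GG"

orfs = find_orfs(toy_cdna)
by_first_atg = first_atg_orf(orfs)
by_length = longest_orf(orfs)
```

On this toy sequence the two rules disagree, and neither result says which ORF, if either, is actually translated. That ambiguity is why the abstract stresses expert manual annotation alongside computational prediction.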
